WorldWideScience

Sample records for sequence variation database

  1. Quality standards for DNA sequence variation databases to improve clinical management under development in Australia

    Directory of Open Access Journals (Sweden)

    B. Bennetts

    2014-09-01

    Full Text Available Despite the routine nature of comparing sequence variations identified during clinical testing to database records, few databases meet quality requirements for clinical diagnostics. To address this issue, The Royal College of Pathologists of Australasia (RCPA in collaboration with the Human Genetics Society of Australasia (HGSA, and the Human Variome Project (HVP is developing standards for DNA sequence variation databases intended for use in the Australian clinical environment. The outputs of this project will be promoted to other health systems and accreditation bodies by the Human Variome Project to support the development of similar frameworks in other jurisdictions.

  2. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DN...... on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn. Udgivelsesdato: 2005-Jan-1...

  3. Improvements in the HbVar database of human hemoglobin variants and thalassemia mutations for population and sequence variation studies.

    NARCIS (Netherlands)

    G.P. Patrinos (George); B. Giardine (Belinda); C. Riemer (Cathy); W. Miller (Webb); D.H. Chui (David); N.P. Anagnou (Nicholas); H. Wajcman (Henri); R.C. Hardison (Ross)

    2004-01-01

    textabstractHbVar (http://globin.cse.psu.edu/globin/hbvar/) is a relational database developed by a multi-center academic effort to provide up-to-date and high quality information on the genomic sequence changes leading to hemoglobin variants and all types of thalassemia and

  4. Winnowing sequences from a database search.

    Science.gov (United States)

    Berman, P; Zhang, Z; Wolf, Y I; Koonin, E V; Miller, W

    2000-01-01

    In database searches for sequence similarity, matches to a distinct sequence region (e.g., protein domain) are frequently obscured by numerous matches to another region of the same sequence. In order to cope with this problem, algorithms are developed to discard redundant matches. One model for this problem begins with a list of intervals, each with an associated score; each interval gives the range of positions in the query sequence that align to a database sequence, and the score is that of the alignment. If interval I is contained in interval J, and I's score is less than J's, then I is said to be dominated by J. The problem is then to identify each interval that is dominated by at least K other intervals, where K is a given level of "tolerable redundancy." An algorithm is developed to solve the problem in O(N log N) time and O(N*) space, where N is the number of intervals and N* is a precisely defined value that never exceeds N and is frequently much smaller. This criterion for discarding database hits has been implemented in the Blast program, as illustrated herein with examples. Several variations and extensions of this approach are also described.

  5. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  6. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  7. Genomic Sequence Variation Markup Language (GSVML).

    Science.gov (United States)

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. GSVML was developed as

  8. The International Nucleotide Sequence Database Collaboration.

    Science.gov (United States)

    Cochrane, Guy; Karsch-Mizrachi, Ilene; Nakamura, Yasukazu

    2011-01-01

    Under the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), globally comprehensive public domain nucleotide sequence is captured, preserved and presented. The partners of this long-standing collaboration work closely together to provide data formats and conventions that enable consistent data submission to their databases and support regular data exchange around the globe. Clearly defined policy and governance in relation to free access to data and relationships with journal publishers have positioned INSDC databases as a key provider of the scientific record and a core foundation for the global bioinformatics data infrastructure. While growth in sequence data volumes comes no longer as a surprise to INSDC partners, the uptake of next-generation sequencing technology by mainstream science that we have witnessed in recent years brings a step-change to growth, necessarily making a clear mark on INSDC strategy. In this article, we introduce the INSDC, outline data growth patterns and comment on the challenges of increased growth.

  9. VariVis: a visualisation toolkit for variation databases

    Directory of Open Access Journals (Sweden)

    Smith Timothy D

    2008-04-01

    Full Text Available Abstract Background With the completion of the Human Genome Project and recent advancements in mutation detection technologies, the volume of data available on genetic variations has risen considerably. These data are stored in online variation databases and provide important clues to the cause of diseases and potential side effects or resistance to drugs. However, the data presentation techniques employed by most of these databases make them difficult to use and understand. Results Here we present a visualisation toolkit that can be employed by online variation databases to generate graphical models of gene sequence with corresponding variations and their consequences. The VariVis software package can run on any web server capable of executing Perl CGI scripts and can interface with numerous Database Management Systems and "flat-file" data files. VariVis produces two easily understandable graphical depictions of any gene sequence and matches these with variant data. While developed with the goal of improving the utility of human variation databases, the VariVis package can be used in any variation database to enhance utilisation of, and access to, critical information.

  10. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.

  11. Understanding human DNA sequence variation.

    Science.gov (United States)

    Kidd, K K; Pakstis, A J; Speed, W C; Kidd, J R

    2004-01-01

    Over the past century researchers have identified normal genetic variation and studied that variation in diverse human populations to determine the amounts and distributions of that variation. That information is being used to develop an understanding of the demographic histories of the different populations and the species as a whole, among other studies. With the advent of DNA-based markers in the last quarter century, these studies have accelerated. One of the challenges for the next century is to understand that variation. One component of that understanding will be population genetics. We present here examples of many of the ways these new data can be analyzed from a population perspective using results from our laboratory on multiple individual DNA-based polymorphisms, many clustered in haplotypes, studied in multiple populations representing all major geographic regions of the world. These data support an "out of Africa" hypothesis for human dispersal around the world and begin to refine the understanding of population structures and genetic relationships. We are also developing baseline information against which we can compare findings at different loci to aid in the identification of loci subject, now and in the past, to selection (directional or balancing). We do not yet have a comprehensive understanding of the extensive variation in the human genome, but some of that understanding is coming from population genetics.

  12. MIPS: a database for protein sequences and complete genomes.

    Science.gov (United States)

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  13. Study of event sequence database for a nuclear power domain

    International Nuclear Information System (INIS)

    Kusumi, Yoshiaki

    1998-01-01

    A retrieval engine developed to extract event sequences from an accident information database using a time series retrieval formula expressed with ordered retrieval terms is explored. This engine outputs not only a sequence which completely matches with a time series retrieval formula, but also sequence which approximately matches the formula (fuzzy retrieval). An event sequence database in which records consist of three ordered parameters, namely the causal event, the process and result. Then the database is used to assess the feasibility of this engine and favorable results were obtained. (author)

  14. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

    Science.gov (United States)

    Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

    2011-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are the tandem repeats of nucleotide motifs of the sizes 1-6 bp and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin phase variations and antigenic variations seen in some bacteria. Although SSR-mediated phase variation and antigenic variations have been well-studied in some bacteria there seems a lot of other species of prokaryotes yet to be investigated for SSR mediated adaptive and other evolutionary advantages. As a part of our on-going studies on SSR polymorphism in prokaryotes we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes and extracted a number of SSRs showing length variations and created a relational database called PSSRdb. This database gives useful information such as location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.

  15. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  16. Polymorphism Sequence - JSNP | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us JSNP Polymorphism Sequence Data detail Data name Polymorphism Sequence DOI 10.18908/lsdba.nb...dc00114-001 Description of data contents Information on polymorphisms (SNPs and insertions/deletions) and th...se Name database name JSNP_SNP: single nucleotide polymorphism JSNP_InsDel_IND: insertion/deletion JSNP_InsD...ved allele observed 3' Flanking Sequence 3' flanking sequence Offset in Flanking Sequence position of the polymorphism...uence Accession No. accession No. of the sequence for polymorphism screening Offset in Record position of the polymorphism

  17. MSDB: A Comprehensive Database of Simple Sequence Repeats.

    Science.gov (United States)

    Avvaru, Akshay Kumar; Saxena, Saketh; Sowpati, Divya Tej; Mishra, Rakesh Kumar

    2017-06-01

    Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  19. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Full Text Available Abstract Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore http://bioinfo.hku.hk/genochore.html, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis associated to related organisms for comparison.

  20. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Directory of Open Access Journals (Sweden)

    Qingyu Chen

    Full Text Available First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases.We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  1. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Science.gov (United States)

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  2. FINDbase: A worldwide database for genetic variation allele frequencies updated

    NARCIS (Netherlands)

    M. Georgitsi (Marianthi); E. Viennas (Emmanouil); D.I. Antoniou (Dimitris I.); V. Gkantouna (Vassiliki); S. van Baal (Sjozef); E.F. Petricoin (Emanuel F.); K. Poulas (Konstantinos); G. Tzimas (Giannis); G.P. Patrinos (George)

    2011-01-01

    textabstractFrequency of INherited Disorders database (FIND base; http://www.findbase. org) records frequencies of causative genetic variations worldwide. Database records include the population and ethnic group or geographical region, the disorder name and the related gene, accompanied by links to

  3. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Univ. of California, Berkeley, CA (United States)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS`s do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the ``Extensible Object Model``, to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  4. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der (California Univ., San Francisco, CA (United States) Lawrence Berkeley Lab., CA (United States))

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS's do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the Extensible Object Model'', to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  5. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  6. BIOPEP database and other programs for processing bioactive peptide sequences.

    Science.gov (United States)

    Minkiewicz, Piotr; Dziuba, Jerzy; Iwaniak, Anna; Dziuba, Marta; Darewicz, Małgorzata

    2008-01-01

    This review presents the potential for application of computational tools in peptide science based on a sample BIOPEP database and program as well as other programs and databases available via the World Wide Web. The BIOPEP application contains a database of biologically active peptide sequences and a program enabling construction of profiles of the potential biological activity of protein fragments, calculation of quantitative descriptors as measures of the value of proteins as potential precursors of bioactive peptides, and prediction of bonds susceptible to hydrolysis by endopeptidases in a protein chain. Other bioactive and allergenic peptide sequence databases are also presented. Programs enabling the construction of binary and multiple alignments between peptide sequences, the construction of sequence motifs attributed to a given type of bioactivity, searching for potential precursors of bioactive peptides, and the prediction of sites susceptible to proteolytic cleavage in protein chains are available via the Internet as are other approaches concerning secondary structure prediction and calculation of physicochemical features based on amino acid sequence. Programs for prediction of allergenic and toxic properties have also been developed. This review explores the possibilities of cooperation between various programs.

  7. Human Variome Project Quality Assessment Criteria for Variation Databases.

    Science.gov (United States)

    Vihinen, Mauno; Hancock, John M; Maglott, Donna R; Landrum, Melissa J; Schaafsma, Gerard C P; Taschner, Peter

    2016-06-01

    Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or groups of genes for a certain disease(s). These databases are widely considered as the most reliable information source for a particular gene/protein/disease, but it should also be made clear they may have widely varying contents, infrastructure, and quality. Quality is very important to evaluate because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. The basic principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The HVP quality evaluation criteria that resulted are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the developed quality criteria and how implementation of the quality scheme can be achieved. Examples are provided for the current status of the quality items in two different databases, BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance. © 2016 WILEY PERIODICALS, INC.

  8. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima\\'s D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.

  9. The development and application of a Mycoplasma gallisepticum sequence database.

    Science.gov (United States)

    Armour, Natalie K; Laibinis, Victoria A; Collett, Stephen R; Ferguson-Noel, Naola

    2013-01-01

    Molecular analysis was conducted on 36 Mycoplasma gallisepticum DNA extracts from tracheal swab samples of commercial poultry in seven South African provinces between 2009 and 2012. Twelve unique M. gallisepticum genotypes were identified by polymerase chain reaction and sequence analysis of the 16S-23S rRNA intergenic spacer region (IGSR), M. gallisepticum cytadhesin 2 (mgc2), MGA_0319 and gapA genetic regions. The DNA sequences of these genotypes were distinct from those of M. gallisepticum isolates in a database composed of sequences from other countries, vaccine and reference strains. The most prevalent genotype (SA-WT#7) was detected in samples from commercial broilers, broiler breeders and layers in five provinces. South African M. gallisepticum sequences were more similar to those of the live vaccines commercially available in South Africa, but were distinct from that of F strain vaccine, which is not registered for use in South Africa. The IGSR, mgc2 or MGA_0319 sequences of three South African genotypes were identical to those of the ts-11 vaccine strain, necessitating a combination of mgc2 and IGSR targeted sequencing to differentiate South African wild-type genotypes from ts-11 vaccine. To identify and differentiate all 12 wild-types, mgc2, IGSR and MGA_0319 sequencing was required. Sequencing of gapA was least effective at strain differentiation. This research serves as a model for the development of an M. gallisepticum sequence database, and illustrates its application to characterize M. gallisepticum genotypes, select diagnostic tests and better understand the epidemiology of M. gallisepticum.

  10. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of its sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  11. Comprehensive Genetic Database of Expressed Sequence Tags for Coccolithophorids

    Science.gov (United States)

    Ranji, Mohammad; Hadaegh, Ahmad R.

    Coccolithophorids are unicellular, marine, golden-brown, single-celled algae (Haptophyta) commonly found in near-surface waters in patchy distributions. They belong to the Phytoplankton family that is known to be responsible for much of the earth reproduction. Phytoplankton, just like plants live based on the energy obtained by Photosynthesis which produces oxygen. Substantial amount of oxygen in the earth's atmosphere is produced by Phytoplankton through Photosynthesis. The single-celled Emiliana Huxleyi is the most commonly known specie of Coccolithophorids and is known for extracting bicarbonate (HCO3) from its environment and producing calcium carbonate to form Coccoliths. Coccolithophorids are one of the world's primary producers, contributing about 15% of the average oceanic phytoplankton biomass to the oceans. They produce elaborate, minute calcite platelets (Coccoliths), covering the cell to form a Coccosphere and supplying up to 60% of the bulk pelagic calcite deposited on the sea floors. In order to understand the genetics of Coccolithophorid and the complexities of their biochemical reactions, we decided to build a database to store a complete profile of these organisms' genomes. Although a variety of such databases currently exist, (http://www.geneservice.co.uk/home/) none have yet been developed to comprehensively address the sequencing efforts underway by the Coccolithophorid research community. This database is called CocooExpress and is available to public (http://bioinfo.csusm.edu) for both data queries and sequence contribution.

  12. A database and API for variation, dense genotyping and resequencing data

    Directory of Open Access Journals (Sweden)

    Flicek Paul

    2010-05-01

    Full Text Available Abstract Background Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data necessitating the redesign of existing genome-wide bioinformatics resources. Results Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system is easily expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments. Conclusions Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.

  13. In silico detection of sequence variations modifying transcriptional regulation.

    Directory of Open Access Journals (Sweden)

    Malin C Andersen

    2008-01-01

    Full Text Available Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers. The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.

  14. In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

    Science.gov (United States)

    Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

    2008-01-01

    Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319

  15. Variational multi-valued velocity field estimation for transparent sequences

    DEFF Research Database (Denmark)

    Ramírez-Manzanares, Alonso; Rivera, Mariano; Kornprobst, Pierre

    2011-01-01

    Motion estimation in sequences with transparencies is an important problem in robotics and medical imaging applications. In this work we propose a variational approach for estimating multi-valued velocity fields in transparent sequences. Starting from existing local motion estimators, we derive...... a variational model for integrating in space and time such a local information in order to obtain a robust estimation of the multi-valued velocity field. With this approach, we can indeed estimate multi-valued velocity fields which are not necessarily piecewise constant on a layer –each layer can evolve...

  16. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    Science.gov (United States)

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.

  17. cDNA sequence quality data - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Budding yeast cDNA sequencing project cDNA sequence quality data Data detail Data name cDNA sequence quality... data DOI 10.18908/lsdba.nbdc00838-003 Description of data contents Phred's quality score. P...tion Download License Update History of This Database Site Policy | Contact Us cDNA sequence quality

  18. MIPS: a database for protein sequences, homology data and yeast genome information.

    Science.gov (United States)

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  19. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications...

  20. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    Science.gov (United States)

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-04

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Mitochondrial D-loop sequence variation among Italian horse breeds

    Directory of Open Access Journals (Sweden)

    Zanotti Marta

    2004-11-01

    Full Text Available Abstract The genetic variability of the mitochondrial D-loop DNA sequence in seven horse breeds bred in Italy (Giara, Haflinger, Italian trotter, Lipizzan, Maremmano, Thoroughbred and Sarcidano was analysed. Five unrelated horses were chosen in each breed and twenty-two haplotypes were identified. The sequences obtained were aligned and compared with a reference sequence and with 27 mtDNA D-loop sequences selected in the GenBank database, representing Spanish, Portuguese, North African, wild horses and an Equus asinus sequence as the outgroup. Kimura two-parameter distances were calculated and a cluster analysis using the Neighbour-joining method was performed to obtain phylogenetic trees among breeds bred in Italy and among Italian and foreign breeds. The cluster analysis indicates that all the breeds but Giara are divided in the two trees, and no clear relationships were revealed between Italian populations and the other breeds. These results could be interpreted as showing the mixed origin of breeds bred in Italy and probably indicate the presence of many ancient maternal lineages with high diversity in mtDNA sequences.

  2. Sequence variations in the FAD2 gene in seeded pumpkins.

    Science.gov (United States)

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-12-21

    Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (totally 1152 bp; a single gene without introns). More than 90% nucleotide identities were detected among the 27 FAD2 clones. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, the 27 FAD2 selected clones all encoded the FAD2 enzyme (delta-12 desaturase) with amino acid sequence identities from 91.7 to 100% for 384 amino acids. The same main-function domain between 47 and 329 amino acids was identified. The four species clustered separately based on differences in the sequences that were identified using the unweighted pair group method with arithmetic mean. Geographic origin and species were found to be closely related to sequence variation in FAD2.

  3. AgdbNet – antigen sequence database software for bacterial typing

    Directory of Open Access Journals (Sweden)

    Maiden Martin CJ

    2006-06-01

    Full Text Available Abstract Background Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature of the variants identified. Due to the differences in typing schemes, imposed by the diversity of genes targeted, creating these databases has typically required the writing of one-off code to link the database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences. Results Databases are described by XML files that are parsed by a Perl CGI script. Each database can have any number of loci, which may be defined by nucleotide and/or peptide sequences. The software is currently in use on at least five public databases for the typing of Neisseria meningitidis, Campylobacter jejuni and Streptococcus equi and can be set up to query internal isolate tables or suitably-configured external isolate databases, such as those used for multilocus sequence typing. The style of the resulting website can be fully configured by modifying stylesheets and through the use of customised header and footer files that surround the output of the script. Conclusion The software provides a rapid means of setting up customised Internet antigen sequence databases. The flexible configuration options enable typing schemes with differing requirements to be accommodated.

  4. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Directory of Open Access Journals (Sweden)

    Suldbold Jargalmaa

    2017-07-01

    Full Text Available Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II. These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species.

  5. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Science.gov (United States)

    Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

    2017-01-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785

  6. License - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Budding yeast cDNA sequencing project License to Use This Database Last updated : 2010/02/15 You may use this databas...ional License described below. The Standard License specifies the license terms regarding the use of this database... and the requirements you must follow in using this database. The Additiona...n the Standard License. Standard License The Standard License for this database is the license specified in ...the Creative Commons Attribution-Share Alike 2.1 Japan . If you use data from this database

  7. The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences

    Directory of Open Access Journals (Sweden)

    Gibbs Mark J

    2008-02-01

    Full Text Available Abstract Background Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.

  8. The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences.

    Science.gov (United States)

    Fourment, Mathieu; Gibbs, Mark J

    2008-02-05

    Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.

  9. PseudoMLSA: a database for multigenic sequence analysis of Pseudomonas species

    Directory of Open Access Journals (Sweden)

    Lalucat Jorge

    2010-04-01

    Full Text Available Abstract Background The genus Pseudomonas comprises more than 100 species of environmental, clinical, agricultural, and biotechnological interest. Although, the recommended method for discriminating bacterial species is DNA-DNA hybridisation, alternative techniques based on multigenic sequence analysis are becoming a common practice in bacterial species discrimination studies. Since there is not a general criterion for determining which genes are more useful for species resolution; the number of strains and genes analysed is increasing continuously. As a result, sequences of different genes are dispersed throughout several databases. This sequence information needs to be collected in a common database, in order to be useful for future identification-based projects. Description The PseudoMLSA Database is a comprehensive database of multiple gene sequences from strains of Pseudomonas species. The core of the database is composed of selected gene sequences from all Pseudomonas type strains validly assigned to the genus through 2008. The database is aimed to be useful for MultiLocus Sequence Analysis (MLSA procedures, for the identification and characterisation of any Pseudomonas bacterial isolate. The sequences are available for download via a direct connection to the National Center for Biotechnology Information (NCBI. Additionally, the database includes an online BLAST interface for flexible nucleotide queries and similarity searches with the user's datasets, and provides a user-friendly output for easily parsing, navigating, and analysing BLAST results. Conclusions The PseudoMLSA database amasses strains and sequence information of validly described Pseudomonas species, and allows free querying of the database via a user-friendly, web-based interface available at http://www.uib.es/microbiologiaBD/Welcome.html. The web-based platform enables easy retrieval at strain or gene sequence information level; including references to published peer

  10. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    Energy Technology Data Exchange (ETDEWEB)

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika; Tanaka, Yoshihiro; Teranishi, Kristen S.; Sunagawa, Shinichi; Wong, Mike; Stillman, Jonathon H.

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~;;30K unique sequences (UniSeqs) representing ~;;19K clusters were generated from ~;;98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases.Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in

  11. Genome-wide sequence variations among Mycobacterium avium subspecies paratuberculosis.

    Directory of Open Access Journals (Sweden)

    Chung-Yi eHsu

    2011-12-01

    Full Text Available Mycobacterium avium subspecies paratuberculosis (M. ap, the causative agent of Johne’s disease (JD, infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium strains isolated from various hosts and environments. Using Next-generation sequencing technology, all 6 M. ap isolates showed a high percentage of homology (98% to the reference genome sequence of M. ap K-10 isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77 showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolates, genomic rearrangements (insertions/deletions, Indels were not detected, and only unique single nucleotide polymorphisms (SNPs were observed among the 6 M. ap strains. While most of the SNPs (~100 in M. ap genomes were non-synonymous, a total of ~ 6000 SNPs were detected among M. avium genomes, most of them were synonymous suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNPs-based phylo-genomic analysis showed that isolates from goat and Oryx are closely related to the cattle (K-10 strain while the human isolate (M. ap 4B is closely related to the environmental strains, indicating environmental source to human infections. Overall, SNPs were the most common variations among M. ap isolates while SNPs in addition to Indels were prevalent among M. avium isolates. Genomic variations will be useful in designing host-specific markers for the analysis of mycobacterial evolution and for developing novel diagnostics directed against Johne’s disease in animals.

  12. Mapping copy number variation by population-scale genome sequencing

    DEFF Research Database (Denmark)

    Mills, Ryan E.; Walter, Klaudia; Stewart, Chip

    2011-01-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is......, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications...

  13. Organizing, exploring, and analyzing antibody sequence data: the case for relational-database managers.

    Science.gov (United States)

    Owens, John

    2009-01-01

    Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.

  14. Korean Variant Archive (KOVA): a reference database of genetic variations in the Korean population.

    Science.gov (United States)

    Lee, Sangmoon; Seo, Jihae; Park, Jinman; Nam, Jae-Yong; Choi, Ahyoung; Ignatius, Jason S; Bjornson, Robert D; Chae, Jong-Hee; Jang, In-Jin; Lee, Sanghyuk; Park, Woong-Yang; Baek, Daehyun; Choi, Murim

    2017-06-27

    Despite efforts to interrogate human genome variation through large-scale databases, systematic preference toward populations of Caucasian descendants has resulted in unintended reduction of power in studying non-Caucasians. Here we report a compilation of coding variants from 1,055 healthy Korean individuals (KOVA; Korean Variant Archive). The samples were sequenced to a mean depth of 75x, yielding 101 singleton variants per individual. Population genetics analysis demonstrates that the Korean population is a distinct ethnic group comparable to other discrete ethnic groups in Africa and Europe, providing a rationale for such independent genomic datasets. Indeed, KOVA conferred 22.8% increased variant filtering power in addition to Exome Aggregation Consortium (ExAC) when used on Korean exomes. Functional assessment of nonsynonymous variant supported the presence of purifying selection in Koreans. Analysis of copy number variants detected 5.2 deletions and 10.3 amplifications per individual with an increased fraction of novel variants among smaller and rarer copy number variable segments. We also report a list of germline variants that are associated with increased tumor susceptibility. This catalog can function as a critical addition to the pre-existing variant databases in pursuing genetic studies of Korean individuals.

  15. The Moroccan Genetic Disease Database (MGDD): a database for DNA variations related to inherited disorders and disease susceptibility.

    Science.gov (United States)

    Charoute, Hicham; Nahili, Halima; Abidi, Omar; Gabi, Khalid; Rouba, Hassan; Fakiri, Malika; Barakat, Abdelhamid

    2014-03-01

    National and ethnic mutation databases provide comprehensive information about genetic variations reported in a population or an ethnic group. In this paper, we present the Moroccan Genetic Disease Database (MGDD), a catalogue of genetic data related to diseases identified in the Moroccan population. We used the PubMed, Web of Science and Google Scholar databases to identify available articles published until April 2013. The Database is designed and implemented on a three-tier model using Mysql relational database and the PHP programming language. To date, the database contains 425 mutations and 208 polymorphisms found in 301 genes and 259 diseases. Most Mendelian diseases in the Moroccan population follow autosomal recessive mode of inheritance (74.17%) and affect endocrine, nutritional and metabolic physiology. The MGDD database provides reference information for researchers, clinicians and health professionals through a user-friendly Web interface. Its content should be useful to improve researches in human molecular genetics, disease diagnoses and design of association studies. MGDD can be publicly accessed at http://mgdd.pasteur.ma.

  16. Variation of clinical expression in patients with Stargardt dystrophy and sequence variations in the ABCR gene.

    Science.gov (United States)

    Fishman, G A; Stone, E M; Grover, S; Derlacki, D J; Haines, H L; Hockey, R R

    1999-04-01

    To report the spectrum of ophthalmic findings in patients with Stargardt dystrophy or fundus flavimaculatus who have a specific sequence variation in the ABCR gene. Twenty-nine patients with Stargardt dystrophy or fundus flavimaculatus from different pedigrees were identified with possible disease-causing sequence variations in the ABCR gene from a group of 66 patients who were screened for sequence variations in this gene. Patients underwent a routine ocular examination, including slitlamp biomicroscopy and a dilated fundus examination. Fluorescein angiography was performed on 22 patients, and electroretinographic measurements were obtained on 24 of 29 patients. Kinetic visual fields were measured with a Goldmann perimeter in 26 patients. Single-strand conformation polymorphism analysis and DNA sequencing were used to identify variations in coding sequences of the ABCR gene. Three clinical phenotypes were observed among these 29 patients. In phenotype I, 9 of 12 patients had a sequence change in exon 42 of the ABCR gene in which the amino acid glutamic acid was substituted for glycine (Gly1961Glu). In only 4 of these 9 patients was a second possible disease-causing mutation found on the other ABCR allele. In addition to an atrophic-appearing macular lesion, phenotype I was characterized by localized perifoveal yellowish white flecks, the absence of a dark choroid, and normal electroretinographic amplitudes. Phenotype II consisted of 10 patients who showed a dark choroid and more diffuse yellowish white flecks in the fundus. None exhibited the Gly1961Glu change. Phenotype III consisted of 7 patients who showed extensive atrophic-appearing changes of the retinal pigment epithelium. Electroretinographic cone and rod amplitudes were reduced. One patient showed the Gly1961Glu change. A wide variation in clinical phenotype can occur in patients with sequence changes in the ABCR gene. In individual patients, a certain phenotype seems to be associated with the presence of

  17. Combining next-generation sequencing and online databases for microsatellite development in non-model organisms.

    Science.gov (United States)

    Rico, Ciro; Normandeau, Eric; Dion-Côté, Anne-Marie; Rico, María Inés; Côté, Guillaume; Bernatchez, Louis

    2013-12-03

    Next-generation sequencing (NGS) is revolutionising marker development and the rapidly increasing amount of transcriptomes published across a wide variety of taxa is providing valuable sequence databases for the identification of genetic markers without the need to generate new sequences. Microsatellites are still the most important source of polymorphic markers in ecology and evolution. Motivated by our long-term interest in the adaptive radiation of a non-model species complex of whitefishes (Coregonus spp.), in this study, we focus on microsatellite characterisation and multiplex optimisation using transcriptome sequences generated by Illumina® and Roche-454, as well as online databases of Expressed Sequence Tags (EST) for the study of whitefish evolution and demographic history. We identified and optimised 40 polymorphic loci in multiplex PCR reactions and validated the robustness of our analyses by testing several population genetics and phylogeographic predictions using 494 fish from five lakes and 2 distinct ecotypes.

  18. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    Science.gov (United States)

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.

  19. muBLASTP: database-indexed protein sequence search on multicore CPUs.

    Science.gov (United States)

    Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

    2016-11-04

    The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Due to different challenges and characteristics between query indexing and database indexing, the existing techniques for query-indexed search cannot be used into database indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers identical hits returned to NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedups for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein database and associated optimizations in BLASTP algorithm, we re-factored BLASTP algorithm for modern multicore processors that achieves much higher throughput with acceptable memory footprint for the database index.

  20. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    Science.gov (United States)

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs.Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.

  1. Intelligent Access to Sequence and Structure Databases (IASSD) - an interface for accessing information from major web databases.

    Science.gov (United States)

    Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal

    2014-01-01

    With the advent of age of big data and advances in high throughput technology accessing data has become one of the most important step in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non specific keywords or a combination of keywords are used. Intelligent access to sequence and structure databases (IASSD) is a desktop application for windows operating system. It is written in Java and utilizes the web service description language (wsdl) files and Jar files of E-utilities of various databases such as National Centre for Biotechnology Information (NCBI) and Protein Data Bank (PDB). Apart from that IASSD allows the user to view protein structure using a JMOL application which supports conditional editing. The Jar file is freely available through e-mail from the corresponding author.

  2. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    Science.gov (United States)

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

  3. EuMicroSatdb: A database for microsatellites in the sequenced genomes of eukaryotes

    Directory of Open Access Journals (Sweden)

    Grover Atul

    2007-07-01

    Full Text Available Abstract Background Microsatellites have immense utility as molecular markers in different fields like genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility for experimental and computational biologists with regard to their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database http://ipu.ac.in/usbt/EuMicroSatdb.htm is a web based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes. Description A user friendly web interface has been developed for microsatellite data retrieval using Active Server Pages (ASP. The backend database codes for data extraction and assembly have been written using Perl based scripts and C++. Precise need based microsatellites data retrieval is possible using different input parameters like microsatellite type (simple perfect or compound perfect, repeat unit length (mono- to hexa-nucleotide, repeat number, microsatellite length and chromosomal location in the genome. Furthermore, information about clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer designing for PCR amplification of any desired microsatellite locus, 200 bp upstream and downstream sequences are provided. Conclusion The database allows easy systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. The information content of the database is useful in different areas of research like gene tagging, genome mapping, population genetics, germplasm characterization and in understanding microsatellite dynamics in eukaryotic genomes.

  4. Estimating the annotation error rate of curated GO database sequence annotations

    Directory of Open Access Journals (Sweden)

    Brown Alfred L

    2007-05-01

    Full Text Available Abstract Background Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO sequence database (GOSeqLite. This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. Results We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006 at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS had an estimated error rate of 49%. Conclusion While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.

  5. Sequence Variation in Rhoptry Neck Protein 10 Gene among Toxoplasma gondii Isolates from Different Hosts and Geographical Locations.

    Science.gov (United States)

    Zhao, Yu; Zhou, Donghui; Chen, Jia; Sun, Xiaolin

    2017-01-01

    Toxoplasma gondii, as a eukaryotic parasite of the phylum Apicomplexa, can infect almost all the warm-blooded animals and humans, causing toxoplasmosis. Rhoptry neck proteins (RONs) play a key role in the invasion process of T. gondii and are potential vaccine candidate molecules against toxoplasmosis. The present study examined sequence variation in the rhoptry neck protein 10 (TgRON10) gene among 10 T. gondii isolates from different hosts and geographical locations from Lanzhou province during 2014, and compared with the corresponding sequences of strains ME49 and VEG obtained from the ToxoDB database, using polymerase chain reaction (PCR) amplification, sequence analysis, and phylogenetic reconstruction by Bayesian inference (BI) and maximum parsimony (MP). Analysis of all the 12 TgRON10 genomic and cDNA sequences revealed 7 exons and 6 introns in the TgRON10 gDNA. The complete genomic sequence of the TgRON10 gene ranged from 4759 bp to 4763 bp, and sequence variation was 0-0.6% among the 12 T. gondii isolates, indicating a low sequence variation in TgRON10 gene. Phylogenetic analysis of TgRON10 sequences showed that the cluster of the 12 T. gondii isolates was not completely consistent with their respective genotypes. TgRON10 gene is not a suitable genetic marker for the differentiation of T. gondii isolates from different hosts and geographical locations, but may represent a potential vaccine candidate against toxoplasmosis, worth further studies.

  6. mESAdb: microRNA expression and sequence analysis database.

    Science.gov (United States)

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

  7. An Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    Science.gov (United States)

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...

  8. PATACSDB—the database of polyA translational attenuators in coding sequences

    Directory of Open Access Journals (Sweden)

    Malgorzata Habich

    2016-02-01

    Full Text Available Recent additions to the repertoire of gene expression regulatory mechanisms are polyadenylate (polyA tracks encoding for poly-lysine runs in protein sequences. Such tracks stall the translation apparatus and induce frameshifting independently of the effects of charged nascent poly-lysine sequence on the ribosome exit channel. As such, they substantially influence the stability of mRNA and the amount of protein produced from a given transcript. Single base changes in these regions are enough to exert a measurable response on both protein and mRNA abundance; this makes each of these sequences a potentially interesting case study for the effects of synonymous mutation, gene dosage balance and natural frameshifting. Here we present PATACSDB, a resource that contain a comprehensive list of polyA tracks from over 250 eukaryotic genomes. Our data is based on the Ensembl genomic database of coding sequences and filtered with algorithm of 12A-1 which selects sequences of polyA tracks with a minimal length of 12 A’s allowing for one mismatched base. The PATACSDB database is accessible at: http://sysbio.ibb.waw.pl/patacsdb. The source code is available at http://github.com/habich/PATACSDB, and it includes the scripts with which the database can be recreated.

  9. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Directory of Open Access Journals (Sweden)

    Marais Gabriel AB

    2011-07-01

    Full Text Available Abstract Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO terms, and thousands of single-nucleotide polymorphisms (SNPs were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49% that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to

  10. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Science.gov (United States)

    2011-01-01

    Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a

  11. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

    Directory of Open Access Journals (Sweden)

    Maskell Douglas L

    2009-05-01

    Full Text Available Abstract Background The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Findings Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K and for query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. Conclusion CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

  12. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    Directory of Open Access Journals (Sweden)

    Arun Prabhu Dhanapal

    2015-01-01

    Full Text Available The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, keeps expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant databases utilization in advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement program is still being achieved widely; otherwise, the end for sequencing is not far away.

  13. Simultaneous Structural Variation Discovery in Multiple Paired-End Sequenced Genomes

    Science.gov (United States)

    Hormozdiari, Fereydoun; Hajirasouliha, Iman; McPherson, Andrew; Eichler, Evan E.; Sahinalp, S. Cenk

    Next generation sequencing technologies have been decreasing the costs and increasing the world-wide capacity for sequence production at an unprecedented rate, making the initiation of large scale projects aiming to sequence almost 2000 genomes [1]. Structural variation detection promises to be one of the key diagnostic tools for cancer and other diseases with genomic origin. In this paper, we study the problem of detecting structural variation events in two or more sequenced genomes through high throughput sequencing . We propose to move from the current model of (1) detecting genomic variations in single next generation sequenced (NGS) donor genomes independently, and (2) checking whether two or more donor genomes indeed agree or disagree on the variations (in this paper we name this framework Independent Structural Variation Discovery and Merging - ISV&M), to a new model in which we detect structural variation events among multiple genomes simultaneously.

  14. PrionHome: a database of prions and other sequences relevant to prion phenomena.

    Directory of Open Access Journals (Sweden)

    Djamel Harbi

    Full Text Available Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as both pathogenic agents, and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions, prionoids (i.e., proteins that propagate like prions between individual cells, and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant, and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments. We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic

  15. PrionHome: a database of prions and other sequences relevant to prion phenomena.

    Science.gov (United States)

    Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M A; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M

    2012-01-01

    Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as both pathogenic agents, and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing

  16. Translational database selection and multiplexed sequence capture for up front filtering of reliable breast cancer biomarker candidates.

    Directory of Open Access Journals (Sweden)

    Patrik L Ståhl

    Full Text Available Biomarker identification is of utmost importance for the development of novel diagnostics and therapeutics. Here we make use of a translational database selection strategy, utilizing data from the Human Protein Atlas (HPA on differentially expressed protein patterns in healthy and breast cancer tissues as a means to filter out potential biomarkers for underlying genetic causatives of the disease. DNA was isolated from ten breast cancer biopsies, and the protein coding and flanking non-coding genomic regions corresponding to the selected proteins were extracted in a multiplexed format from the samples using a single DNA sequence capture array. Deep sequencing revealed an even enrichment of the multiplexed samples and a great variation of genetic alterations in the tumors of the sampled individuals. Benefiting from the upstream filtering method, the final set of biomarker candidates could be completely verified through bidirectional Sanger sequencing, revealing a 40 percent false positive rate despite high read coverage. Of the variants encountered in translated regions, nine novel non-synonymous variations were identified and verified, two of which were present in more than one of the ten tumor samples.

  17. Protein backbone angle restraints from searching a database for chemical shift and sequence homology

    Energy Technology Data Exchange (ETDEWEB)

    Cornilescu, Gabriel; Delaglio, Frank; Bax, Ad [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)

    1999-03-15

    Chemical shifts of backbone atoms in proteins are exquisitely sensitive to local conformation, and homologous proteins show quite similar patterns of secondary chemical shifts. The inverse of this relation is used to search a database for triplets of adjacent residues with secondary chemical shifts and sequence similarity which provide the best match to the query triplet of interest. The database contains 13C{alpha}, 13C{beta}, 13C', 1H{alpha} and 15N chemical shifts for 20 proteins for which a high resolution X-ray structure is available. The computer program TALOS was developed to search this database for strings of residues with chemical shift and residue type homology. The relative importance of the weighting factors attached to the secondary chemical shifts of the five types of resonances relative to that of sequence similarity was optimized empirically. TALOS yields the 10 triplets which have the closest similarity in secondary chemical shift and amino acid sequence to those of the query sequence. If the central residues in these 10 triplets exhibit similar {phi} and {psi} backbone angles, their averages can reliably be used as angular restraints for the protein whose structure is being studied. Tests carried out for proteins of known structure indicate that the root-mean-square difference (rmsd) between the output of TALOS and the X-ray derived backbone angles is about 15 deg. Approximately 3% of the predictions made by TALOS are found to be in error.

  18. GarlicESTdb: an online database and mining tool for garlic EST sequences

    Directory of Open Access Journals (Sweden)

    Choi Sang-Haeng

    2009-05-01

    Full Text Available Abstract Background Allium sativum., commonly known as garlic, is a species in the onion genus (Allium, which is a large and diverse one containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary, medicinal use and health benefits. Currently, the interest in garlic is highly increasing due to nutritional and pharmaceutical value including high blood pressure and cholesterol, atherosclerosis and cancer. For all that, there are no comprehensive databases available for Expressed Sequence Tags(EST of garlic for gene discovery and future efforts of genome annotation. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. Description GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition software technology (JSP/EJB/JavaServlet for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways, the AJAX framework was also used partially. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To archive more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation

  19. GarlicESTdb: an online database and mining tool for garlic EST sequences.

    Science.gov (United States)

    Kim, Dae-Won; Jung, Tae-Sung; Nam, Seong-Hyeuk; Kwon, Hyuk-Ryul; Kim, Aeri; Chae, Sung-Hwa; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Park, Hong-Seog

    2009-05-18

    Allium sativum., commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse one containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary, medicinal use and health benefits. Currently, the interest in garlic is highly increasing due to nutritional and pharmaceutical value including high blood pressure and cholesterol, atherosclerosis and cancer. For all that, there are no comprehensive databases available for Expressed Sequence Tags(EST) of garlic for gene discovery and future efforts of genome annotation. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways, the AJAX framework was also used partially. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To archive more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation information for others to view. The Garlic

  20. Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

    Science.gov (United States)

    Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

    Sequence protein identification by a randomized sequence database and transcriptome mass spectrometry software package has been developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide as well as an unequivocal identification of proteins in a high-throughput manner with enormous advantages of time, economical resource and expertise. The software package is a valid tool for the automation of a de novo sequencing approach, overcoming the main limits and a versatile platform useful in the proteomic field for an unequivocal identification of proteins, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.

  1. The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit.

    Science.gov (United States)

    Hussing, C; Bytyci, R; Huber, C; Morling, N; Børsting, C

    2018-05-24

    Some STR loci have internal sequence variations, which are not revealed by the standard STR typing methods used in forensic genetics (PCR and fragment length analysis by capillary electrophoresis (CE)). Typing of STRs with next-generation sequencing (NGS) uncovers the sequence variation in the repeat region and in the flanking regions. In this study, 363 Danish individuals were typed for 56 STRs (26 autosomal STRs, 24 Y-STRs, and 6 X-STRs) using the ForenSeq™ DNA Signature Prep Kit to establish a Danish STR sequence database. Increased allelic diversity was observed in 34 STRs by the PCR-NGS assay. The largest increases were found in DYS389II and D12S391, where the numbers of sequenced alleles were around four times larger than the numbers of alleles determined by repeat length alone. Thirteen SNPs and one InDel were identified in the flanking regions of 12 STRs. Furthermore, 36 single positions and five longer stretches in the STR flanking regions were found to have dubious genotyping quality. The combined match probability of the 26 autosomal STRs was 10,000 times larger using the PCR-NGS assay than by using PCR-CE. The typical paternity indices for trios and duos were 500 and 100 times larger, respectively, than those obtained with PCR-CE. The assay also amplified 94 SNPs selected for human identification. Eleven of these loci were not in Hardy-Weinberg equilibrium in the Danish population, most likely because the minimum threshold for allele calling (30 reads) in the ForenSeq™ Universal Analysis Software was too low and frequent allele dropouts were not detected.

  2. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    Directory of Open Access Journals (Sweden)

    Dumontier Michel

    2002-10-01

    Full Text Available Abstract Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.

  3. Geochemical variations during the 2012 Emilia seismic sequence

    Science.gov (United States)

    Sciarra, Alessandra; Cantucci, Barbara; Galli, Gianfranco; Cinti, Daniele; Pizzino, Luca

    2015-04-01

    Several geochemical surveys (soil gas and shallow water) were performed in the Modena province (Massa Finalese, Finale Emilia, Medolla and S. Felice sul Panaro), during 2006-2014 period. In May-June 2012, a seismic sequence (main shocks of ML 5.9 and 5.8) was occurred closely to the investigated area. In this area 300 CO2 and CH4 fluxes measurements, 150 soil gas concentrations (He, H2, CO2, CH4 and C2H6), 30 shallow waters and their isotopic analyses (δ13C- CH4, δD- CH4 and δ13C- CO2) were performed in April-May 2006, October and December 2008, repeated in May and September 2012, June 2013 and July 2014 afterwards the 2012 Emilia seismic sequences. Chemical composition of soil gas are dominated by CH4 in the southern part by CO2 in the northern part. Very anomalous fluxes and concentrations are recorded in spot areas; elsewhere CO2 and CH4 values are very low, within the typical range of vegetative and of organic exhalation of the cultivated soil. After the seismic sequence the CH4 and CO2 fluxes are increased of one order of magnitude in the spotty areas, whereas in the surrounding area the values are within the background. On the contrary, CH4 concentration decrease (40%v/v in the 2012 surveys) and CO2 concentration increase until to 12.7%v/v (2013 survey). Isotopic gas analysis were carried out only on samples with anomalous values. Pre-seismic data hint a thermogenic origin of CH4 probably linked to leakage from a deep source in the Medolla area. Conversely, 2012/2013 isotopic data indicate a typical biogenic origin (i.e. microbial hydrocarbon production) of the CH4, as recognized elsewhere in the Po Plain and surroundings. The δ13C-CO2 value suggests a prevalent shallow origin of CO2 (i.e. organic and/or soil-derived) probably related to anaerobic oxidation of heavy hydrocarbons. Water samples, collected from domestic, industrial and hydrocarbons exploration wells, allowed us to recognize different families of waters. Waters are meteoric in origin and

  4. Identification of Alternative Splice Variants Using Unique Tryptic Peptide Sequences for Database Searches.

    Science.gov (United States)

    Tran, Trung T; Bollineni, Ravi C; Strozynski, Margarita; Koehler, Christian J; Thiede, Bernd

    2017-07-07

    Alternative splicing is a mechanism in eukaryotes by which different forms of mRNAs are generated from the same gene. Identification of alternative splice variants requires the identification of peptides specific for alternative splice forms. For this purpose, we generated a human database that contains only unique tryptic peptides specific for alternative splice forms from Swiss-Prot entries. Using this database allows an easy access to splice variant-specific peptide sequences that match to MS data. Furthermore, we combined this database without alternative splice variant-1-specific peptides with human Swiss-Prot. This combined database can be used as a general database for searching of LC-MS data. LC-MS data derived from in-solution digests of two different cell lines (LNCaP, HeLa) and phosphoproteomics studies were analyzed using these two databases. Several nonalternative splice variant-1-specific peptides were found in both cell lines, and some of them seemed to be cell-line-specific. Control and apoptotic phosphoproteomes from Jurkat T cells revealed several nonalternative splice variant-1-specific peptides, and some of them showed clear quantitative differences between the two states.

  5. A Public Database of Memory and Naive B-Cell Receptor Sequences.

    Directory of Open Access Journals (Sweden)

    William S DeWitt

    Full Text Available The vast diversity of B-cell receptors (BCR and secreted antibodies enables the recognition of, and response to, a wide range of epitopes, but this diversity has also limited our understanding of humoral immunity. We present a public database of more than 37 million unique BCR sequences from three healthy adult donors that is many fold deeper than any existing resource, together with a set of online tools designed to facilitate the visualization and analysis of the annotated data. We estimate the clonal diversity of the naive and memory B-cell repertoires of healthy individuals, and provide a set of examples that illustrate the utility of the database, including several views of the basic properties of immunoglobulin heavy chain sequences, such as rearrangement length, subunit usage, and somatic hypermutation positions and dynamics.

  6. High Performance Protein Sequence Database Scanning on the Cell Broadband Engine

    Directory of Open Access Journals (Sweden)

    Adrianto Wirawan

    2009-01-01

    Full Text Available The enormous growth of biological sequence databases has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of low cost parallel multicore accelerator technologies has made it possible to reduce execution times of many bioinformatics applications. In this paper, we demonstrate how the Cell Broadband Engine can be used as a computational platform to accelerate two approaches for protein sequence database scanning: exhaustive and heuristic. We present efficient parallelization techniques for two representative algorithms: the dynamic programming based Smith–Waterman algorithm and the popular BLASTP heuristic. Their implementation on a Playstation®3 leads to significant runtime savings compared to corresponding sequential implementations.

  7. A Public Database of Memory and Naive B-Cell Receptor Sequences.

    Science.gov (United States)

    DeWitt, William S; Lindau, Paul; Snyder, Thomas M; Sherwood, Anna M; Vignali, Marissa; Carlson, Christopher S; Greenberg, Philip D; Duerkopp, Natalie; Emerson, Ryan O; Robins, Harlan S

    2016-01-01

    The vast diversity of B-cell receptors (BCR) and secreted antibodies enables the recognition of, and response to, a wide range of epitopes, but this diversity has also limited our understanding of humoral immunity. We present a public database of more than 37 million unique BCR sequences from three healthy adult donors that is many fold deeper than any existing resource, together with a set of online tools designed to facilitate the visualization and analysis of the annotated data. We estimate the clonal diversity of the naive and memory B-cell repertoires of healthy individuals, and provide a set of examples that illustrate the utility of the database, including several views of the basic properties of immunoglobulin heavy chain sequences, such as rearrangement length, subunit usage, and somatic hypermutation positions and dynamics.

  8. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

    Science.gov (United States)

    Carver, Tim; Berriman, Matthew; Tivey, Adrian; Patel, Chinmay; Böhme, Ulrike; Barrell, Barclay G; Parkhill, Julian; Rajandream, Marie-Adèle

    2008-12-01

    Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences. Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text. Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/

  9. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer supported DNA analysis is still an intensive time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both, HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL has been running in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures, containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL which means, that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.

  10. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database

    Science.gov (United States)

    Carver, Tim; Berriman, Matthew; Tivey, Adrian; Patel, Chinmay; Böhme, Ulrike; Barrell, Barclay G.; Parkhill, Julian; Rajandream, Marie-Adèle

    2008-01-01

    Motivation: Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences. Results: Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text. Availability: Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/ Contact: artemis@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:18845581

  11. The need for high-quality whole-genome sequence databases in microbial forensics.

    Science.gov (United States)

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  12. Databases

    Digital Repository Service at National Institute of Oceanography (India)

    Kunte, P.D.

    Information on bibliographic as well as numeric/textual databases relevant to coastal geomorphology has been included in a tabular form. Databases cover a broad spectrum of related subjects like coastal environment and population aspects, coastline...

  13. The first Malay database toward the ethnic-specific target molecular variation.

    Science.gov (United States)

    Halim-Fikri, Hashim; Etemad, Ali; Abdul Latif, Ahmad Zubaidi; Merican, Amir Feisal; Baig, Atif Amin; Annuar, Azlina Ahmad; Ismail, Endom; Salahshourifar, Iman; Liza-Sharmini, Ahmad Tajudin; Ramli, Marini; Shah, Mohamed Irwan; Johan, Muhammad Farid; Hassan, Nik Norliza Nik; Abdul-Aziz, Noraishah Mydin; Mohd Noor, Noor Haslina; Nur-Shafawati, Ab Rajab; Hassan, Rosline; Bahar, Rosnah; Zain, Rosnah Binti; Yusoff, Shafini Mohamed; Yusoff, Surini; Tan, Soon Guan; Thong, Meow-Keong; Wan-Isa, Hatin; Abdullah, Wan Zaidah; Mohamed, Zahurin; Abdul Latiff, Zarina; Zilfalil, Bin Alwi

    2015-04-30

    The Malaysian Node of the Human Variome Project (MyHVP) is one of the eighteen official Human Variome Project (HVP) country-specific nodes. Since its inception in 9(th) October 2010, MyHVP has attracted the significant number of Malaysian clinicians and researchers to participate and contribute their data to this project. MyHVP also act as the center of coordination for genotypic and phenotypic variation studies of the Malaysian population. A specialized database was developed to store and manage the data based on genetic variations which also associated with health and disease of Malaysian ethnic groups. This ethnic-specific database is called the Malaysian Node of the Human Variome Project database (MyHVPDb). Currently, MyHVPDb provides only information about the genetic variations and mutations found in the Malays. In the near future, it will expand for the other Malaysian ethnics as well. The data sets are specified based on diseases or genetic mutation types which have three main subcategories: Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV) followed by the mutations which code for the common diseases among Malaysians. MyHVPDb has been open to the local researchers, academicians and students through the registration at the portal of MyHVP ( http://hvpmalaysia.kk.usm.my/mhgvc/index.php?id=register ). This database would be useful for clinicians and researchers who are interested in doing a study on genomics population and genetic diseases in order to obtain up-to-date and accurate information regarding the population-specific variations and also useful for those in countries with similar ethnic background.

  14. Sequence Variation in Rhoptry Neck Protein 10 Gene among Toxoplasma gondii Isolates from Different Hosts and Geographical Locations

    Directory of Open Access Journals (Sweden)

    Yu ZHAO

    2017-09-01

    Full Text Available Background: Toxoplasma gondii, as a eukaryotic parasite of the phylum Apicomplexa, can infect almost all the warm-blooded animals and humans, causing toxoplasmosis. Rhoptry neck proteins (RONs play a key role in the invasion process of T. gondii and are potential vaccine candidate molecules against toxoplasmosis.Methods: The present study examined sequence variation in the rhoptry neck protein 10 (TgRON10 gene among 10 T. gondii isolates from different hosts and geographical locations from Lanzhou province during 2014, and compared with the corresponding sequences of strains ME49 and VEG obtained from the ToxoDB database, using polymerase chain reaction (PCR amplification, sequence analysis, and phylogenetic reconstruction by Bayesian inference (BI and maximum parsimony (MP. Results: Analysis of all the 12 TgRON10 genomic and cDNA sequences revealed 7 exons and 6 introns in the TgRON10 gDNA. The complete genomic sequence of the TgRON10 gene ranged from 4759 bp to 4763 bp, and sequence variation was 0-0.6% among the 12 T. gondii isolates, indicating a low sequence variation in TgRON10 gene. Phylogenetic analysis of TgRON10 sequences showed that the cluster of the 12 T. gondii isolates was not completely consistent with their respective genotypes.Conclusion: TgRON10 gene is not a suitable genetic marker for the differentiation of T. gondii isolates from different hosts and geographical locations, but may represent a potential vaccine candidate against toxoplasmosis, worth further studies.

  15. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

    Science.gov (United States)

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2, with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publically available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have

  16. Forward Genetics by Sequencing EMS Variation-Induced Inbred Lines

    Directory of Open Access Journals (Sweden)

    Charles Addo-Quaye

    2017-02-01

    Full Text Available In order to leverage novel sequencing techniques for cloning genes in eukaryotic organisms with complex genomes, the false positive rate of variant discovery must be controlled for by experimental design and informatics. We sequenced five lines from three pedigrees of ethyl methanesulfonate (EMS-mutagenized Sorghum bicolor, including a pedigree segregating a recessive dwarf mutant. Comparing the sequences of the lines, we were able to identify and eliminate error-prone positions. One genomic region contained EMS mutant alleles in dwarfs that were homozygous reference sequences in wild-type siblings and heterozygous in segregating families. This region contained a single nonsynonymous change that cosegregated with dwarfism in a validation population and caused a premature stop codon in the Sorghum ortholog encoding the gibberellic acid (GA biosynthetic enzyme ent-kaurene oxidase. Application of exogenous GA rescued the mutant phenotype. Our method for mapping did not require outcrossing and introduced no segregation variance. This enables work when line crossing is complicated by life history, permitting gene discovery outside of genetic models. This inverts the historical approach of first using recombination to define a locus and then sequencing genes. Our formally identical approach first sequences all the genes and then seeks cosegregation with the trait. Mutagenized lines lacking obvious phenotypic alterations are available for an extension of this approach: mapping with a known marker set in a line that is phenotypically identical to starting material for EMS mutant generation.

  17. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

    Science.gov (United States)

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

    2018-01-04

    Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Protein 3D structure computed from evolutionary sequence variation.

    Directory of Open Access Journals (Sweden)

    Debora S Marks

    Full Text Available The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å C(α-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org. This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of

  19. Genotyping common and rare variation using overlapping pool sequencing

    Directory of Open Access Journals (Sweden)

    Pasaniuc Bogdan

    2011-07-01

    Full Text Available Abstract Background Recent advances in sequencing technologies set the stage for large, population based studies, in which the ANA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, recently a few multiplexing schemes have been suggested, in which a small number of ANA pools are sequenced, and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results In this paper we provide a new algorithm for the deconvolution of DNA pools multiplexing schemes. The presented algorithm utilizes a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible environment that is suitable for different applications. Conclusions Particularly, we demonstrate that both low and high allele frequency SNPs can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences.

  20. Databases

    Directory of Open Access Journals (Sweden)

    Nick Ryan

    2004-01-01

    Full Text Available Databases are deeply embedded in archaeology, underpinning and supporting many aspects of the subject. However, as well as providing a means for storing, retrieving and modifying data, databases themselves must be a result of a detailed analysis and design process. This article looks at this process, and shows how the characteristics of data models affect the process of database design and implementation. The impact of the Internet on the development of databases is examined, and the article concludes with a discussion of a range of issues associated with the recording and management of archaeological data.

  1. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

    Directory of Open Access Journals (Sweden)

    Rognes Torbjørn

    2011-06-01

    Full Text Available Abstract Background The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. Results A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Conclusions Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.

  2. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    Science.gov (United States)

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

    Science.gov (United States)

    Rognes, Torbjørn

    2011-06-01

    The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.

  4. Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.

    Science.gov (United States)

    Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A

    2010-02-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.

  5. Mitochondrial DNA sequence variation in the Anatolian Peninsula ...

    Indian Academy of Sciences (India)

    Unknown

    necting the Middle East, Europe and Central Asia, and, thus, has been subject to major population movements. The ... from different parts of Anatolia by direct sequencing. Analysis of the two ... the country, samples were obtained from individuals com- ing from ..... Arlequin: a software environment for the analysis of popula-.

  6. Variation in Symbiodinium ITS2 sequence assemblages among coral colonies.

    Science.gov (United States)

    Stat, Michael; Bird, Christopher E; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J; Concepcion, Gregory T; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J; Gates, Ruth D

    2011-01-05

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but is important to understanding the intrinsic biological scope of a given coral population and adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies were analyzed in much greater depth with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping.

  7. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  8. Solar Luminosity on the Main Sequence, Standard Model and Variations

    Science.gov (United States)

    Ayukov, S. V.; Baturin, V. A.; Gorshkov, A. B.; Oreshina, A. V.

    2017-05-01

    Our Sun became Main Sequence star 4.6 Gyr ago according Standard Solar Model. At that time solar luminosity was 30% lower than current value. This conclusion is based on assumption that Sun is fueled by thermonuclear reactions. If Earth's albedo and emissivity in infrared are unchanged during Earth history, 2.3 Gyr ago oceans had to be frozen. This contradicts to geological data: there was liquid water 3.6-3.8 Gyr ago on Earth. This problem is known as Faint Young Sun Paradox. We analyze luminosity change in standard solar evolution theory. Increase of mean molecular weight in the central part of the Sun due to conversion of hydrogen to helium leads to gradual increase of luminosity with time on the Main Sequence. We also consider several exotic models: fully mixed Sun; drastic change of pp reaction rate; Sun consisting of hydrogen and helium only. Solar neutrino observations however exclude most non-standard solar models.

  9. Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology

    International Nuclear Information System (INIS)

    Shen Yang; Bax, Ad

    2007-01-01

    Chemical shifts of nuclei in or attached to a protein backbone are exquisitely sensitive to their local environment. A computer program, SPARTA, is described that uses this correlation with local structure to predict protein backbone chemical shifts, given an input three-dimensional structure, by searching a newly generated database for triplets of adjacent residues that provide the best match in φ/ψ/χ 1 torsion angles and sequence similarity to the query triplet of interest. The database contains 15 N, 1 H N , 1 H α , 13 C α , 13 C β and 13 C' chemical shifts for 200 proteins for which a high resolution X-ray (≤2.4 A) structure is available. The relative importance of the weighting factors for the φ/ψ/χ 1 angles and sequence similarity was optimized empirically. The weighted, average secondary shifts of the central residues in the 20 best-matching triplets, after inclusion of nearest neighbor, ring current, and hydrogen bonding effects, are used to predict chemical shifts for the protein of known structure. Validation shows good agreement between the SPARTA-predicted and experimental shifts, with standard deviations of 2.52, 0.51, 0.27, 0.98, 1.07 and 1.08 ppm for 15 N, 1 H N , 1 H α , 13 C α , 13 C β and 13 C', respectively, including outliers

  10. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE).

    Science.gov (United States)

    Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja

    2014-01-01

    Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identifications of novel or common nsSNVs that can be prioritized in validation studies. Database URL: BioMuta: http

  11. Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database

    NARCIS (Netherlands)

    Ritari, Jarmo; Salojärvi, Jarkko; Lahti, Leo; Vos, de Willem M.

    2015-01-01

    Background: Current sequencing technology enables taxonomic profiling of microbial ecosystems at high resolution and depth by using the 16S rRNA gene as a phylogenetic marker. Taxonomic assignation of newly acquired data is based on sequence comparisons with comprehensive reference databases to

  12. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    Francioli, Laurent C.; Menelaou, Andronild; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Settenl, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marscha, Tobias; Schonhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Ma-Tthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; Platteel, Mathieu; Swertz, Morris A.; Wijmenga, Cisca

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring

  13. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    htmlabstractWhole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  14. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing

    NARCIS (Netherlands)

    Aflitos, S.A.; Schijlen, E.G.W.M.; Jong, de J.H.S.G.M.; Ridder, de D.; Smit, S.; Finkers, H.J.; Bakker, F.T.; Geest, van de H.C.; Lintel Hekkert, te B.; Haarst, van J.C.; Smits, L.W.M.; Koops, A.J.; Sanchez-Perez, M.J.; Heusden, van A.W.; Visser, R.G.F.; Schranz, M.E.; Peters, S.A.

    2014-01-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative for the Lycopersicon, Arcanum, Eriopersicon, and Neolycopersicon groups which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new

  15. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing

    NARCIS (Netherlands)

    Aflitos, S.; Schijlen, E.; de Jong, H.; de Ridder, D.; Smit, S.; Finkers, R.; Wang, J.; Zhang, G.; Li, N.; Mao, L.; Bakker, F.; Dirks, R.; Breit, T.; Gravendeel, B.; Huits, H.; Struss, D.; Swanson-Wagner, R.; van Leeuwen, H.; van Ham, R.C.H.J.; Fito, L.; Guignier, L.; Sevilla, M.; Ellul, P.; Ganko, E.; Kapur, A.; Reclus, E.; de Geus, B.; van de Geest, H.; te Lintel Hekkert, B.; van Haarst, J.; Smits, L.; Koops, A.; Sanchez-Perez, G.; van Heusden, A.W.; Visser, R.; Quan, Z.; Min, J.; Liao, L.; Wang, X.; Wang, G.; Yue, Z.; Yang, X.; Xu, N.; Schranz, E.; Smets, E.; Vos, R.; Rauwerda, J.; Ursem, R.; Schuit, C.; Kerns, M.; van den Berg, J.; Vriezen, W.; Janssen, A.; Datema, E.; Jahrman, T.; Moquet, F.; Bonnet, J.; Peters, S.

    2014-01-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new

  16. Identification, variation and transcription of pneumococcal repeat sequences

    Science.gov (United States)

    2011-01-01

    Background Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics. Results Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. In conjunction with information regarding their distribution within the pneumococcal chromosome, this suggests that it is unlikely that these repeats are specialised sequences performing a particular role for the host, but rather that they constitute parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they appear to transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, such BOX elements were demonstrated as being expressed using directional RNA-seq and RT-PCR. Conclusions BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/. PMID:21333003

  17. A two-locus DNA sequence database for typing plant and human pathogens within the Fusarium oxysporum species complex

    DEFF Research Database (Denmark)

    O'Donnell, Kerry; Gueidan, C; Sink, S

    2009-01-01

    We constructed a two-locus database, comprising partial translation elongation factor (EF-1alpha) gene sequences and nearly full-length sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA) for 850 isolates spanning the phylogenetic breadth of the Fusarium oxysporum species compl...... of the IGS rDNA sequences may be non-orthologous. We also evaluated enniatin, fumonisin and moniliformin mycotoxin production in vitro within a phylogenetic framework....

  18. Sequence length variation, indel costs, and congruence in sensitivity analysis

    DEFF Research Database (Denmark)

    Aagesen, Lone; Petersen, Gitte; Seberg, Ole

    2005-01-01

    The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which...... the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously...... preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation...

  19. Spectrum of sequence variations in the FANCA gene: an International Fanconi Anemia Registry (IFAR) study.

    Science.gov (United States)

    Levran, Orna; Diotti, Raffaella; Pujara, Kanan; Batish, Sat D; Hanenberg, Helmut; Auerbach, Arleen D

    2005-02-01

    Fanconi anemia (FA) is an autosomal recessive disorder that is defined by cellular hypersensitivity to DNA cross-linking agents, and is characterized clinically by developmental abnormalities, progressive bone-marrow failure, and predisposition to leukemia and solid tumors. There is extensive genetic heterogeneity, with at least 11 different FA complementation groups. FA-A is the most common group, accounting for approximately 65% of all affected individuals. The mutation spectrum of the FANCA gene, located on chromosome 16q24.3, is highly heterogeneous. Here we summarize all sequence variations (mutations and polymorphisms) in FANCA described in the literature and listed in the Fanconi Anemia Mutation Database as of March 2004, and report 61 novel FANCA mutations identified in FA patients registered in the International Fanconi Anemia Registry (IFAR). Thirty-eight novel SNPs, previously unreported in the literature or in dbSNP, were also identified. We studied the segregation of common FANCA SNPs in FA families to generate haplotypes. We found that FANCA SNP data are highly useful for carrier testing, prenatal diagnosis, and preimplantation genetic diagnosis, particularly when the disease-causing mutations are unknown. Twenty-two large genomic deletions were identified by detection of apparent homozygosity for rare SNPs. In addition, a conserved SNP haplotype block spanning at least 60 kb of the FANCA gene was identified in individuals from various ethnic groups. (c) 2005 Wiley-Liss, Inc.

  20. Next-generation sequencing can reveal in vitro-generated PCR crossover products: some artifactual sequences correspond to HLA alleles in the IMGT/HLA database.

    Science.gov (United States)

    Holcomb, C L; Rastrou, M; Williams, T C; Goodridge, D; Lazaro, A M; Tilanus, M; Erlich, H A

    2014-01-01

    The high-resolution human leukocyte antigen (HLA) genotyping assay that we developed using 454 sequencing and Conexio software uses generic polymerase chain reaction (PCR) primers for DRB exon 2. Occasionally, we observed low abundance DRB amplicon sequences that resulted from in vitro PCR 'crossing over' between DRB1 and DRB3/4/5. These hybrid sequences, revealed by the clonal sequencing property of the 454 system, were generally observed at a read depth of 5%-10% of the true alleles. They usually contained at least one mismatch with the IMGT/HLA database, and consequently, were easily recognizable and did not cause a problem for HLA genotyping. Sometimes, however, these artifactual sequences matched a rare allele and the automatic genotype assignment was incorrect. These observations raised two issues: (1) could PCR conditions be modified to reduce such artifacts? and (2) could some of the rare alleles listed in the IMGT/HLA database be artifacts rather than true alleles? Because PCR crossing over occurs during late cycles of PCR, we compared DRB genotypes resulting from 28 and (our standard) 35 cycles of PCR. For all 21 cell line DNAs amplified for 35 cycles, crossover products were detected. In 33% of the cases, these hybrid sequences corresponded to named alleles. With amplification for only 28 cycles, these artifactual sequences were not detectable. To investigate whether some rare alleles in the IMGT/HLA database might be due to PCR artifacts, we analyzed four samples obtained from the investigators who submitted the sequences. In three cases, the sequences were generated from true alleles. In one case, our 454 sequencing revealed an error in the previously submitted sequence. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  1. Haplotypes and Sequence Variation in the Ovine Adiponectin Gene (ADIPOQ

    Directory of Open Access Journals (Sweden)

    Qing-Ming An

    2015-11-01

    Full Text Available The adiponectin gene (ADIPOQ plays an important role in energy homeostasis. In this study five separate regions (regions 1 to 5 of ovine ADIPOQ were analysed using PCR-SSCP. Four different PCR-SSCP patterns (A1-D1, A2-D2 were detected in region-1 and region-2, respectively, with seven and six SNPs being revealed. In region-3, three different patterns (A3-C3 and three SNPs were observed. Two patterns (A4-B4, A5-B5 and two and one SNPs were observed in region-4 and region-5, respectively. In total, nineteen SNPs were detected, with five of them in the coding region and two (c.46T/C and c.515G/A putatively resulting in amino acid changes (p.Tyr16His and p.Lys172Arg. In region-1, -2 and -3 of 316 sheep from eight New Zealand breeds, variants A1, A2 and A3 were the most common, although variant frequencies differed in the eight breeds. Across region-1 and region-3, nine haplotypes were identified and haplotypes A1-A3, A1-C3, B1-A3 and B1-C3 were most common. These results indicate that the ADIPOQ gene is polymorphic and suggest that further analysis is required to see if the variation in the gene is associated with animal production traits.

  2. Reticulamoeba Is a Long-Branched Granofilosean (Cercozoa) That Is Missing from Sequence Databases

    Science.gov (United States)

    Bass, David; Yabuki, Akinori; Santini, Sébastien; Romac, Sarah; Berney, Cédric

    2012-01-01

    We sequenced the 18S ribosomal RNA gene of seven isolates of the enigmatic marine amoeboflagellate Reticulamoeba Grell, which resolved into four genetically distinct Reticulamoeba lineages, two of which correspond to R. gemmipara Grell and R. minor Grell, another with a relatively large cell body forming lacunae, and another that has similarities to both R. minor and R. gemmipara but with a greater propensity to form cell clusters. These lineages together form a long-branched clade that branches within the cercozoan class Granofilosea (phylum Cercozoa), showing phylogenetic affinities with the genus Mesofila. The basic morphology of Reticulamoeba is a roundish or ovoid cell with a more or less irregular outline. Long and branched reticulopodia radiate from the cell. The reticulopodia bear granules that are bidirectionally motile. There is also a biflagellate dispersal stage. Reticulamoeba is frequently observed in coastal marine environmental samples. PCR primers specific to the Reticulamoeba clade confirm that it is a frequent member of benthic marine microbial communities, and is also found in brackish water sediments and freshwater biofilm. However, so far it has not been found in large molecular datasets such as the nucleotide database in NCBI GenBank, metagenomic datasets in Camera, and the marine microbial eukaryote sampling and sequencing consortium BioMarKs, although closely related lineages can be found in some of these datasets using a highly targeted approach. Therefore, although such datasets are very powerful tools in microbial ecology, they may, for several methodological reasons, fail to detect ecologically and evolutionary key lineages. PMID:23226495

  3. Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence database.

    Science.gov (United States)

    Billoud, B; Kontic, M; Viari, A

    1996-01-01

    At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made in writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usually dedicated to particular structures and are written using general purpose programming languages through a complex and time consuming process where the biological problem of defining the structure and the computer engineering problem of looking for it are intimately intertwined. In this paper, we describe a general representation of structures, suitable for database scanning, together with a programming language, Palingol, designed to manipulate it. Palingol has specific data types, corresponding to structural elements-basically helices-that can be arranged in any way to form a complex structure. As a consequence of the declarative approach used in Palingol, the user should only focus on 'what to search for' while the language engine takes care of 'how to look for it'. Therefore, it becomes simpler to write a scanning program and the structural constraints that define the required structure are more clearly identified. PMID:8628670

  4. Determining Clostridium difficile intra-taxa diversity by mining multilocus sequence typing databases.

    Science.gov (United States)

    Muñoz, Marina; Ríos-Chaparro, Dora Inés; Patarroyo, Manuel Alfonso; Ramírez, Juan David

    2017-03-14

    Multilocus sequence typing (MLST) is a highly discriminatory typing strategy; it is reproducible and scalable. There is a MLST scheme for Clostridium difficile (CD), a gram positive bacillus causing different pathologies of the gastrointestinal tract. This work was aimed at describing the frequency of sequence types (STs) and Clades (C) reported and evalute the intra-taxa diversity in the CD MLST database (CD-MLST-db) using an MLSA approach. Analysis of 1778 available isolates showed that clade 1 (C1) was the most frequent worldwide (57.7%), followed by C2 (29.1%). Regarding sequence types (STs), it was found that ST-1, belonging to C2, was the most frequent. The isolates analysed came from 17 countries, mostly from the United Kingdom (UK) (1541 STs, 87.0%). The diversity of the seven housekeeping genes in the MLST scheme was evaluated, and alleles from the profiles (STs), for identifying CD population structure. It was found that adk and atpA are conserved genes allowing a limited amount of clusters to be discriminated; however, different genes such as drx, glyA and particularly sodA showed high diversity indexes and grouped CD populations in many clusters, suggesting that these genes' contribution to CD typing should be revised. It was identified that CD STs reported to date have a mostly clonal population structure with foreseen events of recombination; however, one group of STs was not assigned to a clade being highly different containing at least nine well-supported clusters, suggesting a greater amount of clades for CD. This study shows the usefulness of CD-MLST-db as a tool for studying CD distribution and population structure, identifying the need for reviewing the usefulness of sodA as housekeeping gene within the MLST scheme and suggesting the existence of a greater amount of CD clades. The study also shows the plausible exchange of genetic material between STs, contributing towards intra-taxa genetic diversity.

  5. The SDH mutation database: an online resource for succinate dehydrogenase sequence variants involved in pheochromocytoma, paraganglioma and mitochondrial complex II deficiency

    Directory of Open Access Journals (Sweden)

    Devilee Peter

    2005-11-01

    Full Text Available Abstract Background The SDHA, SDHB, SDHC and SDHD genes encode the subunits of succinate dehydrogenase (succinate: ubiquinone oxidoreductase, a component of both the Krebs cycle and the mitochondrial respiratory chain. SDHA, a flavoprotein and SDHB, an iron-sulfur protein together constitute the catalytic domain, while SDHC and SDHD encode membrane anchors that allow the complex to participate in the respiratory chain as complex II. Germline mutations of SDHD and SDHB are a major cause of the hereditary forms of the tumors paraganglioma and pheochromocytoma. The largest subunit, SDHA, is mutated in patients with Leigh syndrome and late-onset optic atrophy, but has not as yet been identified as a factor in hereditary cancer. Description The SDH mutation database is based on the recently described Leiden Open (source Variation Database (LOVD system. The variants currently described in the database were extracted from the published literature and in some cases annotated to conform to current mutation nomenclature. Researchers can also directly submit new sequence variants online. Since the identification of SDHD, SDHC, and SDHB as classic tumor suppressor genes in 2000 and 2001, studies from research groups around the world have identified a total of 120 variants. Here we introduce all reported paraganglioma and pheochromocytoma related sequence variations in these genes, in addition to all reported mutations of SDHA. The database is now accessible online. Conclusion The SDH mutation database offers a valuable tool and resource for clinicians involved in the treatment of patients with paraganglioma-pheochromocytoma, clinical geneticists needing an overview of current knowledge, and geneticists and other researchers needing a solid foundation for further exploration of both these tumor syndromes and SDHA-related phenotypes.

  6. Genetic diversity in breonadia salicina based on intra-species sequence variation of chloroplast dna spacer sequence

    International Nuclear Information System (INIS)

    Qurainy, F.A.; Gaafar, A.R.Z.

    2014-01-01

    Assessment and knowledge of the genetic diversity and variation within and between populations of rare and endangered plants is very important for effective conservation. Intergenic spacer sequences variation of psbA-trnH locus of chloroplast genome was assessed within Breonadia salicina (Rubiaceae), a critically endangered and endemic plant species to South western part of Kingdom of Saudi Arabia. The obtained sequence data from 19 individuals in three populations revealed nine haplotypes. The aligned sequences obtained from the overall Saudi accessions extended to 355 bp, revealing nine haplotypes. A high level of haplotype diversity (Hd = 0.842) and low level of nucleotide diversity (Pi = 0.0058) were detected. Consistently, both hierarchical analysis of molecular variance (AMOVA) and constructed neighbor-joining tree indicated null genetic differentiation among populations. This level of differentiation between populations or between regions in psbA-trnH sequences may be due to effects of the abundance of ancestral haplotype sharing and the presence of private haplotypes fixed for each population. Furthermore, the results revealed almost the same level of genetic diversity in comparison with Yemeni accessions, in which Saudi accessions were sharing three haplotypes from the four haplotypes found in Yemeni accessions. (author)

  7. Sequencing by ligation variation with endonuclease V digestion and deoxyinosine-containing query oligonucleotides

    Directory of Open Access Journals (Sweden)

    Ho Antoine

    2011-12-01

    Full Text Available Abstract Background Sequencing-by-ligation (SBL is one of several next-generation sequencing methods that has been developed for massive sequencing of DNA immobilized on arrayed beads (or other clonal amplicons. SBL has the advantage of being easy to implement and accessible to all because it can be performed with off-the-shelf reagents. However, SBL has the limitation of very short read lengths. Results To overcome the read length limitation, research groups have developed complex library preparation processes, which can be time-consuming, difficult, and result in low complexity libraries. Herein we describe a variation on traditional SBL protocols that extends the number of sequential bases that can be sequenced by using Endonuclease V to nick a query primer, thus leaving a ligatable end extended into the unknown sequence for further SBL cycles. To demonstrate the protocol, we constructed a known DNA sequence and utilized our SBL variation, cyclic SBL (cSBL, to resequence this region. Using our method, we were able to read thirteen contiguous bases in the 3' - 5' direction. Conclusions Combining this read length with sequencing in the 5' - 3' direction would allow a read length of over twenty bases on a single tage. Implementing mate-paired tags and this SBL variation could enable > 95% coverage of the genome.

  8. HIVBrainSeqDB: a database of annotated HIV envelope sequences from brain and other anatomical sites

    Directory of Open Access Journals (Sweden)

    O'Connor Niall

    2010-12-01

    Full Text Available Abstract Background The population of HIV replicating within a host consists of independently evolving and interacting sub-populations that can be genetically distinct within anatomical compartments. HIV replicating within the brain causes neurocognitive disorders in up to 20-30% of infected individuals and is a viral sanctuary site for the development of drug resistance. The primary determinant of HIV neurotropism is macrophage tropism, which is primarily determined by the viral envelope (env gene. However, studies of genetic aspects of HIV replicating in the brain are hindered because existing repositories of HIV sequences are not focused on neurotropic virus nor annotated with neurocognitive and neuropathological status. To address this need, we constructed the HIV Brain Sequence Database. Results The HIV Brain Sequence Database is a public database of HIV envelope sequences, directly sequenced from brain and other tissues from the same patients. Sequences are annotated with clinical data including viral load, CD4 count, antiretroviral status, neurocognitive impairment, and neuropathological diagnosis, all curated from the original publication. Tissue source is coded using an anatomical ontology, the Foundational Model of Anatomy, to capture the maximum level of detail available, while maintaining ontological relationships between tissues and their subparts. 44 tissue types are represented within the database, grouped into 4 categories: (i brain, brainstem, and spinal cord; (ii meninges, choroid plexus, and CSF; (iii blood and lymphoid; and (iv other (bone marrow, colon, lung, liver, etc. Patient coding is correlated across studies, allowing sequences from the same patient to be grouped to increase statistical power. Using Cytoscape, we visualized relationships between studies, patients and sequences, illustrating interconnections between studies and the varying depth of sequencing, patient number, and tissue representation across studies

  9. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    OpenAIRE

    Sep?lveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain5, Arnab; Clark, Taane G

    2013-01-01

    BACKGROUND: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poi...

  10. Improvisation Planning and Jam Session Design using concepts of Sequence Variation and Flow Experience

    OpenAIRE

    Dubnov , Shlomo; Assayag , Gérard

    2005-01-01

    cote interne IRCAM: Assayag05a; National audience; We describe a model for improvisation design based on Factor Oracle automation, which is extended to perform learning and analysis of incoming sequences in terms of sequence variation parameters, namely replication, recombination and innovation. These parameters describe the improvisation plan and allow the design of new improvisations or analysis and modification of plans of existing improvisations. We further introduce an idea of flow exper...

  11. Barcoding lichen-forming fungi using 454 pyrosequencing is challenged by artifactual and biological sequence variation.

    Science.gov (United States)

    Mark, Kristiina; Cornejo, Carolina; Keller, Christine; Flück, Daniela; Scheidegger, Christoph

    2016-09-01

    Although lichens (lichen-forming fungi) play an important role in the ecological integrity of many vulnerable landscapes, only a minority of lichen-forming fungi have been barcoded out of the currently accepted ∼18 000 species. Regular Sanger sequencing can be problematic when analyzing lichens since saprophytic, endophytic, and parasitic fungi live intimately admixed, resulting in low-quality sequencing reads. Here, high-throughput, long-read 454 pyrosequencing in a GS FLX+ System was tested to barcode the fungal partner of 100 epiphytic lichen species from Switzerland using fungal-specific primers when amplifying the full internal transcribed spacer region (ITS). The present study shows the potential of DNA barcoding using pyrosequencing, in that the expected lichen fungus was successfully sequenced for all samples except one. Alignment solutions such as BLAST were found to be largely adequate for the generated long reads. In addition, the NCBI nucleotide database-currently the most complete database for lichen-forming fungi-can be used as a reference database when identifying common species, since the majority of analyzed lichens were identified correctly to the species or at least to the genus level. However, several issues were encountered, including a high sequencing error rate, multiple ITS versions in a genome (incomplete concerted evolution), and in some samples the presence of mixed lichen-forming fungi (possible lichen chimeras).

  12. Sequence Variation in Toxoplasma gondii rop17 Gene among Strains from Different Hosts and Geographical Locations

    Directory of Open Access Journals (Sweden)

    Nian-Zhang Zhang

    2014-01-01

    Full Text Available Genetic diversity of T. gondii is a concern of many studies, due to the biological and epidemiological diversity of this parasite. The present study examined sequence variation in rhoptry protein 17 (ROP17 gene among T. gondii isolates from different hosts and geographical regions. The rop17 gene was amplified and sequenced from 10 T. gondii strains, and phylogenetic relationship among these T. gondii strains was reconstructed using maximum parsimony (MP, neighbor-joining (NJ, and maximum likelihood (ML analyses. The partial rop17 gene sequences were 1375 bp in length and A+T contents varied from 49.45% to 50.11% among all examined T. gondii strains. Sequence analysis identified 33 variable nucleotide positions (2.1%, 16 of which were identified as transitions. Phylogeny reconstruction based on rop17 gene data revealed two major clusters which could readily distinguish Type I and Type II strains. Analyses of sequence variations in nucleotides and amino acids among these strains revealed high ratio of nonsynonymous to synonymous polymorphisms (>1, indicating that rop17 shows signs of positive selection. This study demonstrated the existence of slightly high sequence variability in the rop17 gene sequences among T. gondii strains from different hosts and geographical regions, suggesting that rop17 gene may represent a new genetic marker for population genetic studies of T. gondii isolates.

  13. Somatic Genetic Variation in Solid Pseudopapillary Tumor of the Pancreas by Whole Exome Sequencing

    Directory of Open Access Journals (Sweden)

    Meng Guo

    2017-01-01

    Full Text Available Solid pseudopapillary tumor of the pancreas (SPT is a rare pancreatic disease with a unique clinical manifestation. Although CTNNB1 gene mutations had been universally reported, genetic variation profiles of SPT are largely unidentified. We conducted whole exome sequencing in nine SPT patients to probe the SPT-specific insertions and deletions (indels and single nucleotide polymorphisms (SNPs. In total, 54 SNPs and 41 indels of prominent variations were demonstrated through parallel exome sequencing. We detected that CTNNB1 mutations presented throughout all patients studied (100%, and a higher count of SNPs was particularly detected in patients with older age, larger tumor, and metastatic disease. By aggregating 95 detected variation events and viewing the interconnections among each of the genes with variations, CTNNB1 was identified as the core portion in the network, which might collaborate with other events such as variations of USP9X, EP400, HTT, MED12, and PKD1 to regulate tumorigenesis. Pathway analysis showed that the events involved in other cancers had the potential to influence the progression of the SNPs count. Our study revealed an insight into the variation of the gene encoding region underlying solid-pseudopapillary neoplasm tumorigenesis. The detection of these variations might partly reflect the potential molecular mechanism.

  14. Sequence variation in TgROP7 gene among Toxoplasma gondii ...

    African Journals Online (AJOL)

    Yomi

    2012-03-27

    Mar 27, 2012 ... Toxoplasma gondii can infect a wide range of hosts including mammals and birds, causing toxoplasmosis which is one of the most common parasitic zoonoses worldwide. The present study examined sequence variation in rhoptry 7 (ROP7) gene among different T. gondii isolates from different hosts and ...

  15. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background Whole genome sequencing enables a high resolution view ofthe human genome and provides unique insights into genome structureat an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools while validatedalso include a number of

  16. Sequence Variation of MHC Class II DQB Gene in Bottlenose Dolphin (Tursiops truncatus from Taiwanese Waters

    Directory of Open Access Journals (Sweden)

    Wei-Cheng Yang

    2008-03-01

    Full Text Available The major histocompatibility complex (MHC is a large multigene coding for glycoproteins that play a key role in the initiation of immune responses in vertebrates. For a better understanding of the immunologic diversity in thriving marine mammal species, the sequence variation of the exon 2 region of MHC DQB locus was analyzed in 42 bottlenose dolphins (Tursiops truncatus collected from strandings and fishery bycatch in Taiwanese waters. The 172 bp sequences amplified showed no more than two alleles in each individual. The high proportion of non-synonymous nucleotide substitutions and the moderate amount of variation suggest positive selection pressure on this locus, arguing against a reduction in the marine environment selection pressure. The phylogenetic relationship among DQB exon 2 sequences of T. truncatus and other cetaceans did not coincide with taxonomic relationship, indicating a trans-species evolutionary pattern.

  17. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Full Text Available Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.

  18. Mitochondrial DNA D-loop sequence variation among 5 maternal lines of the Zemaitukai horse breed

    Directory of Open Access Journals (Sweden)

    E. Gus Cothran

    2005-12-01

    Full Text Available Genetic variation in Zemaitukai horses was investigated using mitochondrial DNA (mtDNA sequencing. The study was performed on 421 bp of the mitochondrial DNA control region, which is known to be more variable than other sections of the mitochondrial genome. Samples from each of the remaining maternal family lines of Zemaitukai horses and three random samples for other Lithuanian (Lithuanian Heavy Draught, Zemaitukai large type and ten European horse breeds were sequenced. Five distinct haplotypes were obtained for the five Zemaitukai maternal families supporting the pedigree data. The minimal difference between two different sequence haplotypes was 6 and the maximal 11 nucleotides in Zemaitukai horse breed. A total of 20 nucleotide differences compared to the reference sequence were found in Lithuanian horse breeds. Genetic cluster analysis did not shown any clear pattern of relationship among breeds of different type.

  19. FishPathogens.eu/vhsv: a user-friendly viral haemorrhagic septicaemia virus isolate and sequence database

    DEFF Research Database (Denmark)

    Jonstrup, Søren Peter; Gray, Tanya; Kahns, Søren

    2009-01-01

    A database has been created, http://www.Fish Pathogens.eu, with the aim of providing a single repository for collating important information on significant pathogens of aquaculture, relevant to their control and management. This database will be developed, maintained and managed as part of the Eu......A database has been created, http://www.Fish Pathogens.eu, with the aim of providing a single repository for collating important information on significant pathogens of aquaculture, relevant to their control and management. This database will be developed, maintained and managed as part...... of the European Community Reference Laboratory for Fish Diseases function. This concept has been initially developed for viral haemorrhagic septicaemia virus and will be extended in future to include information on other significant aquaculture pathogens. Information included for each isolate comprises sequence...... to obtain data from any selected part of the genome of interest. The output of the sequence search can be readily retrieved as a FASTA file ready to be imported into a sequence alignment tool of choice, facilitating further molecular epidemiological study....

  20. Identification of the sequence variations of 15 autosomal STR loci in a Chinese population.

    Science.gov (United States)

    Chen, Wenjing; Cheng, Jianding; Ou, Xueling; Chen, Yong; Tong, Dayue; Sun, Hongyu

    2014-01-01

    DNA sequence variation including base(s) changes and insertion or deletion in the primer binding region may cause a null allele and, if this changes the length of the amplified fragment out of the allelic ladder, off-ladder (OL) alleles may be detected. In order to provide accurate and reliable DNA evidence for forensic DNA analysis, it is essential to clarify sequence variations in prevalently used STR loci. Suspected null alleles and OL alleles of PlowerPlex16® System from 21,934 unrelated Chinese individuals were verified by alternative systems and sequenced. A total of 17 cases with null alleles were identified, including 12 kinds of point mutations in 16 cases and a 19-base deletion in one case. The total frequency of null alleles was 7.751 × 10(-4). Eight hundred and forty-four OL alleles classified as being of 97 different kinds were observed at 15 STR loci of the PowerPlex®16 system except vWA. All the frequencies of OL alleles were under 0.01. Null alleles should be confirmed by alternative primers and OL alleles should be named appropriately. Particular attention should be paid to sequence variation, since incorrect designation could lead to false conclusions.

  1. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

    Science.gov (United States)

    Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

    2014-04-01

    Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.

  2. SoftSearch: integration of multiple sequence features to identify breakpoints of structural variations.

    Directory of Open Access Journals (Sweden)

    Steven N Hart

    Full Text Available BACKGROUND: Structural variation (SV represents a significant, yet poorly understood contribution to an individual's genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV including split-read, discordant read-pair, and unmated pairs. Co-localized split-reads and discordant read pairs are used to refine the breakpoints. RESULTS: We developed and validated SoftSearch using real and synthetic datasets. SoftSearch's key features are 1 not requiring secondary (or exhaustive primary alignment, 2 portability into established sequencing workflows, and 3 is applicable to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.. SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read-pairs which on their own would not be sufficient to make an SV call. CONCLUSIONS: We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance.

  3. MECP2 variation in Rett syndrome-An overview of current coverage of genetic and phenotype data within existing databases.

    Science.gov (United States)

    Townend, Gillian S; Ehrhart, Friederike; van Kranen, Henk J; Wilkinson, Mark; Jacobsen, Annika; Roos, Marco; Willighagen, Egon L; van Enckevort, David; Evelo, Chris T; Curfs, Leopold M G

    2018-04-27

    Rett syndrome (RTT) is a monogenic rare disorder that causes severe neurological problems. In most cases, it results from a loss-of-function mutation in the gene encoding methyl-CPG-binding protein 2 (MECP2). Currently, about 900 unique MECP2 variations (benign and pathogenic) have been identified and it is suspected that the different mutations contribute to different levels of disease severity. For researchers and clinicians, it is important that genotype-phenotype information is available to identify disease-causing mutations for diagnosis, to aid in clinical management of the disorder, and to provide counseling for parents. In this study, 13 genotype-phenotype databases were surveyed for their general functionality and availability of RTT-specific MECP2 variation data. For each database, we investigated findability and interoperability alongside practical user functionality, and type and amount of genetic and phenotype data. The main conclusions are that, as well as being challenging to find these databases and specific MECP2 variants held within, interoperability is as yet poorly developed and requires effort to search across databases. Nevertheless, we found several thousand online database entries for MECP2 variations and their associated phenotypes, diagnosis, or predicted variant effects, which is a good starting point for researchers and clinicians who want to provide, annotate, and use the data. © 2018 The Authors. Human Mutation published by Wiley Periodicals, Inc.

  4. CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Ussery, David

    2004-01-01

    , these results counts to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present a dynamic web...... and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently...... content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues....

  5. Structural and sequence variants in patients with Silver-Russell syndrome or similar features-Curation of a disease database

    DEFF Research Database (Denmark)

    Tümer, Zeynep; López-Hernández, Julia Angélica; Netchine, Irène

    2018-01-01

    data of these patients. The clinical features are scored according to the Netchine-Harbison clinical scoring system (NH-CSS), which has recently been accepted as standard by consensus. The structural and sequence variations are reviewed and where necessary redescribed according to recent...

  6. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David

    2012-01-01

    Background Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful...... for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps...... more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...

  7. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Directory of Open Access Journals (Sweden)

    Rodrigues NB

    2002-01-01

    Full Text Available In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3% sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compounds. Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8% contained microsatellites (2,335 perfect, 287 imperfect and 79 compounds. The 1,598 BAC end sequences 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compounds. From all of the observed loci 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally we describe two new polymorphic microsatellite loci.

  8. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions

    Science.gov (United States)

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M.; Greenwood, Alex D.; Roca, Alfred L.

    2014-01-01

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. PMID:25462343

  9. A map of human genome variation from population-scale sequencing.

    Science.gov (United States)

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

  10. Minimotif Miner 3.0: database expansion and significantly improved reduction of false-positive predictions from consensus sequences.

    Science.gov (United States)

    Mi, Tian; Merlin, Jerlin Camilus; Deverasetty, Sandeep; Gryk, Michael R; Bill, Travis J; Brooks, Andrew W; Lee, Logan Y; Rathnayake, Viraj; Ross, Christian A; Sargeant, David P; Strong, Christy L; Watts, Paula; Rajasekaran, Sanguthevar; Schiller, Martin R

    2012-01-01

    Minimotif Miner (MnM available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database which has now grown 60-fold to approximately 300,000 minimotifs. Since short minimotifs are by their nature not very complex we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.

  11. Sequence variation of the feline immunodeficiency virus genome and its clinical relevance.

    Science.gov (United States)

    Stickney, A L; Dunowska, M; Cave, N J

    2013-06-08

    The ongoing evolution of feline immunodeficiency virus (FIV) has resulted in the existence of a diverse continuum of viruses. FIV isolates differ with regards to their mutation and replication rates, plasma viral loads, cell tropism and the ability to induce apoptosis. Clinical disease in FIV-infected cats is also inconsistent. Genomic sequence variation of FIV is likely to be responsible for some of the variation in viral behaviour. The specific genetic sequences that influence these key viral properties remain to be determined. With knowledge of the specific key determinants of pathogenicity, there is the potential for veterinarians in the future to apply this information for prognostic purposes. Genomic sequence variation of FIV also presents an obstacle to effective vaccine development. Most challenge studies demonstrate acceptable efficacy of a dual-subtype FIV vaccine (Fel-O-Vax FIV) against FIV infection under experimental settings; however, vaccine efficacy in the field still remains to be proven. It is important that we discover the key determinants of immunity induced by this vaccine; such data would compliment vaccine field efficacy studies and provide the basis to make informed recommendations on its use.

  12. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    Science.gov (United States)

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p neurona: , , , , , , , , , , , , , –, –, –, –, –. Eimeria tenella: –, –, –, –, –, –, –, –, – , –, –, –, –, –, –, –, –, –, –, –. Neospora caninum: –, –, , – , –, –.] PMID:12618375

  13. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    Full Text Available The sequence and structure of the complete mtDNA control region (CR of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of CR sequence varied among M. cephalus populations due to the presence of indels and variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star shaped tree inferred from the CR polymorphism stresses a rapid radiation worldwide, in this species. The CR still appears as a good marker for phylogeographic investigations and additional worldwide samples are warranted to further investigate the genetic structure and evolution in M. cephalus.

  14. SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Directory of Open Access Journals (Sweden)

    Patel Viren

    2010-09-01

    Full Text Available Abstract Background The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results SeqAnt (Sequence Annotator is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed on a web browser, downloaded in a tab-delimited text file, or directly uploaded in a BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data completely ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.

  15. Mitochondrial DNA sequence variation in Finnish patients with matrilineal diabetes mellitus

    Directory of Open Access Journals (Sweden)

    Soini Heidi K

    2012-07-01

    Full Text Available Abstract Background The genetic background of type 2 diabetes is complex involving contribution by both nuclear and mitochondrial genes. There is an excess of maternal inheritance in patients with type 2 diabetes and, furthermore, diabetes is a common symptom in patients with mutations in mitochondrial DNA (mtDNA. Polymorphisms in mtDNA have been reported to act as risk factors in several complex diseases. Findings We examined the nucleotide variation in complete mtDNA sequences of 64 Finnish patients with matrilineal diabetes. We used conformation sensitive gel electrophoresis and sequencing to detect sequence variation. We analysed the pathogenic potential of nonsynonymous variants detected in the sequences and examined the role of the m.16189 T>C variant. Controls consisted of non-diabetic subjects ascertained in the same population. The frequency of mtDNA haplogroup V was 3-fold higher in patients with diabetes. Patients harboured many nonsynonymous mtDNA substitutions that were predicted to be possibly or probably damaging. Furthermore, a novel m.13762 T>G in MTND5 leading to p.Ser476Ala and several rare mtDNA variants were found. Haplogroup H1b harbouring m.16189 T > C and m.3010 G > A was found to be more frequent in patients with diabetes than in controls. Conclusions Mildly deleterious nonsynonymous mtDNA variants and rare population-specific haplotypes constitute genetic risk factors for maternally inherited diabetes.

  16. Bm86 midgut protein sequence variation in South Texas cattle fever ticks

    Directory of Open Access Journals (Sweden)

    Kammlah Diane M

    2010-11-01

    Full Text Available Abstract Background Cattle fever ticks, Rhipicephalus (Boophilus microplus and R. (B. annulatus, vector bovine and equine babesiosis, and have significantly expanded beyond the permanent quarantine zone established in South Texas. Currently, there are no vaccines approved for use within the United States for controlling these vectors. Vaccines developed in Australia and Cuba based on the midgut antigen Bm86 have variable efficacy against cattle fever ticks. A possible explanation for this variation in vaccine efficacy is amino acid sequence divergence between the recombinant Bm86 vaccine component and native Bm86 expressed in ticks from different geographical regions of the world. Results There was 91.8% amino acid sequence identity in Bm86 among R. microplus and R. annulatus sequenced from South Texas infestations. When South Texas isolates were compared to the Australian Yeerongpilly and Cuban Camcord vaccine strains, there was 89.8% and 90.0% identity, respectively. Most of the sequence divergence was focused in one region of the protein, amino acids 206-298. Hydrophilicity profiles revealed that two short regions of Bm86 (amino acids 206-210 and 560-570 appear to be more hydrophilic in South Texas isolates compared to vaccine strains. Only one amino acid difference was found between South Texas and vaccine strains within two previously described B-cell epitopes. A total of 4 amino acid differences were observed within three peptides previously shown to induce protective immune responses in cattle. Conclusions Sequence differences between South Texas isolates and Yeerongpilly and Camcord strains are spread throughout the entire Bm86 sequence, suggesting that geographic variation does exist. Differences within previously described B-cell epitopes between South Texas isolates and vaccine strains are minimal; however, short regions of hydrophilic amino acids found unique to South Texas isolates suggest that additional unique surface exposed

  17. Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

    Science.gov (United States)

    Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

    2017-08-16

    High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.

  18. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    Directory of Open Access Journals (Sweden)

    Aurélien Chateigner

    2015-07-01

    Full Text Available Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%. K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs. Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential.

  19. Rare and common regulatory variation in population-scale sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Stephen B Montgomery

    2011-07-01

    Full Text Available Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

  20. Modeling bias and variation in the stochastic processes of small RNA sequencing.

    Science.gov (United States)

    Argyropoulos, Christos; Etheridge, Alton; Sakhanenko, Nikita; Galas, David

    2017-06-20

    The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in sequence counts. This model implies a linear quadratic relation between the mean and variance of sequence counts. Using a large number of sequencing datasets, we demonstrate how one can use the generalized additive models for location, scale and shape (GAMLSS) distributional regression framework to calculate and apply empirical correction factors for ligase bias. Bias correction could remove more than 40% of the bias for miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition. Using synthetic mixes of known composition, we show that the GAMLSS approach can analyze differential expression with greater accuracy, higher sensitivity and specificity than six existing algorithms (DESeq2, edgeR, EBSeq, limma, DSS, voom) for the analysis of small RNA-seq data. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Gclust Server Cluster based on sequence comparison of homologous proteins of 95 organism spe...cies Data detail Data name Cluster based on sequence comparison of homologous proteins of 95 organism specie...istory of This Database Site Policy | Contact Us Cluster based on sequence compariso

  2. Genetic variation in the Staphylococcus aureus 8325 strain lineage revealed by whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Kristoffer T Bæk

    Full Text Available Staphylococcus aureus strains of the 8325 lineage, especially 8325-4 and derivatives lacking prophage, have been used extensively for decades of research. We report herein the results of our deep sequence analysis of strain 8325-4. Assignment of sequence variants compared with the reference strain 8325 (NRS77/PS47 required correction of errors in the 8325 reference genome, and reassessment of variation previously attributed to chemical mutagenesis of the restriction-defective RN4220. Using an extensive strain pedigree analysis, we discovered that 8325-4 contains 16 single nucleotide polymorphisms (SNP arising prior to the construction of RN4220. We identified 5 indels in 8325-4 compared with 8325. Three indels correspond to expected Φ11, 12, 13 excisions, one indel is explained by a sequence assembly artifact, and the final indel (Δ63bp in the spa-sarS intergenic region is common to only a sub-lineage of 8325-4 strains including SH1000. This deletion was found to significantly decrease (75% steady state sarS but not spa transcript levels in post-exponential phase. The sub-lineage 8325-4 was also found to harbor 4 additional SNPs. We also found large sequence variation between 8325, 8325-4 and RN4220 in a cluster of repetitive hypothetical proteins (SA0282 homologs near the Ess secretion cluster. The overall 8325-4 SNP set results in 17 alterations within coding sequences. Remarkably, we discovered that all tested strains of the 8325-4 lineage lack phenol soluble modulin α3 (PSMα3, a virulence determinant implicated in neutrophil chemotaxis, biofilm architecture and surface spreading. Collectively, our results clarify and define the 8325-4 pedigree and reveal clear evidence that mutations existing throughout all branches of this lineage, including the widely used RN6390 and SH1000 strains, could conceivably impact virulence regulation.

  3. MannDB – A microbial database of automated protein sequence analyses and evidence integration for protein characterization

    Directory of Open Access Journals (Sweden)

    Kuczmarski Thomas A

    2006-10-01

    Full Text Available Abstract Background MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. Description MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. Conclusion MannDB comprises a large number of genomes and comprehensive protein

  4. ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    Science.gov (United States)

    Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

    2002-12-19

    Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.

  5. Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.

    Science.gov (United States)

    Casals, Ferran; Anglada, Roger; Bonet, Núria; Rasal, Raquel; van der Gaag, Kristiaan J; Hoogenboom, Jerry; Solé-Morata, Neus; Comas, David; Calafell, Francesc

    2017-09-01

    We have genotyped the 58 STRs (27 autosomal, 24 Y-STRs and 7 X-STRs) and 94 autosomal SNPs in Illumina ForenSeq™ Primer Mix A in 88 Spanish Roma (Gypsy) samples and 143 Catalans. Since this platform is based in massive parallel sequencing, we have used simple R scripts to uncover the sequence variation in the repeat region. Thus, we have found, across 58 STRs, 541 length-based alleles, which, after considering repeat-sequence variation, became 804 different alleles. All loci in both populations were in Hardy-Weinberg equilibrium. F ST between both populations was 0.0178 for autosomal SNPs, 0.0146 for autosomal STRs, 0.0101 for X-STRs and 0.1866 for Y-STRs. Combined a priori statistics showed quite large; for instance, pooling all the autosomal loci, the a priori probabilities of discriminating a suspect become 1-(2.3×10 -70 ) and 1-(5.9×10 -73 ), for Roma and Catalans respectively, and the chances of excluding a false father in a trio are 1-(2.6×10 -20 ) and 1-(2.0×10 -21 ). Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Inferring Variation in Copy Number Using High Throughput Sequencing Data in R.

    Science.gov (United States)

    Knaus, Brian J; Grünwald, Niklaus J

    2018-01-01

    Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori . Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R package vcfR . This method is based on the relative frequency of each allele (in both genic and non-genic regions) sequenced at heterozygous positions throughout a genome. These heterozygous positions are summarized by using arbitrarily sized windows of heterozygous positions, binning the allele frequencies, and selecting the bin with the greatest abundance of positions. This provides a non-parametric summary of the frequency that alleles were sequenced at. The method is applicable to organisms that have reference genomes that consist of full chromosomes or sub-chromosomal contigs. In contrast to other software designed to detect copy number variation, our method does not rely on an assumption of base ploidy, but instead infers it. We validated these approaches with the model system of Saccharomyces cerevisiae and applied it to the oomycete Phytophthora infestans , both known to vary in copy number. This functionality has been incorporated into the current release of the R package vcfR to provide modular and flexible methods to investigate copy number variation in genomic projects.

  7. TMC-SNPdb: an Indian germline variant database derived from whole exome sequences.

    Science.gov (United States)

    Upadhyay, Pawan; Gardi, Nilesh; Desai, Sanket; Sahoo, Bikram; Singh, Ankita; Togar, Trupti; Iyer, Prajish; Prasad, Ratnam; Chandrani, Pratik; Gupta, Sudeep; Dutt, Amit

    2016-01-01

    Cancer is predominantly a somatic disease. A mutant allele present in a cancer cell genome is considered somatic when it's absent in the paired normal genome along with public SNP databases. The current build of dbSNP, the most comprehensive public SNP database, however inadequately represents several non-European Caucasian populations, posing a limitation in cancer genomic analyses of data from these populations. We present the T: ata M: emorial C: entre-SNP D: ata B: ase (TMC-SNPdb), as the first open source, flexible, upgradable, and freely available SNP database (accessible through dbSNP build 149 and ANNOVAR)-representing 114 309 unique germline variants-generated from whole exome data of 62 normal samples derived from cancer patients of Indian origin. The TMC-SNPdb is presented with a companion subtraction tool that can be executed with command line option or using an easy-to-use graphical user interface with the ability to deplete additional Indian population specific SNPs over and above dbSNP and 1000 Genomes databases. Using an institutional generated whole exome data set of 132 samples of Indian origin, we demonstrate that TMC-SNPdb could deplete 42, 33 and 28% false positive somatic events post dbSNP depletion in Indian origin tongue, gallbladder, and cervical cancer samples, respectively. Beyond cancer somatic analyses, we anticipate utility of the TMC-SNPdb in several Mendelian germline diseases. In addition to dbSNP build 149 and ANNOVAR, the TMC-SNPdb along with the subtraction tool is available for download in the public domain at the following:Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNP/TMCSNPdp.html. © The Author(s) 2016. Published by Oxford University Press.

  8. Polymorphisms and resistance mutations of hepatitis C virus on sequences in the European hepatitis C virus database

    Science.gov (United States)

    Kliemann, Dimas Alexandre; Tovo, Cristiane Valle; da Veiga, Ana Beatriz Gorini; de Mattos, Angelo Alves; Wood, Charles

    2016-01-01

    AIM To evaluate the occurrence of resistant mutations in treatment-naïve hepatitis C virus (HCV) sequences deposited in the European hepatitis C virus database (euHCVdb). METHODS The sequences were downloaded from the euHCVdb (https://euhcvdb.ibcp.fr/euHCVdb/). The search was performed for full-length NS3 protease, NS5A and NS5B polymerase sequences of HCV, separated by genotypes 1a, 1b, 2a, 2b and 3a, and resulted in 798 NS3, 708 NS5A and 535 NS5B sequences from HCV genotypes 1a, 1b, 2a, 2b and 3a, after the exclusion of sequences containing errors and/or gaps or incomplete sequences, and sequences from patients previously treated with direct antiviral agents (DAA). The sequence alignment was performed with MEGA 6.06 MAC and the resulting protein sequences were then analyzed using the BioEdit 7.2.5. for mutations associated with resistance. Only positions that have been described as being associated with failure in treatment in in vivo studies, and/or as conferring a more than 2-fold change in replication in comparison to the wildtype reference strain in in vitro phenotypic assays were included in the analysis. RESULTS The Q80K variant in the NS3 gene was the most prevalent mutation, being found in 44.66% of subtype 1a and 0.25% of subtype 1b. Other frequent mutations observed in more than 2% of the NS3 sequences were: I170V (3.21%) in genotype 1a, and Y56F (15.93%), V132I (23.28%) and I170V (65.20%) in genotype 1b. For the NS5A, 2.21% of the genotype 1a sequences have the P58S mutation, 5.95% of genotype 1b sequences have the R30Q mutation, 15.79% of subtypes 2a sequences have the Q30R mutation, 23.08% of subtype 2b sequences have a L31M mutation, and in subtype 3a sequences, 23.08% have the M31L resistant variants. For the NS5B, the V321L RAV was identified in 0.60% of genotype 1a and in 0.32% of genotype 1b sequences, and the N142T variant was observed in 0.32% of subtype 1b sequences. The C316Y, S556G, D559N RAV were identified in 0.33%, 7.82% and 0.32% of

  9. Polymorphisms and resistance mutations of hepatitis C virus on sequences in the European hepatitis C virus database.

    Science.gov (United States)

    Kliemann, Dimas Alexandre; Tovo, Cristiane Valle; da Veiga, Ana Beatriz Gorini; de Mattos, Angelo Alves; Wood, Charles

    2016-10-28

    To evaluate the occurrence of resistant mutations in treatment-naïve hepatitis C virus (HCV) sequences deposited in the European hepatitis C virus database (euHCVdb). The sequences were downloaded from the euHCVdb (https://euhcvdb.ibcp.fr/euHCVdb/). The search was performed for full-length NS3 protease, NS5A and NS5B polymerase sequences of HCV, separated by genotypes 1a, 1b, 2a, 2b and 3a, and resulted in 798 NS3, 708 NS5A and 535 NS5B sequences from HCV genotypes 1a, 1b, 2a, 2b and 3a, after the exclusion of sequences containing errors and/or gaps or incomplete sequences, and sequences from patients previously treated with direct antiviral agents (DAA). The sequence alignment was performed with MEGA 6.06 MAC and the resulting protein sequences were then analyzed using the BioEdit 7.2.5. for mutations associated with resistance. Only positions that have been described as being associated with failure in treatment in in vivo studies, and/or as conferring a more than 2-fold change in replication in comparison to the wildtype reference strain in in vitro phenotypic assays were included in the analysis. The Q80K variant in the NS3 gene was the most prevalent mutation, being found in 44.66% of subtype 1a and 0.25% of subtype 1b. Other frequent mutations observed in more than 2% of the NS3 sequences were: I170V (3.21%) in genotype 1a, and Y56F (15.93%), V132I (23.28%) and I170V (65.20%) in genotype 1b. For the NS5A, 2.21% of the genotype 1a sequences have the P58S mutation, 5.95% of genotype 1b sequences have the R30Q mutation, 15.79% of subtypes 2a sequences have the Q30R mutation, 23.08% of subtype 2b sequences have a L31M mutation, and in subtype 3a sequences, 23.08% have the M31L resistant variants. For the NS5B, the V321L RAV was identified in 0.60% of genotype 1a and in 0.32% of genotype 1b sequences, and the N142T variant was observed in 0.32% of subtype 1b sequences. The C316Y, S556G, D559N RAV were identified in 0.33%, 7.82% and 0.32% of genotype 1b sequences

  10. Rapid detection of SMARCB1 sequence variation using high resolution melting

    Directory of Open Access Journals (Sweden)

    Ashley David M

    2009-12-01

    Full Text Available Abstract Background Rhabdoid tumors are rare cancers of early childhood arising in the kidney, central nervous system and other organs. The majority are caused by somatic inactivating mutations or deletions affecting the tumor suppressor locus SMARCB1 [OMIM 601607]. Germ-line SMARCB1 inactivation has been reported in association with rhabdoid tumor, epitheloid sarcoma and familial schwannomatosis, underscoring the importance of accurate mutation screening to ascertain recurrence and transmission risks. We describe a rapid and sensitive diagnostic screening method, using high resolution melting (HRM, for detecting sequence variations in SMARCB1. Methods Amplicons, encompassing the nine coding exons of SMARCB1, flanking splice site sequences and the 5' and 3' UTR, were screened by both HRM and direct DNA sequencing to establish the reliability of HRM as a primary mutation screening tool. Reaction conditions were optimized with commercially available HRM mixes. Results The false negative rate for detecting sequence variants by HRM in our sample series was zero. Nine amplicons out of a total of 140 (6.4% showed variant melt profiles that were subsequently shown to be false positive. Overall nine distinct pathogenic SMARCB1 mutations were identified in a total of 19 possible rhabdoid tumors. Two tumors had two distinct mutations and two harbored SMARCB1 deletion. Other mutations were nonsense or frame-shifts. The detection sensitivity of the HRM screening method was influenced by both sequence context and specific nucleotide change and varied from 1: 4 to 1:1000 (variant to wild-type DNA. A novel method involving digital HRM, followed by re-sequencing, was used to confirm mutations in tumor specimens containing associated normal tissue. Conclusions This is the first report describing SMARCB1 mutation screening using HRM. HRM is a rapid, sensitive and inexpensive screening technology that is likely to be widely adopted in diagnostic laboratories to

  11. Rapid detection of SMARCB1 sequence variation using high resolution melting

    International Nuclear Information System (INIS)

    Dagar, Vinod; Chow, Chung-Wo; Ashley, David M; Algar, Elizabeth M

    2009-01-01

    Rhabdoid tumors are rare cancers of early childhood arising in the kidney, central nervous system and other organs. The majority are caused by somatic inactivating mutations or deletions affecting the tumor suppressor locus SMARCB1 [OMIM 601607]. Germ-line SMARCB1 inactivation has been reported in association with rhabdoid tumor, epitheloid sarcoma and familial schwannomatosis, underscoring the importance of accurate mutation screening to ascertain recurrence and transmission risks. We describe a rapid and sensitive diagnostic screening method, using high resolution melting (HRM), for detecting sequence variations in SMARCB1. Amplicons, encompassing the nine coding exons of SMARCB1, flanking splice site sequences and the 5' and 3' UTR, were screened by both HRM and direct DNA sequencing to establish the reliability of HRM as a primary mutation screening tool. Reaction conditions were optimized with commercially available HRM mixes. The false negative rate for detecting sequence variants by HRM in our sample series was zero. Nine amplicons out of a total of 140 (6.4%) showed variant melt profiles that were subsequently shown to be false positive. Overall nine distinct pathogenic SMARCB1 mutations were identified in a total of 19 possible rhabdoid tumors. Two tumors had two distinct mutations and two harbored SMARCB1 deletion. Other mutations were nonsense or frame-shifts. The detection sensitivity of the HRM screening method was influenced by both sequence context and specific nucleotide change and varied from 1: 4 to 1:1000 (variant to wild-type DNA). A novel method involving digital HRM, followed by re-sequencing, was used to confirm mutations in tumor specimens containing associated normal tissue. This is the first report describing SMARCB1 mutation screening using HRM. HRM is a rapid, sensitive and inexpensive screening technology that is likely to be widely adopted in diagnostic laboratories to facilitate whole gene mutation screening

  12. VWF mutations and new sequence variations identified in healthy controls are more frequent in the African-American population.

    Science.gov (United States)

    Bellissimo, Daniel B; Christopherson, Pamela A; Flood, Veronica H; Gill, Joan Cox; Friedman, Kenneth D; Haberichter, Sandra L; Shapiro, Amy D; Abshire, Thomas C; Leissinger, Cindy; Hoots, W Keith; Lusher, Jeanne M; Ragni, Margaret V; Montgomery, Robert R

    2012-03-01

    Diagnosis and classification of VWD is aided by molecular analysis of the VWF gene. Because VWF polymorphisms have not been fully characterized, we performed VWF laboratory testing and gene sequencing of 184 healthy controls with a negative bleeding history. The controls included 66 (35.9%) African Americans (AAs). We identified 21 new sequence variations, 13 (62%) of which occurred exclusively in AAs and 2 (G967D, T2666M) that were found in 10%-15% of the AA samples, suggesting they are polymorphisms. We identified 14 sequence variations reported previously as VWF mutations, the majority of which were type 1 mutations. These controls had VWF Ag levels within the normal range, suggesting that these sequence variations might not always reduce plasma VWF levels. Eleven mutations were found in AAs, and the frequency of M740I, H817Q, and R2185Q was 15%-18%. Ten AA controls had the 2N mutation H817Q; 1 was homozygous. The average factor VIII level in this group was 99 IU/dL, suggesting that this variation may confer little or no clinical symptoms. This study emphasizes the importance of sequencing healthy controls to understand ethnic-specific sequence variations so that asymptomatic sequence variations are not misidentified as mutations in other ethnic or racial groups.

  13. HLA DNA sequence variation among human populations: molecular signatures of demographic and selective events.

    Directory of Open Access Journals (Sweden)

    Stéphane Buhler

    2011-02-01

    Full Text Available Molecular differences between HLA alleles vary up to 57 nucleotides within the peptide binding coding region of human Major Histocompatibility Complex (MHC genes, but it is still unclear whether this variation results from a stochastic process or from selective constraints related to functional differences among HLA molecules. Although HLA alleles are generally treated as equidistant molecular units in population genetic studies, DNA sequence diversity among populations is also crucial to interpret the observed HLA polymorphism. In this study, we used a large dataset of 2,062 DNA sequences defined for the different HLA alleles to analyze nucleotide diversity of seven HLA genes in 23,500 individuals of about 200 populations spread worldwide. We first analyzed the HLA molecular structure and diversity of these populations in relation to geographic variation and we further investigated possible departures from selective neutrality through Tajima's tests and mismatch distributions. All results were compared to those obtained by classical approaches applied to HLA allele frequencies.Our study shows that the global patterns of HLA nucleotide diversity among populations are significantly correlated to geography, although in some specific cases the molecular information reveals unexpected genetic relationships. At all loci except HLA-DPB1, populations have accumulated a high proportion of very divergent alleles, suggesting an advantage of heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model. However, both different intensities of selection and unequal levels of gene conversion may explain the heterogeneous mismatch distributions observed among the loci. Also, distinctive patterns of sequence divergence observed at the HLA-DPB1 locus suggest current neutrality but old selective pressures on this gene. We conclude that HLA DNA sequences advantageously complement HLA allele frequencies as a source of data used

  14. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    Energy Technology Data Exchange (ETDEWEB)

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, base upon statistical inferences, the location of matrix attachment regions (MARS) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.

  15. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999; FINAL

    International Nuclear Information System (INIS)

    Harger, Carol A.

    1999-01-01

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, base upon statistical inferences, the location of matrix attachment regions (MARS) within a nucleotide sequence.[The annual report for June 1996 to August 1997 is included as an attachment to this final report.

  16. Performance of Correspondence Algorithms in Vision-Based Driver Assistance Using an Online Image Sequence Database

    DEFF Research Database (Denmark)

    Klette, Reinhard; Krüger, Norbert; Vaudrey, Tobi

    2011-01-01

    the classification of recorded video data into situations defined by a cooccurrence of some events in recorded traffic scenes. About 100-400 stereo frames (or 4-16 s of recording) are considered a basic sequence, which will be identified with one particular situation. Future testing is expected to be on data...

  17. Amino acid sequences of predicted proteins and their annotation for 95 organism species. - Gclust Server | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Gclust Server Amino acid sequences of predicted proteins and their annotation for 95 organis...m species. Data detail Data name Amino acid sequences of predicted proteins and their annotation for 95 orga...nism species. DOI 10.18908/lsdba.nbdc00464-001 Description of data contents Amino acid sequences of predicted proteins...Database Description Download License Update History of This Database Site Policy | Contact Us Amino acid sequences of predicted prot...eins and their annotation for 95 organism species. - Gclust Server | LSDB Archive ...

  18. Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

    Science.gov (United States)

    Gillet-Markowska, Alexandre; Richard, Hugues; Fischer, Gilles; Lafontaine, Ingrid

    2015-03-15

    The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences, or carry inherent complexity, hardly resolvable with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. They can thus theoretically overcome previous limitations as they are becoming routinely accessible. Nevertheless, broad insert size distributions and high rates of chimerical sequences are usually associated to this type of libraries, which makes the accurate annotation of SV challenging. Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools, both on simulated and real mate-pair sequencing datasets from the 1000 Human Genome project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low frequency variants. SV detection performed on a large insert Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for the characterization of somatic mosaicism in human tissues and in cancer genomes. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  19. Sequence Classification - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ansmembrane helical proteins by applying statistical and machine learning methods to each amino acid sequenc.... Amino Acid Result of predicting β-barrel membrane protein with a statistical method using amino acid compo...sition. ( TMBETADISC-COMP ) Dipeptide Result of predicting β-barrel membrane protein with a statistic...ting β-barrel membrane protein with a statistical method using motifs. ( TMBETADISC-MOTIF ) SVM Result of pr

  20. Null alleles and sequence variations at primer binding sites of STR loci within multiplex typing systems.

    Science.gov (United States)

    Yao, Yining; Yang, Qinrui; Shao, Chengchen; Liu, Baonian; Zhou, Yuxiang; Xu, Hongmei; Zhou, Yueqin; Tang, Qiqun; Xie, Jianhui

    2018-01-01

    Rare variants are widely observed in human genome and sequence variations at primer binding sites might impair the process of PCR amplification resulting in dropouts of alleles, named as null alleles. In this study, 5 cases from routine paternity testing using PowerPlex ® 21 System for STR genotyping were considered to harbor null alleles at TH01, FGA, D5S818, D8S1179, and D16S539, respectively. The dropout of alleles was confirmed by using alternative commercial kits AGCU Expressmarker 22 PCR amplification kit and AmpFℓSTR ® . Identifiler ® Plus Kit, and sequencing results revealed a single base variation at the primer binding site of each STR locus. Results from the collection of previous reports show that null alleles at D5S818 were frequently observed in population detected by two PowerPlex ® typing systems and null alleles at D19S433 were mostly observed in Japanese population detected by two AmpFℓSTR™ typing systems. Furthermore, the most popular mutation type appeared the transition from C to T with G to A, which might have a potential relationship with DNA methylation. Altogether, these results can provide helpful information in forensic practice to the elimination of genotyping discrepancy and the development of primer sets. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences

    OpenAIRE

    Lescot, Magali; Déhais, Patrice; Thijs, Gert; Marchal, Kathleen; Moreau, Yves; Van de Peer, Yves; Rouzé, Pierre; Rombauts, Stephane

    2002-01-01

    PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific t...

  2. Extra-binomial variation approach for analysis of pooled DNA sequencing data

    Science.gov (United States)

    Wallace, Chris

    2012-01-01

    Motivation: The invention of next-generation sequencing technology has made it possible to study the rare variants that are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. This induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to appropriately model this additional variance to avoid inflating the false-positive rate. Results: We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model provides a better fit to the data than either a standard binomial model or a traditional extra-binomial model proposed by Williams and can analyse both rare and common variants with lower or more variable pool depths compared to the other methods. Availability: Package ‘extraBinomial’ is on http://cran.r-project.org/ Contact: chris.wallace@cimr.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:22976083

  3. Human Y chromosome copy number variation in the next generation sequencing era and beyond.

    Science.gov (United States)

    Massaia, Andrea; Xue, Yali

    2017-05-01

    The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.

  4. Amazonian phylogeography: mtDNA sequence variation in arboreal echimyid rodents (Caviomorpha).

    Science.gov (United States)

    da Silva, M N; Patton, J L

    1993-09-01

    Patterns of evolutionary relationships among haplotype clades of sequences of the mitochondrial cytochrome b DNA gene are examined for five genera of arboreal rodents of the Caviomorph family Echimyidae from the Amazon Basin. Data are available for 798 bp of sequence from a total of 24 separate localities in Peru, Venezuela, Bolivia, and Brazil for Mesomys, Isothrix, Makalata, Dactylomys, and Echimys. Sequence divergence, corrected for multiple hits, is extensive, ranging from less than 1% for comparisons within populations of over 20% among geographic units within genera. Both the degree of differentiation and the geographic patterning of the variation suggest that more than one species composes the Amazonian distribution of the currently recognized Mesomys hispidus, Isothrix bistriata, Makalata didelphoides, and Dactylomys dactylinus. There is general concordance in the geographic range of haplotype clades for each of these taxa, and the overall level of differentiation within them is largely equivalent. These observations suggest that a common vicariant history underlies the respective diversification of each genus. However, estimated times of divergence based on the rate of third position transversion substitutions for the major clades within each genus typically range above 1 million years. Thus, allopatric isolation precipitating divergence must have been considerably earlier than the late Pleistocene forest fragmentation events commonly invoked for Amazonian biota.

  5. Identification of Anhydrobiosis-related Genes from an Expressed Sequence Tag Database in the Cryptobiotic Midge Polypedilum vanderplanki (Diptera; Chironomidae)*

    Science.gov (United States)

    Cornette, Richard; Kanamori, Yasushi; Watanabe, Masahiko; Nakahara, Yuichi; Gusev, Oleg; Mitsumasu, Kanako; Kadono-Okuda, Keiko; Shimomura, Michihiko; Mita, Kazuei; Kikawada, Takahiro; Okuda, Takashi

    2010-01-01

    Some organisms are able to survive the loss of almost all their body water content, entering a latent state known as anhydrobiosis. The sleeping chironomid (Polypedilum vanderplanki) lives in the semi-arid regions of Africa, and its larvae can survive desiccation in an anhydrobiotic form during the dry season. To unveil the molecular mechanisms of this resistance to desiccation, an anhydrobiosis-related Expressed Sequence Tag (EST) database was obtained from the sequences of three cDNA libraries constructed from P. vanderplanki larvae after 0, 12, and 36 h of desiccation. The database contained 15,056 ESTs distributed into 4,807 UniGene clusters. ESTs were classified according to gene ontology categories, and putative expression patterns were deduced for all clusters on the basis of the number of clones in each library; expression patterns were confirmed by real-time PCR for selected genes. Among up-regulated genes, antioxidants, late embryogenesis abundant (LEA) proteins, and heat shock proteins (Hsps) were identified as important groups for anhydrobiosis. Genes related to trehalose metabolism and various transporters were also strongly induced by desiccation. Those results suggest that the oxidative stress response plays a central role in successful anhydrobiosis. Similarly, protein denaturation and aggregation may be prevented by marked up-regulation of Hsps and the anhydrobiosis-specific LEA proteins. A third major feature is the predicted increase in trehalose synthesis and in the expression of various transporter proteins allowing the distribution of trehalose and other solutes to all tissues. PMID:20833722

  6. Comprehensive assessment of sequence variation within the copy number variable defensin cluster on 8p23 by target enriched in-depth 454 sequencing

    Directory of Open Access Journals (Sweden)

    Zhang Xinmin

    2011-05-01

    Full Text Available Abstract Background In highly copy number variable (CNV regions such as the human defensin gene locus, comprehensive assessment of sequence variations is challenging. PCR approaches are practically restricted to tiny fractions, and next-generation sequencing (NGS approaches of whole individual genomes e.g. by the 1000 Genomes Project is confined by an affordable sequence depth. Combining target enrichment with NGS may represent a feasible approach. Results As a proof of principle, we enriched a ~850 kb section comprising the CNV defensin gene cluster DEFB, the invariable DEFA part and 11 control regions from two genomes by sequence capture and sequenced it by 454 technology. 6,651 differences to the human reference genome were found. Comparison to HapMap genotypes revealed sensitivities and specificities in the range of 94% to 99% for the identification of variations. Using error probabilities for rigorous filtering revealed 2,886 unique single nucleotide variations (SNVs including 358 putative novel ones. DEFB CN determinations by haplotype ratios were in agreement with alternative methods. Conclusion Although currently labor extensive and having high costs, target enriched NGS provides a powerful tool for the comprehensive assessment of SNVs in highly polymorphic CNV regions of individual genomes. Furthermore, it reveals considerable amounts of putative novel variations and simultaneously allows CN estimation.

  7. Functional translation and linguistic variation: the use of didactic sequence in teaching languages

    Directory of Open Access Journals (Sweden)

    Valdecy Oliveira Pontes

    2017-12-01

    Full Text Available In the context of the approach of the linguistic variation of Spanish and the use of Functionalist Translation in Foreign Language classes, this article aims to report the results of the application of a Didactic Sequence (SD, in the style of the Geneva School, Hispanic plays for the teaching of linguistic variation in the pronominal treatment forms of the Spanish-Portuguese Brazilian language pair. SD was applied in the subject "Introduction to Translation Studies in Spanish Language" (2nd semester, offered by the course in Letters - Spanish Language and its Literatures, of the Federal University of Ceará. This article was based on the theoretical foundations of Functionalist Translation (NORD, 1994, 1996, 2009, 2012, Translation and Sociolinguistics (BOLAÑOS-CUELLAR, 2000; MAYORAL, 1998, elaboration of SD (DOLZ; NOVERRAZ; SCHNEUWLY, 2004; CRISTÓVÃO, 2010; BARROS, 2012 and research on the variation in the forms of treatment of Spanish and Portuguese (FONTANELLA DE WEINBER, 1999; SCHERRE et al, 2015.

  8. PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

    Directory of Open Access Journals (Sweden)

    Adeel Malik

    2010-01-01

    Full Text Available Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins as well as noncovalently (protein-carbohydrate complexes is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by tools de novo which can predict carbohydrate-binding sites for proteins in the absence of structure of experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from Protein Data Bank (PDB, (ii Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds and literature reference, and so forth, is included. In addition, each protein in the database is mapped to Uniprot, Pfam, PDB, and so forth.

  9. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    KAUST Repository

    Sepú lveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-01-01

    Background: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model.Results: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates.Conclusions: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. 2013 Seplveda et al.; licensee BioMed Central Ltd.

  10. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

    Science.gov (United States)

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-02-26

    The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.

  11. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    KAUST Repository

    Sepúlveda, Nuno

    2013-02-26

    Background: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model.Results: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates.Conclusions: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. 2013 Seplveda et al.; licensee BioMed Central Ltd.

  12. Variations in CCL3L gene cluster sequence and non-specific gene copy numbers

    Directory of Open Access Journals (Sweden)

    Edberg Jeffrey C

    2010-03-01

    Full Text Available Abstract Background Copy number variations (CNVs of the gene CC chemokine ligand 3-like1 (CCL3L1 have been implicated in HIV-1 susceptibility, but the association has been inconsistent. CCL3L1 shares homology with a cluster of genes localized to chromosome 17q12, namely CCL3, CCL3L2, and, CCL3L3. These genes are involved in host defense and inflammatory processes. Several CNV assays have been developed for the CCL3L1 gene. Findings Through pairwise and multiple alignments of these genes, we have shown that the homology between these genes ranges from 50% to 99% in complete gene sequences and from 70-100% in the exonic regions, with CCL3L1 and CCL3L3 being identical. By use of MEGA 4 and BioEdit, we aligned sense primers, anti-sense primers, and probes used in several previously described assays against pre-multiple alignments of all four chemokine genes. Each set of probes and primers aligned and matched with overlapping sequences in at least two of the four genes, indicating that previously utilized RT-PCR based CNV assays are not specific for only CCL3L1. The four available assays measured median copies of 2 and 3-4 in European and African American, respectively. The concordance between the assays ranged from 0.44-0.83 suggesting individual discordant calls and inconsistencies with the assays from the expected gene coverage from the known sequence. Conclusions This indicates that some of the inconsistencies in the association studies could be due to assays that provide heterogenous results. Sequence information to determine CNV of the three genes separately would allow to test whether their association with the pathogenesis of a human disease or phenotype is affected by an individual gene or by a combination of these genes.

  13. Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database.

    Science.gov (United States)

    Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth

    2016-08-01

    The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  14. Insights into mechanisms of bacterial antigenic variation derived from the complete genome sequence of Anaplasma marginale.

    Science.gov (United States)

    Palmer, Guy H; Futse, James E; Knowles, Donald P; Brayton, Kelly A

    2006-10-01

    Persistence of Anaplasma spp. in the animal reservoir host is required for efficient tick-borne transmission of these pathogens to animals and humans. Using A. marginale infection of its natural reservoir host as a model, persistent infection has been shown to reflect sequential cycles in which antigenic variants emerge, replicate, and are controlled by the immune system. Variation in the immunodominant outer-membrane protein MSP2 is generated by a process of gene conversion, in which unique hypervariable region sequences (HVRs) located in pseudogenes are recombined into a single operon-linked msp2 expression site. Although organisms expressing whole HVRs derived from pseudogenes emerge early in infection, long-term persistent infection is dependent on the generation of complex mosaics in which segments from different HVRs recombine into the expression site. The resulting combinatorial diversity generates the number of variants both predicted and shown to emerge during persistence.

  15. An Exome Sequencing Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis.

    Science.gov (United States)

    Petrovski, Slavé; Todd, Jamie L; Durheim, Michael T; Wang, Quanli; Chien, Jason W; Kelly, Fran L; Frankel, Courtney; Mebane, Caroline M; Ren, Zhong; Bridgers, Joshua; Urban, Thomas J; Malone, Colin D; Finlen Copeland, Ashley; Brinkley, Christie; Allen, Andrew S; O'Riordan, Thomas; McHutchison, John G; Palmer, Scott M; Goldstein, David B

    2017-07-01

    Idiopathic pulmonary fibrosis (IPF) is an increasingly recognized, often fatal lung disease of unknown etiology. The aim of this study was to use whole-exome sequencing to improve understanding of the genetic architecture of pulmonary fibrosis. We performed a case-control exome-wide collapsing analysis including 262 unrelated individuals with pulmonary fibrosis clinically classified as IPF according to American Thoracic Society/European Respiratory Society/Japanese Respiratory Society/Latin American Thoracic Association guidelines (81.3%), usual interstitial pneumonia secondary to autoimmune conditions (11.5%), or fibrosing nonspecific interstitial pneumonia (7.2%). The majority (87%) of case subjects reported no family history of pulmonary fibrosis. We searched 18,668 protein-coding genes for an excess of rare deleterious genetic variation using whole-exome sequence data from 262 case subjects with pulmonary fibrosis and 4,141 control subjects drawn from among a set of individuals of European ancestry. Comparing genetic variation across 18,668 protein-coding genes, we found a study-wide significant (P RTEL1, and PARN. A model qualifying ultrarare, deleterious, nonsynonymous variants implicated TERT and RTEL1, and a model specifically qualifying loss-of-function variants implicated RTEL1 and PARN. A subanalysis of 186 case subjects with sporadic IPF confirmed TERT, RTEL1, and PARN as study-wide significant contributors to sporadic IPF. Collectively, 11.3% of case subjects with sporadic IPF carried a qualifying variant in one of these three genes compared with the 0.3% carrier rate observed among control subjects (odds ratio, 47.7; 95% confidence interval, 21.5-111.6; P = 5.5 × 10 -22 ). We identified TERT, RTEL1, and PARN-three telomere-related genes previously implicated in familial pulmonary fibrosis-as significant contributors to sporadic IPF. These results support the idea that telomere dysfunction is involved in IPF pathogenesis.

  16. Genetic variation and DNA fingerprinting of durian types in Malaysia using simple sequence repeat (SSR) markers.

    Science.gov (United States)

    Siew, Ging Yang; Ng, Wei Lun; Tan, Sheau Wei; Alitheen, Noorjahan Banu; Tan, Soon Guan; Yeap, Swee Keong

    2018-01-01

    Durian ( Durio zibethinus ) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitation of morphological classification, there is a need to carry out genetic characterization of the various durian types. Such data is important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in Genbank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, H E  = 0.35). The DNA fingerprinting power of SSR markers revealed by the combined probability of identity (PI) of all loci was 2.3×10 -3 . Unique DNA fingerprints were generated for 21 out of 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings in this preliminary study not only shows the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenges the current classification of durian types, e.g., on whether the different types should be called "clones", "varieties", or "cultivars". Such matters have a direct impact on the regulation and management of durian genetic resources in the region.

  17. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    Science.gov (United States)

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.

  18. Characterization and compilation of polymorphic simple sequence repeat (SSR markers of peanut from public database

    Directory of Open Access Journals (Sweden)

    Zhao Yongli

    2012-07-01

    Full Text Available Abstract Background There are several reports describing thousands of SSR markers in the peanut (Arachis hypogaea L. genome. There is a need to integrate various research reports of peanut DNA polymorphism into a single platform. Further, because of lack of uniformity in the labeling of these markers across the publications, there is some confusion on the identities of many markers. We describe below an effort to develop a central comprehensive database of polymorphic SSR markers in peanut. Findings We compiled 1,343 SSR markers as detecting polymorphism (14.5% within a total of 9,274 markers. Amongst all polymorphic SSRs examined, we found that AG motif (36.5% was the most abundant followed by AAG (12.1%, AAT (10.9%, and AT (10.3%.The mean length of SSR repeats in dinucleotide SSRs was significantly longer than that in trinucleotide SSRs. Dinucleotide SSRs showed higher polymorphism frequency for genomic SSRs when compared to trinucleotide SSRs, while for EST-SSRs, the frequency of polymorphic SSRs was higher in trinucleotide SSRs than in dinucleotide SSRs. The correlation of the length of SSR and the frequency of polymorphism revealed that the frequency of polymorphism was decreased as motif repeat number increased. Conclusions The assembled polymorphic SSRs would enhance the density of the existing genetic maps of peanut, which could also be a useful source of DNA markers suitable for high-throughput QTL mapping and marker-assisted selection in peanut improvement and thus would be of value to breeders.

  19. Effect of laying sequence on egg mercury in captive zebra finches: an interpretation considering individual variation.

    Science.gov (United States)

    Ou, Langbo; Varian-Ramos, Claire W; Cristol, Daniel A

    2015-08-01

    Bird eggs are used widely as noninvasive bioindicators for environmental mercury availability. Previous studies, however, have found varying relationships between laying sequence and egg mercury concentrations. Some studies have reported that the mercury concentration was higher in first-laid eggs or declined across the laying sequence, whereas in other studies mercury concentration was not related to egg order. Approximately 300 eggs (61 clutches) were collected from captive zebra finches dosed throughout their reproductive lives with methylmercury (0.3 μg/g, 0.6 μg/g, 1.2 μg/g, or 2.4 μg/g wet wt in diet); the total mercury concentration (mean ± standard deviation [SD] dry wt basis) of their eggs was 7.03 ± 1.38 μg/g, 14.15 ± 2.52 μg/g, 26.85 ± 5.85 μg/g, and 49.76 ± 10.37 μg/g, respectively (equivalent to fresh wt egg mercury concentrations of 1.24 μg/g, 2.50 μg/g, 4.74 μg/g, and 8.79 μg/g). The authors observed a significant decrease in the mercury concentration of successive eggs when compared with the first egg and notable variation between clutches within treatments. The mercury level of individual females within and among treatments did not alter this relationship. Based on the results, sampling of a single egg in each clutch from any position in the laying sequence is sufficient for purposes of population risk assessment, but it is not recommended as a proxy for individual female exposure or as an estimate of average mercury level within the clutch. © 2015 SETAC.

  20. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    Science.gov (United States)

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies. © 2014 The Authors The Plant Journal © 2014 John Wiley & Sons Ltd.

  1. Constitutional sequence variation in the Fanconi anaemia group C (FANCC) gene in childhood acute myeloid leukaemia.

    Science.gov (United States)

    Barber, Lisa M; McGrath, Helen E N; Meyer, Stefan; Will, Andrew M; Birch, Jillian M; Eden, Osborn B; Taylor, G Malcolm

    2003-04-01

    The extent to which genetic susceptibility contributes to the causation of childhood acute myeloid leukaemia (AML) is not known. The inherited bone marrow failure disorder Fanconi anaemia (FA) carries a substantially increased risk of AML, raising the possibility that constitutional variation in the FA (FANC) genes is involved in the aetiology of childhood AML. We have screened genomic DNA extracted from remission blood samples of 97 children with sporadic AML and 91 children with sporadic acute lymphoblastic leukaemia (ALL), together with 104 cord blood DNA samples from newborn children, for variations in the Fanconi anaemia group C (FANCC) gene. We found no evidence of known FANCC pathogenic mutations in children with AML, ALL or in the cord blood samples. However, we detected 12 different FANCC sequence variants, of which five were novel to this study. Among six FANCC variants leading to amino-acid substitutions, one (S26F) was present at a fourfold greater frequency in children with AML than in the cord blood samples (odds ratio: 4.09, P = 0.047; 95% confidence interval 1.08-15.54). Our results thus do not exclude the possibility that this polymorphic variant contributes to the risk of a small proportion of childhood AML.

  2. Understanding gene sequence variation in the context of transcription regulation in yeast.

    Directory of Open Access Journals (Sweden)

    Irit Gat-Viks

    2010-01-01

    Full Text Available DNA sequence polymorphism in a regulatory protein can have a widespread transcriptional effect. Here we present a computational approach for analyzing modules of genes with a common regulation that are affected by specific DNA polymorphisms. We identify such regulatory-linkage modules by integrating genotypic and expression data for individuals in a segregating population with complementary expression data of strains mutated in a variety of regulatory proteins. Our procedure searches simultaneously for groups of co-expressed genes, for their common underlying linkage interval, and for their shared regulatory proteins. We applied the method to a cross between laboratory and wild strains of S. cerevisiae, demonstrating its ability to correctly suggest modules and to outperform extant approaches. Our results suggest that middle sporulation genes are under the control of polymorphism in the sporulation-specific tertiary complex Sum1p/Rfm1p/Hst1p. In another example, our analysis reveals novel inter-relations between Swi3 and two mitochondrial inner membrane proteins underlying variation in a module of aerobic cellular respiration genes. Overall, our findings demonstrate that this approach provides a useful framework for the systematic mapping of quantitative trait loci and their role in gene expression variation.

  3. Novel sequence variations in LAMA2 and SGCG genes modulating cis-acting regulatory elements and RNA secondary structure

    Directory of Open Access Journals (Sweden)

    Olfa Siala

    2010-01-01

    Full Text Available In this study, we detected new sequence variations in LAMA2 and SGCG genes in 5 ethnic populations, and analysed their effect on enhancer composition and mRNA structure. PCR amplification and DNA sequencing were performed and followed by bioinformatics analyses using ESEfinder as well as MFOLD software. We found 3 novel sequence variations in the LAMA2 (c.3174+22_23insAT and c.6085 +12delA and SGCG (c.*102A/C genes. These variations were present in 210 tested healthy controls from Tunisian, Moroccan, Algerian, Lebanese and French populations suggesting that they represent novel polymorphisms within LAMA2 and SGCG genes sequences. ESEfinder showed that the c.*102A/C substitution created a new exon splicing enhancer in the 3'UTR of SGCG genes, whereas the c.6085 +12delA deletion was situated in the base pairing region between LAMA2 mRNA and the U1snRNA spliceosomal components. The RNA structure analyses showed that both variations modulated RNA secondary structure. Our results are suggestive of correlations between mRNA folding and the recruitment of spliceosomal components mediating splicing, including SR proteins. The contribution of common sequence variations to mRNA structural and functional diversity will contribute to a better study of gene expression.

  4. The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences

    Science.gov (United States)

    Portales-Casamar, Elodie; Arenillas, David; Lim, Jonathan; Swanson, Magdalena I.; Jiang, Steven; McCallum, Anthony; Kirov, Stefan; Wasserman, Wyeth W.

    2009-01-01

    The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein–DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data ‘boutiques’ within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk. PMID:18971253

  5. Spatial and Temporal Stress Drop Variations of the 2011 Tohoku Earthquake Sequence

    Science.gov (United States)

    Miyake, H.

    2013-12-01

    The 2011 Tohoku earthquake sequence consists of foreshocks, mainshock, aftershocks, and repeating earthquakes. To quantify spatial and temporal stress drop variations is important for understanding M9-class megathrust earthquakes. Variability and spatial and temporal pattern of stress drop is a basic information for rupture dynamics as well as useful to source modeling. As pointed in the ground motion prediction equations by Campbell and Bozorgnia [2008, Earthquake Spectra], mainshock-aftershock pairs often provide significant decrease of stress drop. We here focus strong motion records before and after the Tohoku earthquake, and analyze source spectral ratios considering azimuth- and distance dependency [Miyake et al., 2001, GRL]. Due to the limitation of station locations on land, spatial and temporal stress drop variations are estimated by adjusting shifts from the omega-squared source spectral model. The adjustment is based on the stochastic Green's function simulations of source spectra considering azimuth- and distance dependency. We assumed the same Green's functions for event pairs for each station, both the propagation path and site amplification effects are cancelled out. Precise studies of spatial and temporal stress drop variations have been performed [e.g., Allmann and Shearer, 2007, JGR], this study targets the relations between stress drop vs. progression of slow slip prior to the Tohoku earthquake by Kato et al. [2012, Science] and plate structures. Acknowledgement: This study is partly supported by ERI Joint Research (2013-B-05). We used the JMA unified earthquake catalogue and K-NET, KiK-net, and F-net data provided by NIED.

  6. Functional role of bacteriophage transfer RNAs: codon usage analysis of genomic sequences stored in the GENBANK/EMBL/DDBJ databases

    Directory of Open Access Journals (Sweden)

    T Kunisawa

    2006-01-01

    Full Text Available Complete genomic sequence data are stored in the public GenBank/EMBL/DDBJ databases so that any investigator can make use of the data. This report describes a comparative analysis of codon usage that is impossible without such a public and open data system. A limited number of bacteriophages harbor their own transfer RNAs. Based on a comparison between T4 phage-encoded tRNA species and the relative cellular amounts of host Escherichia coli tRNAs, it is hypothesized that T4 tRNAs could serve to supplement host isoacceptor tRNA species that are present in minor amounts and thus enhance the translational efficiency of phage proteins. When compared to their respective host bacteria, the codon usage data of bacteriophages D3, φC31, HP1, D29 and 933W all show an increased frequency of synonymous codons or amino acids that correspond to phage tRNA species, suggesting their supplemental role in the efficient production of phage proteins. The data-analysis presents an example in which the availability of an open and fully accessible database system would allow one to obtain comprehensive insights into a fundamental problem in molecular biology.

  7. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life.

    Science.gov (United States)

    Sheik, Cody S; Reese, Brandi Kiel; Twing, Katrina I; Sylvan, Jason B; Grim, Sharon L; Schrenk, Matthew O; Sogin, Mitchell L; Colwell, Frederick S

    2018-01-01

    Earth's subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium , Aquabacterium , Ralstonia , and Acinetobacter . While the top five most frequently observed genera were Pseudomonas , Propionibacterium , Acinetobacter , Ralstonia , and Sphingomonas . The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA

  8. MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard A.; Panyala, Ajay R.; Glass, Kevin A.; Colby, Sean M.; Glaesemann, Kurt R.; Jansson, Georg C.; Jansson, Janet K.

    2017-02-21

    MerCat is a parallel, highly scalable and modular property software package for robust analysis of features in next-generation sequencing data. MerCat inputs include assembled contigs and raw sequence reads from any platform resulting in feature abundance counts tables. MerCat allows for direct analysis of data properties without reference sequence database dependency commonly used by search tools such as BLAST and/or DIAMOND for compositional analysis of whole community shotgun sequencing (e.g. metagenomes and metatranscriptomes).

  9. Draft genome sequence of an elite Dura palm and whole-genome patterns of DNA variation in oil palm.

    Science.gov (United States)

    Jin, Jingjing; Lee, May; Bai, Bin; Sun, Yanwei; Qu, Jing; Rahmadsyah; Alfiko, Yuzer; Lim, Chin Huat; Suwanto, Antonius; Sugiharti, Maria; Wong, Limsoon; Ye, Jian; Chua, Nam-Hai; Yue, Gen Hua

    2016-12-01

    Oil palm is the world's leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm, and re-sequenced 17 palm genomes. The assemble genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  10. Sequence variation in mitochondrial cox1 and nad1 genes of ascaridoid nematodes in cats and dogs from Iran.

    Science.gov (United States)

    Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B

    2015-07-01

    The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.

  11. PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

    Science.gov (United States)

    Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

    2016-01-01

    Simple Sequence Repeats or microsatellites are resourceful molecular genetic markers. There are only few reports of SSR identification and development in pineapple. Complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple which will help in deciphering genetic makeup of its germplasm resources. A total of 359511 SSRs were identified in pineapple (356385 from genome sequence, 45 from chloroplast sequence, 249 in mitochondrial sequence and 2832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open source database available for non-commercial academic purpose at http://app.bioelm.com/ with a mapping tool which can develop circular maps of selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.

  12. Population clustering based on copy number variations detected from next generation sequencing data.

    Science.gov (United States)

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2014-08-01

    Copy number variations (CNVs) can be used as significant bio-markers and next generation sequencing (NGS) provides a high resolution detection of these CNVs. But how to extract features from CNVs and further apply them to genomic studies such as population clustering have become a big challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into the source matrix and weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data set from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results on the first real data analysis show that the proposed method can cluster two family trio with different ancestries into two ethnic groups and the results on the second real data analysis show that the proposed method can be applied to the whole-genome with large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.

  13. Unexpected tolerance of alpha-cleavage of the prion protein to sequence variations.

    Directory of Open Access Journals (Sweden)

    José B Oliveira-Martins

    Full Text Available The cellular form of the prion protein, PrP(C, undergoes extensive proteolysis at the alpha site (109K [see text]H110. Expression of non-cleavable PrP(C mutants in transgenic mice correlates with neurotoxicity, suggesting that alpha-cleavage is important for PrP(C physiology. To gain insights into the mechanisms of alpha-cleavage, we generated a library of PrP(C mutants with mutations in the region neighbouring the alpha-cleavage site. The prevalence of C1, the carboxy adduct of alpha-cleavage, was determined for each mutant. In cell lines of disparate origin, C1 prevalence was unaffected by variations in charge and hydrophobicity of the region neighbouring the alpha-cleavage site, and by substitutions of the residues in the palindrome that flanks this site. Instead, alpha-cleavage was size-dependently impaired by deletions within the domain 106-119. Almost no cleavage was observed upon full deletion of this domain. These results suggest that alpha-cleavage is executed by an alpha-PrPase whose activity, despite surprisingly limited sequence specificity, is dependent on the size of the central region of PrP(C.

  14. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    Science.gov (United States)

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  15. MPID-T2: a database for sequence-structure-function analyses of pMHC and TR/pMHC structures.

    Science.gov (United States)

    Khan, Javed Mohammed; Cheruku, Harish Reddy; Tong, Joo Chuan; Ranganathan, Shoba

    2011-04-15

    Sequence-structure-function information is critical in understanding the mechanism of pMHC and TR/pMHC binding and recognition. A database for sequence-structure-function information on pMHC and TR/pMHC interactions, MHC-Peptide Interaction Database-TR version 2 (MPID-T2), is now available augmented with the latest PDB and IMGT/3Dstructure-DB data, advanced features and new parameters for the analysis of pMHC and TR/pMHC structures. http://biolinfo.org/mpid-t2. shoba.ranganathan@mq.edu.au Supplementary data are available at Bioinformatics online.

  16. The master two-dimensional gel database of human AMA cell proteins: towards linking protein and genome sequence and mapping information (update 1991)

    DEFF Research Database (Denmark)

    Celis, J E; Leffers, H; Rasmussen, H H

    1991-01-01

    autoantigens" and "cDNAs". For convenience we have included an alphabetical list of all known proteins recorded in this database. In the long run, the main goal of this database is to link protein and DNA sequencing and mapping information (Human Genome Program) and to provide an integrated picture......The master two-dimensional gel database of human AMA cells currently lists 3801 cellular and secreted proteins, of which 371 cellular polypeptides (306 IEF; 65 NEPHGE) were added to the master images during the last 10 months. These include: (i) very basic and acidic proteins that do not focus...

  17. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    Science.gov (United States)

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species.Database URL: http://geve.med.u-tokai.ac.jp. © The Author(s) 2016. Published by Oxford University Press.

  18. Sequence-length variation of mtDNA HVS-I C-stretch in Chinese ethnic groups.

    Science.gov (United States)

    Chen, Feng; Dang, Yong-hui; Yan, Chun-xia; Liu, Yan-ling; Deng, Ya-jun; Fulton, David J R; Chen, Teng

    2009-10-01

    The purpose of this study was to investigate mitochondrial DNA (mtDNA) hypervariable segment-I (HVS-I) C-stretch variations and explore the significance of these variations in forensic and population genetics studies. The C-stretch sequence variation was studied in 919 unrelated individuals from 8 Chinese ethnic groups using both direct and clone sequencing approaches. Thirty eight C-stretch haplotypes were identified, and some novel and population specific haplotypes were also detected. The C-stretch genetic diversity (GD) values were relatively high, and probability (P) values were low. Additionally, C-stretch length heteroplasmy was observed in approximately 9% of individuals studied. There was a significant correlation (r=-0.961, Ppopulations. The results from the Fst and dA genetic distance matrix, neighbor-joining tree, and principal component map also suggest that C-stretch could be used as a reliable genetic marker in population genetics.

  19. Leishmania-specific surface antigens show sub-genus sequence variation and immune recognition.

    Directory of Open Access Journals (Sweden)

    Daniel P Depledge

    2010-09-01

    Full Text Available A family of hydrophilic acylated surface (HASP proteins, containing extensive and variant amino acid repeats, is expressed at the plasma membrane in infective extracellular (metacyclic and intracellular (amastigote stages of Old World Leishmania species. While HASPs are antigenic in the host and can induce protective immune responses, the biological functions of these Leishmania-specific proteins remain unresolved. Previous genome analysis has suggested that parasites of the sub-genus Leishmania (Viannia have lost HASP genes from their genomes.We have used molecular and cellular methods to analyse HASP expression in New World Leishmania mexicana complex species and show that, unlike in L. major, these proteins are expressed predominantly following differentiation into amastigotes within macrophages. Further genome analysis has revealed that the L. (Viannia species, L. (V. braziliensis, does express HASP-like proteins of low amino acid similarity but with similar biochemical characteristics, from genes present on a region of chromosome 23 that is syntenic with the HASP/SHERP locus in Old World Leishmania species and the L. (L. mexicana complex. A related gene is also present in Leptomonas seymouri and this may represent the ancestral copy of these Leishmania-genus specific sequences. The L. braziliensis HASP-like proteins (named the orthologous (o HASPs are predominantly expressed on the plasma membrane in amastigotes and are recognised by immune sera taken from 4 out of 6 leishmaniasis patients tested in an endemic region of Brazil. Analysis of the repetitive domains of the oHASPs has shown considerable genetic variation in parasite isolates taken from the same patients, suggesting that antigenic change may play a role in immune recognition of this protein family.These findings confirm that antigenic hydrophilic acylated proteins are expressed from genes in the same chromosomal region in species across the genus Leishmania. These proteins are

  20. AcEST(EST sequences of Adiantum capillus-veneris and their annotation) - AcEST | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us AcEST AcEST(EST sequences of Adiantum capillus-veneris and their annotation) Data detail Dat...a name AcEST(EST sequences of Adiantum capillus-veneris and their annotation) DOI 10.18908/lsdba.nbdc00839-0...01 Description of data contents EST sequence of Adiantum capillus-veneris and its annotation (clone ID, libr...le search URL http://togodb.biosciencedbc.jp/togodb/view/archive_acest#en Data acquisition method Capillary ...ainst UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases) Number of data entries Adiantum capillus-veneris

  1. The use of high-throughput DNA sequencing in the investigation of antigenic variation: application to Neisseria species.

    Directory of Open Access Journals (Sweden)

    John K Davies

    Full Text Available Antigenic variation occurs in a broad range of species. This process resembles gene conversion in that variant DNA is unidirectionally transferred from partial gene copies (or silent loci into an expression locus. Previous studies of antigenic variation have involved the amplification and sequencing of individual genes from hundreds of colonies. Using the pilE gene from Neisseria gonorrhoeae we have demonstrated that it is possible to use PCR amplification, followed by high-throughput DNA sequencing and a novel assembly process, to detect individual antigenic variation events. The ability to detect these events was much greater than has previously been possible. In N. gonorrhoeae most silent loci contain multiple partial gene copies. Here we show that there is a bias towards using the copy at the 3' end of the silent loci (copy 1 as the donor sequence. The pilE gene of N. gonorrhoeae and some strains of Neisseria meningitidis encode class I pilin, but strains of N. meningitidis from clonal complexes 8 and 11 encode a class II pilin. We have confirmed that the class II pili of meningococcal strain FAM18 (clonal complex 11 are non-variable, and this is also true for the class II pili of strain NMB from clonal complex 8. In addition when a gene encoding class I pilin was moved into the meningococcal strain NMB background there was no evidence of antigenic variation. Finally we investigated several members of the opa gene family of N. gonorrhoeae, where it has been suggested that limited variation occurs. Variation was detected in the opaK gene that is located close to pilE, but not at the opaJ gene located elsewhere on the genome. The approach described here promises to dramatically improve studies of the extent and nature of antigenic variation systems in a variety of species.

  2. Understanding human genetic variation in the era of high-throughput sequencing

    OpenAIRE

    Knight, Julian C.

    2010-01-01

    The EMBO/EMBL symposium ‘Human Variation: Cause and Consequence' highlighted advances in understanding the molecular basis of human genetic variation and its myriad implications for biology, human origins and disease.

  3. Genetic variation among the Mapuche Indians from the Patagonian region of Argentina: mitochondrial DNA sequence variation and allele frequencies of several nuclear genes.

    Science.gov (United States)

    Ginther, C; Corach, D; Penacino, G A; Rey, J A; Carnese, F R; Hutz, M H; Anderson, A; Just, J; Salzano, F M; King, M C

    1993-01-01

    DNA samples from 60 Mapuche Indians, representing 39 maternal lineages, were genetically characterized for (1) nucleotide sequences of the mtDNA control region; (2) presence or absence of a nine base duplication in mtDNA region V; (3) HLA loci DRB1 and DQA1; (4) variation at three nuclear genes with short tandem repeats; and (5) variation at the polymorphic marker D2S44. The genetic profile of the Mapuche population was compared to other Amerinds and to worldwide populations. Two highly polymorphic portions of the mtDNA control region, comprising 650 nucleotides, were amplified by the polymerase chain reaction (PCR) and directly sequenced. The 39 maternal lineages were defined by two or three generation families identified by the Mapuches. These 39 lineages included 19 different mtDNA sequences that could be grouped into four classes. The same classes of sequences appear in other Amerinds from North, Central, and South American populations separated by thousands of miles, suggesting that the origin of the mtDNA patterns predates the migration to the Americas. The mtDNA sequence similarity between Amerind populations suggests that the migration throughout the Americas occurred rapidly relative to the mtDNA mutation rate. HLA DRB1 alleles 1602 and 1402 were frequent among the Mapuches. These alleles also occur at high frequency among other Amerinds in North and South America, but not among Spanish, Chinese or African-American populations. The high frequency of these alleles throughout the Americas, and their specificity to the Americas, supports the hypothesis that Mapuches and other Amerind groups are closely related.(ABSTRACT TRUNCATED AT 250 WORDS)

  4. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    DEFF Research Database (Denmark)

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs...... to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads...

  5. Overview of errors in the reference sequence and annotation of Mycobacterium tuberculosis H37Rv, and variation amongst its isolates

    KAUST Repository

    Köser, Claudio U.

    2012-06-01

    Since its publication in 1998, the genome sequence of the Mycobacterium tuberculosis H37Rv laboratory strain has acted as the cornerstone for the study of tuberculosis. In this review we address some of the practical aspects that have come to light relating to the use of H37Rv throughout the past decade which are of relevance for the ongoing genomic and laboratory studies of this pathogen. These include errors in the genome reference sequence and its annotation, as well as the recently detected variation amongst isolates of H37Rv from different laboratories. © 2011 Elsevier B.V..

  6. Sequence variation in the alpha-toxin encoding plc gene of Clostridium perfringens strains isolated from diseased and healthy chickens

    DEFF Research Database (Denmark)

    Abildgaard, L; Engberg, RM; Pedersen, Karl

    2009-01-01

    The aim of the present study was to analyse the genetic diversity of the alpha-toxin encoding plc gene and the variation in a-toxin production of Clostridium perfringens type A strains isolated from presumably healthy chickens and chickens suffering from either necrotic enteritis (NE) or cholangio......-hepatitis. The a-toxin encoding plc genes from 60 different pulsed-field gel electrophoresis (PFGE) types (strains) of C perfringens were sequenced and translated in silico to amino acid sequences and the a-toxin production was investigated in batch cultures of 45 of the strains using an enzyme...

  7. A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

    Science.gov (United States)

    Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

    2017-06-23

    The unique physiochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can unreliable and proteomic methods offer an alternative but require comprehensive and well annotated sequence databases which are lacking for gluten. A manually a curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full length protein sequences has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance. Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass

  8. Detailed analysis of sequence changes occurring during vlsE antigenic variation in the mouse model of Borrelia burgdorferi infection.

    Directory of Open Access Journals (Sweden)

    Loïc Coutte

    2009-02-01

    Full Text Available Lyme disease Borrelia can infect humans and animals for months to years, despite the presence of an active host immune response. The vls antigenic variation system, which expresses the surface-exposed lipoprotein VlsE, plays a major role in B. burgdorferi immune evasion. Gene conversion between vls silent cassettes and the vlsE expression site occurs at high frequency during mammalian infection, resulting in sequence variation in the VlsE product. In this study, we examined vlsE sequence variation in B. burgdorferi B31 during mouse infection by analyzing 1,399 clones isolated from bladder, heart, joint, ear, and skin tissues of mice infected for 4 to 365 days. The median number of codon changes increased progressively in C3H/HeN mice from 4 to 28 days post infection, and no clones retained the parental vlsE sequence at 28 days. In contrast, the decrease in the number of clones with the parental vlsE sequence and the increase in the number of sequence changes occurred more gradually in severe combined immunodeficiency (SCID mice. Clones containing a stop codon were isolated, indicating that continuous expression of full-length VlsE is not required for survival in vivo; also, these clones continued to undergo vlsE recombination. Analysis of clones with apparent single recombination events indicated that recombinations into vlsE are nonselective with regard to the silent cassette utilized, as well as the length and location of the recombination event. Sequence changes as small as one base pair were common. Fifteen percent of recovered vlsE variants contained "template-independent" sequence changes, which clustered in the variable regions of vlsE. We hypothesize that the increased frequency and complexity of vlsE sequence changes observed in clones recovered from immunocompetent mice (as compared with SCID mice is due to rapid clearance of relatively invariant clones by variable region-specific anti-VlsE antibody responses.

  9. Complete plastid genome sequence of Primula sinensis (Primulaceae: structure comparison, sequence variation and evidence for accD transfer to nucleus

    Directory of Open Access Journals (Sweden)

    Tong-Jian Liu

    2016-06-01

    Full Text Available Species-rich genus Primula L. is a typical plant group with which to understand genetic variance between species in different levels of relationships. Chloroplast genome sequences are used to be the information resource for quantifying this difference and reconstructing evolutionary history. In this study, we reported the complete chloroplast genome sequence of Primula sinensis and compared it with other related species. This genome of chloroplast showed a typical circular quadripartite structure with 150,859 bp in sequence length consisting of 37.2% GC base. Two inverted repeated regions (25,535 bp were separated by a large single-copy region (82,064 bp and a small single-copy region (17,725 bp. The genome consists of 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Among them, seven coding genes, seven tRNA genes and four rRNA genes have two copies due to their locations in the IR regions. The accD and infA genes lacking intact open reading frames (ORF were identified as pseudogenes. SSR and sequence variation analyses were also performed on the plastome of Primula sinensis, comparing with another available plastome of P. poissonii. The four most variable regions, rpl36–rps8, rps16–trnQ, trnH–psbA and ndhC–trnV, were identified. Phylogenetic relationship estimates using three sub-datasets extracted from a matrix of 57 protein-coding gene sequences showed the identical result that was consistent with previous studies. A transcript found from P. sinensis transcriptome showed a high similarity to plastid accD functional region and was identified as a putative plastid transit peptide at the N-terminal region. The result strongly suggested that plastid accD has been functionally transferred to the nucleus in P. sinensis.

  10. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    Science.gov (United States)

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  11. FishPathogens.eu/vhsv: A user-friendly Viral Haemorrhagic Septicaemia Virus (VHSV) isolate and sequence database

    DEFF Research Database (Denmark)

    Jonstrup, Søren Peter; Gray, Tanya; Kahns, Søren

    A database has been created, www.FishPathogens.eu, with the aim of providing a single repository for collating important information on significant pathogens of aquaculture, relevant to their control and management. This database will be developed, maintained and managed as part of the European...

  12. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb and 7 (1.1 Mb from an individual from the International HapMap Project (NA12872. We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage > or = 4-fold, and 97.9% concordant in regions with coverage > or = 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  13. Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus.

    Science.gov (United States)

    Lim, M Y; Fry, K; Yun, A; Chong, S; Linnen, J; Fung, K; Kim, J P

    1997-11-01

    A transfusion-transmissible agent provisionally designated hepatitis G virus (HGV) was recently identified. In this study, we examined the variability of the HGV genome by analysing sequences in the putative envelope region from 72 isolates obtained from diverse geographical sources. The 1561 nucleotide sequence of the E1/E2/NS2a region of HGV was determined from 12 isolates, and compared with three published sequences. The most variability was observed in 400 nucleotides at the N terminus of E2. We next analysed this 400 nucleotide envelope variable region (EV) from an additional 60 HGV isolates. This sequence varied considerably among the 75 isolates, with overall identity ranging from 79.3% to 99.5% at the nucleotide level, and from 83.5% to 100% at the amino acid level. However, hypervariable regions were not identified. Phylogenetic analyses indicated that the 75 HGV isolates belong to a single genotype. A single-tier distribution of evolutionary distances was observed among the 15 E1/E2/NS2a sequences and the 75 EV sequences. In contrast, 11 isolates of HCV were analysed and showed a three-tiered distribution, representing genotypes, subtypes, and isolates. The 75 isolates of HGV fell into four clusters on the phylogenetic tree. Tight geographical clustering was observed among the HGV isolates from Japan and Korea.

  14. BLAZAR ANTI-SEQUENCE OF SPECTRAL VARIATION WITHIN INDIVIDUAL BLAZARS: CASES FOR MRK 501 AND 3C 279

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Jin; Zhang, Shuang-Nan; Liang, En-Wei, E-mail: zhang.jin@hotmail.com [National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012 (China)

    2013-04-10

    The jet properties of Mrk 501 and 3C 279 are derived by fitting broadband spectral energy distributions (SEDs) with lepton models. The derived {gamma}{sub b} (the break Lorenz factor of the electron distribution) is 10{sup 4}-10{sup 6} for Mrk 501 and 200 {approx} 600 for 3C 279. The magnetic field strength (B) of Mrk 501 is usually one order of magnitude lower than that of 3C 279, but their Doppler factors ({delta}) are comparable. A spectral variation feature where the peak luminosity is correlated with the peak frequency, which is opposite from the blazar sequence, is observed in the two sources. We find that (1) the peak luminosities of the two bumps in the SEDs for Mrk 501 depend on {gamma}{sub b} in both the observer and co-moving frames, but they are not correlated with B and {delta} and (2) the luminosity variation of 3C 279 is dominated by the external Compton (EC) peak and its peak luminosity is correlated with {gamma}{sub b} and {delta}, but anti-correlated with B. These results suggest that {gamma}{sub b} may govern the spectral variation of Mrk 501 and {delta} and B would be responsible for the spectral variation of 3C 279. The narrow distribution of {gamma}{sub b} and the correlation of {gamma}{sub b} and B in 3C 279 would be due to the cooling from the EC process and the strong magnetic field. Based on our brief discussion, we propose that this spectral variation feature may originate from the instability of the corona but not from the variation of the accretion rate as does the blazar sequence.

  15. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    Science.gov (United States)

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economical and ecological importance of flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in

  16. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing.

    Science.gov (United States)

    Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D

    2013-03-07

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li's D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens.

  17. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Directory of Open Access Journals (Sweden)

    Sathishkumar Natarajan

    Full Text Available Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L. and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs, 1.9 million InDels, and 182,398 putative structural variations (SVs. Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  18. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Science.gov (United States)

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  19. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    Science.gov (United States)

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-08

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    Science.gov (United States)

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    MYF6 gene codes for the bHLH transcription factor belonging to MyoD family. Its expression accompanies the processes of differentiation and maturation of myotubes during embriogenesis and continues on a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified and sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. Myf-6 protein shows a high conservation among species of 99 and 97% identity when comparing pig with cow and human, respectively, and of 93% when comparing pig with mouse and rat. The single nucleotide polymorphism (SNP) was revealed within the promoter region, which appeared to be T --> C transition recognized by a MspI restriction enzyme.

  1. Thickness of patellofemoral articular cartilage as measured on MR imaging: sequence comparison of accuracy, reproducibility, and interobserver variation

    Energy Technology Data Exchange (ETDEWEB)

    Van Leersum, M.D. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Schweitzer, M.E. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Gannon, F. [Dept. of Pathology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Vinitski, S. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Finkel, G. [Dept. of Pathology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Mitchell, D.G. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States)

    1995-08-01

    This study was undertaken to assess the accuracy, precision, and reliability of magnetic resonance (MR) measurements of articular cartilage. Fifteen cadaveric patellas were imaged in the axial plane at 1.5 T. Gradient echo and fat-suppressed FSE, T2-weighted, proton density, and T1-weighted sequences were performed. We measured each 5-mm section separately at three standardized positions, giving a total of 900 measurements. These findings were correlated with independently performed measurements of the corresponding anatomic sections. A hundred random measurements were also evaluated for reproducibility and interobserver variation. Although all sequences were highly accurate, the T1-weighted images were the most accurate, with a mean difference of 0.25 mm and a correlation coefficient of 0.85. All sequences were also highly reproducible with little inter-observer variation. In an attempt to improve the accuracy of the MR measurements further, we retrospectively evaluated all measurements with discrepancies greater than 1 mm from the specimen. All these differences were attributable to focal defects causing exaggeration of the thickness on MR imaging. (orig.)

  2. Thickness of patellofemoral articular cartilage as measured on MR imaging: sequence comparison of accuracy, reproducibility, and interobserver variation

    International Nuclear Information System (INIS)

    Van Leersum, M.D.; Schweitzer, M.E.; Gannon, F.; Vinitski, S.; Finkel, G.; Mitchell, D.G.

    1995-01-01

    This study was undertaken to assess the accuracy, precision, and reliability of magnetic resonance (MR) measurements of articular cartilage. Fifteen cadaveric patellas were imaged in the axial plane at 1.5 T. Gradient echo and fat-suppressed FSE, T2-weighted, proton density, and T1-weighted sequences were performed. We measured each 5-mm section separately at three standardized positions, giving a total of 900 measurements. These findings were correlated with independently performed measurements of the corresponding anatomic sections. A hundred random measurements were also evaluated for reproducibility and interobserver variation. Although all sequences were highly accurate, the T1-weighted images were the most accurate, with a mean difference of 0.25 mm and a correlation coefficient of 0.85. All sequences were also highly reproducible with little inter-observer variation. In an attempt to improve the accuracy of the MR measurements further, we retrospectively evaluated all measurements with discrepancies greater than 1 mm from the specimen. All these differences were attributable to focal defects causing exaggeration of the thickness on MR imaging. (orig.)

  3. Intraspecific variations in Cyt b and D-loop sequences of Testudine species, Lissemys punctata from south Karnataka

    Directory of Open Access Journals (Sweden)

    R. Lalitha

    2018-01-01

    Full Text Available The freshwater Testudine species have gained importance in recent years, as most of their population is threatened due to exploitation for delicacy and pet trade. In this regard, Lissemys punctata, a freshwater terrapin, predominantly distributed in Asian countries has gained its significance for the study. A pilot study report on mitochondrial markers (Cyt b and D-loop conducted on L. punctata species from southern Karnataka, India was presented in this investigation. A complete region spanning 1.14 kb and ∼1 kb was amplified by HotStart PCR and sequenced by Sanger sequencing. The Cyt b sequence revealed 85 substitution sites, no indels and 17 parsimony informative sites, whereas D-loop showed 189 variable sites, 51 parsimony informative sites with 5′ functional domains TAS, CSB-F, CSBs (1, 2, 3 preceding tandem repeat at 3′ end. Current data highlights the intraspecific variations in these target regions and variations validated using suitable evolutionary models points out that the overall point mutations observed in the region are transitions leading to no structural and functional alterations. The mitochondrial data generated uncover the genetic diversity within species and conservationist can utilize the data to estimate the effective population size or for forensic identification of animal or its seizures during unlawful trade activities.

  4. Sequence diversity and copy number variation of Mutator-like transposases in wheat

    Directory of Open Access Journals (Sweden)

    Nobuaki Asakura

    2008-01-01

    Full Text Available Partial transposase-coding sequences of Mutator-like elements (MULEs were isolated from a wild einkorn wheat, Triticum urartu, by degenerate PCR. The isolated sequences were classified into a MuDR or Class I clade and divided into two distinct subclasses (subclass I and subclass II. The average pair-wise identity between members of both subclasses was 58.8% at the nucleotide sequence level. Sequence diversity of subclass I was larger than that of subclass II. DNA gel blot analysis showed that subclass I was present as low copy number elements in the genomes of all Triticum and Aegilops accessions surveyed, while subclass II was present as high copy number elements. These two subclasses seemed uncapable of recognizing each other for transposition. The number of copies of subclass II elements was much higher in Aegilops with the S, Sl and D genomes and polyploid Triticum species than in diploid Triticum with the A genome, indicating that active transposition occurred in S, Sl and D genomes before polyploidization. DNA gel blot analysis of six species selected from three subfamilies of Poaceae demonstrated that only the tribe Triticeae possessed both subclasses. These results suggest that the differentiation of these two subclasses occurred before or immediately after the establishment of the tribe Triticeae.

  5. Sequencing the CHO DXB11 genome reveals regional variations in genomic stability and haploidy

    DEFF Research Database (Denmark)

    Kaas, Christian Schrøder; Kristensen, Claus; Betenbaugh, Michael J.

    2015-01-01

    Background: The DHFR negative CHO DXB11 cell line (also known as DUX-B11 and DUKX) was historically the first CHO cell line to be used for large scale production of heterologous proteins and is still used for production of a number of complex proteins.  Results: Here we present the genomic sequence...... of the CHO DXB11 genome sequenced to a depth of 33x. Overall a significant genomic drift was seen favoring GC -> AT point mutations in line with the chemical mutagenesis strategy used for generation of the cell line. The sequencing depth for each gene in the genome revealed distinct peaks at sequencing...... in eight additional analyzed CHO genomes (15-20% haploidy) but not in the genome of the Chinese hamster. The dhfr gene is confirmed to be haploid in CHO DXB11; transcriptionally active and the remaining allele contains a G410C point mutation causing a Thr137Arg missense mutation. We find similar to 2...

  6. Striking structural dynamism and nucleotide sequence variation of the transposon Galileo in the genome of Drosophila mojavensis.

    Science.gov (United States)

    Marzo, Mar; Bello, Xabier; Puig, Marta; Maside, Xulio; Ruiz, Alfredo

    2013-02-04

    Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is the long internally-repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo has a wide distribution in the genus Drosophila, since it has been found in 6 of the 12 Drosophila sequenced genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the D. mojavensis sequenced genome. In our set of 170 Galileo copies we have detected 5 Galileo subfamilies (C, D, E, F, and X) with different structures ranging from nearly complete, to only 2 TIR or solo TIR copies. Finally, we have explored the structural and length variation of the Galileo copies that point out the relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate a recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome.

  7. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  8. A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation.

    Directory of Open Access Journals (Sweden)

    Rosemary M McCloskey

    2017-11-01

    Full Text Available Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A diverse number of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale with large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis-where individuals are sampled sooner post-infection-rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP, which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85% and specificity (91% than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method categorized about half (46% as many individuals to clusters compared to the other methods. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not by other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer. This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where

  9. Population genetic implications from sequence variation in four Y chromosome genes.

    Science.gov (United States)

    Shen, P; Wang, F; Underhill, P A; Franco, C; Yang, W H; Roxas, A; Sung, R; Lin, A A; Hyman, R W; Vollrath, D; Davis, R W; Cavalli-Sforza, L L; Oefner, P J

    2000-06-20

    Some insight into human evolution has been gained from the sequencing of four Y chromosome genes. Primary genomic sequencing determined gene SMCY to be composed of 27 exons that comprise 4,620 bp of coding sequence. The unfinished sequencing of the 5' portion of gene UTY1 was completed by primer walking, and a total of 20 exons were found. By using denaturing HPLC, these two genes, as well as DBY and DFFRY, were screened for polymorphic sites in 53-72 representatives of the five continents. A total of 98 variants were found, yielding nucleotide diversity estimates of 2.45 x 10(-5), 5. 07 x 10(-5), and 8.54 x 10(-5) for the coding regions of SMCY, DFFRY, and UTY1, respectively, with no variant having been observed in DBY. In agreement with most autosomal genes, diversity estimates for the noncoding regions were about 2- to 3-fold higher and ranged from 9. 16 x 10(-5) to 14.2 x 10(-5) for the four genes. Analysis of the frequencies of derived alleles for all four genes showed that they more closely fit the expectation of a Luria-Delbrück distribution than a distribution expected under a constant population size model, providing evidence for exponential population growth. Pairwise nucleotide mismatch distributions date the occurrence of population expansion to approximately 28,000 years ago. This estimate is in accord with the spread of Aurignacian technology and the disappearance of the Neanderthals.

  10. Assessment of genetic variation for the LINE-1 retrotransposon from next generation sequence data

    Directory of Open Access Journals (Sweden)

    Ramos Kenneth

    2010-10-01

    Full Text Available Abstract Background In humans, copies of the Long Interspersed Nuclear Element 1 (LINE-1 retrotransposon comprise 21% of the reference genome, and have been shown to modulate expression and produce novel splice isoforms of transcripts from genes that span or neighbor the LINE-1 insertion site. Results In this work, newly released pilot data from the 1000 Genomes Project is analyzed to detect previously unreported full length insertions of the retrotransposon LINE-1. By direct analysis of the sequence data, we have identified 22 previously unreported LINE-1 insertion sites within the sequence data reported for a mother/father/daughter trio. Conclusions It is demonstrated here that next generation sequencing data, as well as emerging high quality datasets from individual genome projects allow us to assess the amount of heterogeneity with respect to the LINE-1 retrotransposon amongst humans, and provide us with a wealth of testable hypotheses as to the impact that this diversity may have on the health of individuals and populations.

  11. Sedimentology, sequence-stratigraphy, and geochemical variations in the Mesoproterozoic Nonesuch Formation, northern Wisconsin, USA

    Science.gov (United States)

    Kingsbury Stewart, Esther; Mauk, Jeffrey L.

    2017-01-01

    We use core descriptions and portable X-ray fluorescence analyses to identify lithofacies and stratigraphic surfaces for the Mesoproterozoic Nonesuch Formation within the Ashland syncline, Wisconsin. We group lithofacies into facies associations and construct a sequence stratigraphic framework based on lithofacies stacking and stratigraphic surfaces. The fluvial-alluvial facies association (upper Copper Harbor Conglomerate) is overlain across a transgressive surface by the fluctuating-profundal facies association (lower Nonesuch Formation). The fluctuating-profundal facies association comprises a retrogradational sequence set overlain across a maximum flooding surface by an aggradational-progradational sequence set comprising fluctuating-profundal, fluvial-lacustrine, and fluvial-alluvial facies associations (middle Nonesuch through lower Freda Formations). Lithogeochemistry supports sedimentologic and stratigraphic interpretations. Fe/S molar ratios reflect the oxidation state of the lithofacies; values are most depleted above the maximum flooding surface where lithofacies are chemically reduced and are greatest in the chemically oxidized lithofacies. Si/Al and Zr/Al molar ratios reflect the relative abundance of detrital heavy minerals vs. clay minerals; greater values correlate with larger grain size. Vertical facies association stacking records depositional environments that evolved from fluvial and alluvial, to balanced-fill lake, to overfilled lake, and returning to fluvial and alluvial. Elsewhere in the basin, where accommodation was greatest, some volume of fluvial-lacustrine facies is likely present below the transgressive stratigraphic surface. This succession of continental and lake-basin types indicates a predominant tectonic driver of basin evolution. Lithofacies distribution and geochemistry indicate deposition within an asymmetric half-graben bounded on the east by a west-dipping growth fault. While facies assemblages are lacustrine and continental

  12. Rapid Characterization of Insulin Modifications and Sequence Variations by Proteinase K Digestion and UHPLC-ESI-MS

    Science.gov (United States)

    Yang, Rong-Sheng; Tang, Weijuan; Sheng, Huaming; Meng, Fanyu

    2018-01-01

    Discovery of novel insulin analogs as therapeutics has remained an active area of research. Compared with native human insulin, insulin analog molecules normally incorporate either covalent modifications or amino acid sequence variations. From the drug discovery and development perspective, methods for efficient and detailed characterization of these primary structural changes are very important. In this report, we demonstrate that proteinase K digestion coupled with UPLC-ESI-MS analysis provides a simple and rapid approach to characterize the modifications and sequence variations of insulin molecules. A commercially available proteinase K digestion kit was used to process recombinant human insulin (RHI), insulin glargine, and fluorescein isothiocynate-labeled recombinant human insulin (FITC-RHI) samples. The LC-MS data clearly showed that RHI and insulin glargine samples can be differentiated, and the FITC modifications in all three amine sites of the RHI molecule are well characterized. The end-to-end experiment and data interpretation was achieved within 60 min. This approach is fast and simple, and can be easily implemented in early drug discovery laboratories to facilitate research on more advanced insulin therapeutics. [Figure not available: see fulltext.

  13. Rapid Characterization of Insulin Modifications and Sequence Variations by Proteinase K Digestion and UHPLC-ESI-MS

    Science.gov (United States)

    Yang, Rong-Sheng; Tang, Weijuan; Sheng, Huaming; Meng, Fanyu

    2018-05-01

    Discovery of novel insulin analogs as therapeutics has remained an active area of research. Compared with native human insulin, insulin analog molecules normally incorporate either covalent modifications or amino acid sequence variations. From the drug discovery and development perspective, methods for efficient and detailed characterization of these primary structural changes are very important. In this report, we demonstrate that proteinase K digestion coupled with UPLC-ESI-MS analysis provides a simple and rapid approach to characterize the modifications and sequence variations of insulin molecules. A commercially available proteinase K digestion kit was used to process recombinant human insulin (RHI), insulin glargine, and fluorescein isothiocynate-labeled recombinant human insulin (FITC-RHI) samples. The LC-MS data clearly showed that RHI and insulin glargine samples can be differentiated, and the FITC modifications in all three amine sites of the RHI molecule are well characterized. The end-to-end experiment and data interpretation was achieved within 60 min. This approach is fast and simple, and can be easily implemented in early drug discovery laboratories to facilitate research on more advanced insulin therapeutics. [Figure not available: see fulltext.

  14. PPARGC1A sequence variation and cardiovascular risk-factor levels

    DEFF Research Database (Denmark)

    Brito, E C; Vimaleswaran, K S; Brage, S

    2009-01-01

    .005; rs13117172, p = 0.008) and fasting glucose concentrations (rs7657071, p = 0.002). None remained significant after correcting for the number of statistical comparisons. We proceeded by testing for gene x physical activity interactions for the polymorphisms that showed nominal evidence of association...... in the main effect models. None of these tests was statistically significant. CONCLUSIONS/INTERPRETATION: Variants at PPARGC1A may influence several metabolic traits in this European paediatric cohort. However, variation at PPARGC1A is unlikely to have a major impact on cardiovascular or metabolic health...

  15. Next-Generation Sequencing-Based Detection of Germline Copy Number Variations in BRCA1/BRCA2

    DEFF Research Database (Denmark)

    Schmidt, Ane Y; Hansen, Thomas V O; Ahlborn, Lise B

    2017-01-01

    Genetic testing of BRCA1/2 includes screening for single nucleotide variants and small insertions/deletions and for larger copy number variations (CNVs), primarily by Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA). With the advent of next-generation sequencing (NGS)...

  16. Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing.

    Science.gov (United States)

    Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning

    2014-11-07

    Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.

  17. System for face recognition under expression variations of neutral-sampled individuals using recognized expression warping and a virtual expression-face database

    Science.gov (United States)

    Petpairote, Chayanut; Madarasmi, Suthep; Chamnongthai, Kosin

    2018-01-01

    The practical identification of individuals using facial recognition techniques requires the matching of faces with specific expressions to faces from a neutral face database. A method for facial recognition under varied expressions against neutral face samples of individuals via recognition of expression warping and the use of a virtual expression-face database is proposed. In this method, facial expressions are recognized and the input expression faces are classified into facial expression groups. To aid facial recognition, the virtual expression-face database is sorted into average facial-expression shapes and by coarse- and fine-featured facial textures. Wrinkle information is also employed in classification by using a process of masking to adjust input faces to match the expression-face database. We evaluate the performance of the proposed method using the CMU multi-PIE, Cohn-Kanade, and AR expression-face databases, and we find that it provides significantly improved results in terms of face recognition accuracy compared to conventional methods and is acceptable for facial recognition under expression variation.

  18. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases.

    Science.gov (United States)

    Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric

    2014-04-15

    Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

  19. A search for pre-main-sequence stars in high-latitude molecular clouds. 3: A survey of the Einstein database

    Science.gov (United States)

    Caillault, Jean-Pierre; Magnani, Loris; Fryer, Chris

    1995-01-01

    In order to discern whether the high-latitude molecular clouds are regions of ongoing star formation, we have used X-ray emission as a tracer of youthful stars. The entire Einstein database yields 18 images which overlap 10 of the clouds mapped partially or completely in the CO (1-0) transition, providing a total of approximately 6 deg squared of overlap. Five previously unidentified X-ray sources were detected: one has an optical counterpart which is a pre-main-sequence (PMS) star, and two have normal main-sequence stellar counterparts, while the other two are probably extragalactic sources. The PMS star is located in a high Galactic latitude Lynds dark cloud, so this result is not too suprising. The translucent clouds, though, have yet to reveal any evidence of star formation.

  20. Deep Sequencing of 71 Candidate Genes to Characterize Variation Associated with Alcohol Dependence.

    Science.gov (United States)

    Clark, Shaunna L; McClay, Joseph L; Adkins, Daniel E; Kumar, Gaurav; Aberg, Karolina A; Nerella, Srilaxmi; Xie, Linying; Collins, Ann L; Crowley, James J; Quackenbush, Corey R; Hilliard, Christopher E; Shabalin, Andrey A; Vrieze, Scott I; Peterson, Roseann E; Copeland, William E; Silberg, Judy L; McGue, Matt; Maes, Hermine; Iacono, William G; Sullivan, Patrick F; Costello, Elizabeth J; van den Oord, Edwin J

    2017-04-01

    Previous genomewide association studies (GWASs) have identified a number of putative risk loci for alcohol dependence (AD). However, only a few loci have replicated and these replicated variants only explain a small proportion of AD risk. Using an innovative approach, the goal of this study was to generate hypotheses about potentially causal variants for AD that can be explored further through functional studies. We employed targeted capture of 71 candidate loci and flanking regions followed by next-generation deep sequencing (mean coverage 78X) in 806 European Americans. Regions included in our targeted capture library were genes identified through published GWAS of alcohol, all human alcohol and aldehyde dehydrogenases, reward system genes including dopaminergic and opioid receptors, prioritized candidate genes based on previous associations, and genes involved in the absorption, distribution, metabolism, and excretion of drugs. We performed single-locus tests to determine if any single variant was associated with AD symptom count. Sets of variants that overlapped with biologically meaningful annotations were tested for association in aggregate. No single, common variant was significantly associated with AD in our study. We did, however, find evidence for association with several variant sets. Two variant sets were significant at the q-value <0.10 level: a genic enhancer for ADHFE1 (p = 1.47 × 10 -5 ; q = 0.019), an alcohol dehydrogenase, and ADORA1 (p = 5.29 × 10 -5 ; q = 0.035), an adenosine receptor that belongs to a G-protein-coupled receptor gene family. To our knowledge, this is the first sequencing study of AD to examine variants in entire genes, including flanking and regulatory regions. We found that in addition to protein coding variant sets, regulatory variant sets may play a role in AD. From these findings, we have generated initial functional hypotheses about how these sets may influence AD. Copyright © 2017 by the Research Society on

  1. Use of next-generation sequencing to detect LDLR gene copy number variation in familial hypercholesterolemia.

    Science.gov (United States)

    Iacocca, Michael A; Wang, Jian; Dron, Jacqueline S; Robinson, John F; McIntyre, Adam D; Cao, Henian; Hegele, Robert A

    2017-11-01

    Familial hypercholesterolemia (FH) is a heritable condition of severely elevated LDL cholesterol, caused predominantly by autosomal codominant mutations in the LDL receptor gene ( LDLR ). In providing a molecular diagnosis for FH, the current procedure often includes targeted next-generation sequencing (NGS) panels for the detection of small-scale DNA variants, followed by multiplex ligation-dependent probe amplification (MLPA) in LDLR for the detection of whole-exon copy number variants (CNVs). The latter is essential because ∼10% of FH cases are attributed to CNVs in LDLR ; accounting for them decreases false negative findings. Here, we determined the potential of replacing MLPA with bioinformatic analysis applied to NGS data, which uses depth-of-coverage analysis as its principal method to identify whole-exon CNV events. In analysis of 388 FH patient samples, there was 100% concordance in LDLR CNV detection between these two methods: 38 reported CNVs identified by MLPA were also successfully detected by our NGS method, while 350 samples negative for CNVs by MLPA were also negative by NGS. This result suggests that MLPA can be removed from the routine diagnostic screening for FH, significantly reducing associated costs, resources, and analysis time, while promoting more widespread assessment of this important class of mutations across diagnostic laboratories. Copyright © 2017 by the American Society for Biochemistry and Molecular Biology, Inc.

  2. Association Mapping and Nucleotide Sequence Variation in Five Drought Tolerance Candidate Genes in Spring Wheat

    Directory of Open Access Journals (Sweden)

    Erena A. Edae

    2013-07-01

    Full Text Available Functional markers are needed for key genes involved in drought tolerance to improve selection for crop yield under moisture stress conditions. The objectives of this study were to (i characterize five drought tolerance candidate genes, namely dehydration responsive element binding 1A (, enhanced response to abscisic acid ( and , and fructan 1-exohydrolase ( and , in wheat ( L. for nucleotide and haplotype diversity, Tajima’s D value, and linkage disequilibrium (LD and (ii associate within-gene single nucleotide polymorphisms (SNPs with phenotypic traits in a spring wheat association mapping panel ( = 126. Field trials were grown under contrasting moisture regimes in Greeley, CO, and Melkassa, Ethiopia, in 2010 and 2011. Genome-specific amplification and DNA sequence analysis of the genes identified SNPs and revealed differences in nucleotide and haplotype diversity, Tajima’s D, and patterns of LD. showed associations (false discovery rate adjusted probability value = 0.1 with normalized difference vegetation index, heading date, biomass, and spikelet number. Both and were associated with harvest index, flag leaf width, and leaf senescence. was associated with grain yield, and was associated with thousand kernel weight and test weight. If validated in relevant genetic backgrounds, the identified marker–trait associations may be applied to functional marker-assisted selection.

  3. Detecting Flood Variations in Shanghai over 1949–2009 with Mann-Kendall Tests and a Newspaper-Based Database

    Directory of Open Access Journals (Sweden)

    Shiqiang Du

    2015-04-01

    Full Text Available A valuable aid to assessing and managing flood risk lies in a reliable database of historical floods. In this study, a newspaper-based flood database for Shanghai (NFDS for the period 1949–2009 was developed through a systematic scanning of newspapers. After calibration and validation of the database, Mann-Kendall tests and correlation analysis were applied to detect possible changes in flood frequencies. The analysis was carried out for three different flood types: overbank flood, agricultural waterlogging, and urban waterlogging. The compiled NFDS registered 146 floods and 92% of them occurred in the flood-prone season from June to September. The statistical analyses showed that both the annual flood and the floods in June–August increased significantly. Urban waterlogging showed a very strong increasing trend, probably because of insufficient capacity of urban drainage system and impacts of rapid urbanization. By contrast, the decrease in overbank flooding and the slight increase in agricultural waterlogging were likely because of the construction of river levees and seawalls and the upgrade of agricultural drainage systems, respectively. This study demonstrated the usefulness of local newspapers in building a historical flood database and in assessing flood characterization.

  4. Variations of the ISM conditions accross the Main Sequence of star forming galaxies: observations and simulations.

    Science.gov (United States)

    Martinez Galarza, Juan R.; Smith, Howard Alan; Lanz, Lauranne; Hayward, Christopher C.; Zezas, Andreas; Hung, Chao-Ling; Rosenthal, Lee; Weiner, Aaron

    2015-01-01

    A significant amount of evidence has been gathered that leads to the existence of a main sequence (MS) of star formation in galaxies. This MS is expressed in terms of a correlation between the SFR and the stellar mass of the form SFR ∝ M* and spans a few orders of magnitude in both quantities. Several ideas have been suggested to explain fundamental properties of the MS, such as its slope, its dispersion, and its evolution with redshift, but no consensus has been reached regarding its true nature, and whether the membership or not of particular galaxies to this MS underlies the existence of two different modes of star formation. In order to advance in the understanding of the MS, here we use a statistically robust Bayesian SED analysis method (CHIBURST) to consistently analyze the star-forming properties of a set of hydro-dynamical simulations of mergers, as well as observations of real mergers, both local and at intermediate redshift. We find a remarkable, very tight correlation between the specific star formation rate (sSFR) of galaxies, and the typical ISM conditions near their inernal star-forming regions, parametrized via a novel quantity: the compactness parameter (C). The evolution of mergers along this correlation explains the spread of the MS, and implies that the physical conditions of the ISM smoothly evolve between on-MS (secular) conditions and off-MS (coalescence/starburst) conditions. Furthermore, we show that the slope of the correlation can be interpreted in terms of the efficiency in the conversion of gas into stars, and that this efficiency remains unchanged along and across the MS. Finally, we discuss differences in the normalization of the correlation as a function of merger mass and redshift, and conclude that these differences imply the existence of two different modes of star formation, unrelated to the smooth evolution across the MS: a disk-like, low pressure mode and a compact nuclear-starburst mode.

  5. Combined influence of LDLR and HMGCR sequence variation on lipid-lowering response to simvastatin

    Science.gov (United States)

    Mangravite, Lara M.; Medina, Marisa Wong; Cui, Jinrui; Pressman, Sheila; Smith, Joshua D.; Rieder, Mark J.; Guo, Xiuqing; Nickerson, Deborah A.; Rotter, Jerome I.; Krauss, Ronald M.

    2010-01-01

    Objectives Although statins are efficacious for lowering LDL-cholesterol (LDLC), there is wide inter-individual variation in response. We tested the extent to which combined effects of common alleles of LDLR and HMGCR can contribute to this variability. Methods and Results Haplotypes in the LDLR 3′-untranslated region (3UTR) were tested for association with lipid-lowering response to simvastatin treatment in the Cholesterol and Pharmacogenetics (CAP) trial (335 African-Americans and 609European-Americans). LDLR haplotype 5 (L5)was associated with smaller simvastatin-induced reductions in LDLC, total cholesterol, non-HDL cholesterol, and apolipoprotein B (P=0.0002–0.03)in African-Americans, but not European-Americans. The combined presence of L5 and previously described HMGCR haplotypes in African-Americans was associated with significantly attenuated apoB reduction(−22.4±1.5% N=89) both compared to noncarriers (−30.6±1.5% N=78, P=0.0001) and to carriers of either individual haplotype (−28.2±1.1% N=158, P=0.001). We observed similar differences when measuring simvastatin-mediated induction of LDLR surface expression using lymphoblast cell lines (P=0.03). Conclusions We have identified a common LDLR 3UTR haplotype that is associated with attenuated lipid-lowering response to simvastatin treatment. Response was further reduced in individuals with both LDLR and previously described HMGCR haplotypes. Previously identified racial differences in statin efficacy were partially explained by increased prevalence of these combined haplotypes in African-Americans. PMID:20413733

  6. Association of the leucine-7 to proline-7 variation in the signal sequence of neuropeptide Y with major depression

    DEFF Research Database (Denmark)

    Koefoed, Pernille; Woldbye, David P. D.; Hansen, Thomas v. O.

    2012-01-01

    Objective: There is clear evidence of a genetic component in major depression, and several studies indicate that neuropeptide Y (NPY) could play an important role in the pathophysiology of the disease. A well-known polymorphism encoding the substitution of leucine to proline in the signal peptide...... sequence of NPY (Leu7Pro variation) was previously found to protect against depression. Our study aimed at replicating this association in a large Danish population with major depression. Method: Leu7Pro was studied in a sample of depressed patients and ethnically matched controls, as well as psychiatric...... disease controls with schizophrenia. Possible functional consequences of Leu7Pro were explored in vitro. Results: In contrast to previous studies, Pro7 appeared to be a risk allele for depression, being significantly more frequent in the depression sample (5.5 n = 593; p = 0.009; odds ratio, OR: 1...

  7. High-throughput sequencing and copy number variation detection using formalin fixed embedded tissue in metastatic gastric cancer.

    Directory of Open Access Journals (Sweden)

    Seokhwi Kim

    Full Text Available In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%, APC (10.1%, PIK3CA (5.6%, KRAS (4.5%, SMO (3.4%, STK11 (3.4%, CDKN2A (3.4% and SMAD4 (3.4%. Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%, 4 (4.5%, 2 (2.2%, 1 (1.1% and 1 (1.1% cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes.

  8. (reprocessed)HeliscopeCAGE sequencing, Delve mapping and CAGE TSS aggregation - FANTOM5 | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us FANTOM...ntified by CAGE tag analysis (BED format) *.rdna.fa.gz: rDNA sequences (FASTA format) Data file File name: fantom...5_rp_exp_details.zip File URL: ftp://ftp.biosciencedbc.jp/archive/fantom5/20161221/fantom5_rp_exp_detai...tp://ftp.biosciencedbc.jp/archive/fantom5/datafiles/reprocessed/hg38_latest/basic/ File size: 1.4 TB File na...me: (reprocessed)basic (Mus musculus) File URL: ftp://ftp.biosciencedbc.jp/archive/fantom5/datafiles/reproce

  9. Molecular phylogeny of Japanese Rhinolophidae based on variations in the complete sequence of the mitochondrial cytochrome b gene.

    Science.gov (United States)

    Sakai, Takahiro; Kikkawa, Yoshiaki; Tsuchiya, Kimiyuki; Harada, Masashi; Kanoe, Masamitsu; Yoshiyuki, Mizuko; Yonekawa, Hiromichi

    2003-04-01

    Microchiroptera have diversified into many species whose size and the shapes of the complicated ear and nose have been adapted to their echolocation abilities. Their speciation processes, and intra- and interspecies relationships are still under discussion. Here we report on the geographical variation of Japanese Rhinolophus ferrumequinum and R. cornutus using the complete sequence of the mitochondrial cytochrome b gene to clarify the phylogenetic positions of the 2 species as well as that of Rhinolophidae within the Microchiroptera. We have found that sequence divergence values within each of the 2 species are unexpectedly low (0.07%-0.94%). We have also found that there is no local specificity of their mtCytb alleles. On the other hand, the divergence values for Japanese Microchiroptera (12.7%-16.6%) are much higher than those for other mammalian genera. Similarly, the values among five genera of Vespertilionidae were 20.5%-27.3%. Phylogenetic analysis shows that the 2 species of family Rhinolophidae in the suborder Microchiroptera belong to the Megachiroptera cluster in the constructed maximum parsimony tree. These results suggest that the speciation of Rhinolophidae involved its divergence as an independent lineage from other Microchiroptera, and other microbats might be paraphyletic. In addition, the tree also shows that the order Chiroptera is monophylitic, and the closest group to Chiroptera is the ungulates.

  10. The R package otu2ot for implementing the entropy decomposition of nucleotide variation in sequence data

    Directory of Open Access Journals (Sweden)

    Alban eRamette

    2014-11-01

    Full Text Available Oligotyping is a novel, supervised computational method that classifies closely related sequences into oligotypes (OTs based on subtle nucleotide variations (Eren et al. 2013. Its application to microbial datasets has helped reveal ecological patterns which are often hidden by the way sequence data are currently clustered to define operational taxonomic units (OTUs. Here, we implemented the OT entropy decomposition procedure and its unsupervised version, Minimal Entropy Decomposition (MED; Eren et al. 2014, in the statistical programming language and environment, R. The aims are to facilitate the integration of computational routines, interactive statistical analyses, and visualization into a single framework. In addition, two complementary approaches are implemented: 1 An analytical method (the broken stick model is proposed to help identify oligotypes of low abundance that could be generated by chance alone and 2 a one-pass profiling (OP method, to efficiently identify those OTUs whose subsequent oligotyping would be most promising. These enhancements are especially useful for large datasets, where a manual screening of entropy analysis results and the creation of a full set of OTs may not be feasible. The package and procedures are illustrated by several tutorials and examples.

  11. CNV-RF Is a Random Forest-Based Copy Number Variation Detection Method Using Next-Generation Sequencing.

    Science.gov (United States)

    Onsongo, Getiria; Baughn, Linda B; Bower, Matthew; Henzler, Christine; Schomaker, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2016-11-01

    Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. The analytical variability in next-generation sequencing (NGS) and artifacts in coverage data because of issues with mappability along with lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data to identify CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation-random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, two cases with duplications, and identified novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50% across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity along with confirmation using a low-cost quantitative PCR method provides a framework for providing comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  12. Exploration of the effect of sequence variations located inside the binding pocket of HIV-1 and HIV-2 proteases.

    Science.gov (United States)

    Triki, Dhoha; Billot, Telli; Visseaux, Benoit; Descamps, Diane; Flatters, Delphine; Camproux, Anne-Claude; Regad, Leslie

    2018-04-10

    HIV-2 protease (PR2) is naturally resistant to most FDA (Food and Drug Administration)-approved HIV-1 protease inhibitors (PIs), a major antiretroviral class. In this study, we compared the PR1 and PR2 binding pockets extracted from structures complexed with 12 ligands. The comparison of PR1 and PR2 pocket properties showed that bound PR2 pockets were more hydrophobic with more oxygen atoms and fewer nitrogen atoms than PR1 pockets. The structural comparison of PR1 and PR2 pockets highlighted structural changes induced by their sequence variations and that were consistent with these property changes. Specifically, substitutions at residues 31, 46, and 82 induced structural changes in their main-chain atoms that could affect PI binding in PR2. In addition, the modelling of PR1 mutant structures containing V32I and L76M substitutions revealed a cooperative mechanism leading to structural deformation of flap-residue 45 that could modify PR2 flexibility. Our results suggest that substitutions in the PR1 and PR2 pockets can modify PI binding and flap flexibility, which could underlie PR2 resistance against PIs. These results provide new insights concerning the structural changes induced by PR1 and PR2 pocket variation changes, improving the understanding of the atomic mechanism of PR2 resistance to PIs.

  13. Database Description - ASTRA | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available abase Description General information of database Database name ASTRA Alternative n...tics Journal Search: Contact address Database classification Nucleotide Sequence Databases - Gene structure,...3702 Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database description The database represents classified p...(10):1211-6. External Links: Original website information Database maintenance site National Institute of Ad... for user registration Not available About This Database Database Description Dow

  14. Seasonal Variations in the Risk of Reoperation for Surgical Site Infection Following Elective Spinal Fusion Surgery: A Retrospective Study Using the Japanese Diagnosis Procedure Combination Database.

    Science.gov (United States)

    Ohya, Junichi; Chikuda, Hirotaka; Oichi, Takeshi; Kato, So; Matsui, Hiroki; Horiguchi, Hiromasa; Tanaka, Sakae; Yasunaga, Hideo

    2017-07-15

    A retrospective study of data abstracted from the Diagnosis Procedure Combination (DPC) database, a national representative database in Japan. The aim of this study was to examine seasonal variations in the risk of reoperation for surgical site infection (SSI) following spinal fusion surgery. Although higher rates of infection in the summer than in other seasons were thought to be caused by increasing inexperience of new staff, high temperature, and high humidity, no studies have examined seasonal variations in the risk of SSI following spinal fusion surgery in the country where medical staff rotation timing is not in summer season. In Japan, medical staff rotation starts in April. We retrospectively extracted the data of patients who were admitted between July 2010 and March 2013 from the DPC database. Patients were included if they were aged 20 years or older and underwent elective spinal fusion surgery. The primary outcome was reoperation for SSI during hospitalization. We performed multivariate analysis to clarify the risk factors of primary outcome with adjustment for patient background characteristics. We identified 47,252 eligible patients (23,659 male, 23,593 female). The mean age of the patients was 65.4 years (range, 20-101 yrs). Overall, reoperation for SSI occurred in 0.93% of the patients during hospitalization. The risk of reoperation for SSI was significantly higher in April (vs. February; odds ratio, 1.93; 95% confidence interval, 1.09-3.43, P = 0.03) as well as other known risk factors. In subgroup analysis with stratification for type of hospital, month of surgery was identified as an independent risk factor of reoperation for SSI among cases in an academic hospital, although there was no seasonal variation among those in a nonacademic hospital. This study showed that month of surgery is a risk factor of reoperation for SSI following elective spinal fusion surgery, nevertheless, in the country where medical staff rotation timing is not in

  15. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L..

    Directory of Open Access Journals (Sweden)

    Bin Han

    Full Text Available Microsatellites or simple sequence repeats (SSRs are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat have been announced, the researches about systemic analysis of SSRs for wheat still have not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR; 70,564 (23.9% were found to be monomorphic and 224,703 (76.1% were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3% amplified one locus, 8 (17.8% amplified multiple identical loci, and 13 (28.9% did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising

  16. RegTransBase - A Database Of Regulatory Sequences and Interactionsin a Wide Range of Prokaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kazakov, Alexei E.; Cipriano, Michael J.; Novichkov, Pavel S.; Minovitsky, Simon; Vinogradov, Dmitry V.; Arkin, Adam; Mironov, AndreyA.; Gelfand, Mikhail S.; Dubchak, Inna

    2006-07-01

    RegTransBase, a manually curated database of regulatoryinteractions in prokaryotes, captures the knowledge in publishedscientific literature using a controlled vocabulary. Although a number ofdatabases describing interactions between regulatory proteins and theirbinding sites are currently being maintained, they focus mostly on themodel organisms Escherichia coli and Bacillus subtilis, or are entirelycomputationally derived. RegTransBase describes a large number ofregulatory interactions reported in many organisms and contains varioustypes of experimental data, in particular: the activation or repressionof transcription by an identified direct regulator; determining thetranscriptional regulatory function of a protein (or RNA) directlybinding to DNA (RNA); mapping or prediction of binding site for aregulatory protein; characterization of regulatory mutations. Currently,the RegTransBase content is derived from about 3000 relevant articlesdescribing over 7000 experiments in relation to 128 microbes. It containsdata on the regulation of about 7500 genes and evidence for 6500interactions with 650 regulators. RegTransBase also contains manuallycreated position weight matrices (PWM) that can be used to identifycandidate regulatory sites in over 60 species. RegTransBase is availableat http://regtransbase.lbl.gov.

  17. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    Science.gov (United States)

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support and statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source attribution method for identifying transmission risk factors. But neither methods provide robust estimates of transmission risk ratios. Source attribution method can alleviate drawbacks from phylogenetic clustering but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  18. Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

    Science.gov (United States)

    Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D

    2018-01-04

    The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

  19. Minimising inter-laboratory variation when constructing a unified molecular database of plant varieties in an allogamous crop

    DEFF Research Database (Denmark)

    Jones, Huw; Bernole, Anne; Jensen, Louise Bach

    2008-01-01

    be reasonable to represent a variety by the common ‘major alleles' in a profile, but how to define these ‘major alleles' remains problematic. This paper describes methods of analysing DNA microsatellite data that will allow independent and objective data production at a number of laboratories. Methods......The construction of large-scale databases of molecular profiles of plant varieties for variety identification and diversity analyses is of considerable interest. When varieties of an allogamous species such as oilseed rape are analysed and described using molecular markers such as microsatellites......, care is needed to represent the variety in a meaningful yet useful way. It is possible to characterise such heterogeneous genotypes by analysing bulked samples comprising more than one individual seed or plant, but this approach may result in complex microsatellite profiles. Intuitively it would...

  20. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    Science.gov (United States)

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, infer differential disease risks and interaction with S1P₁-targeted therapeutics. Copyright © 2014 by the American Society for Biochemistry and Molecular Biology, Inc.

  1. CYP2D7 sequence variation interferes with TaqMan CYP2D6*15 and *35 genotyping

    Directory of Open Access Journals (Sweden)

    Amanda K Riffel

    2016-01-01

    Full Text Available TaqMan™ genotyping assays are widely used to genotype CYP2D6, which encodes a major drug metabolizing enzyme. Assay design for CYP2D6 can be challenging owing to the presence of two pseudogenes, CYP2D7 and CYP2D8, structural and copy number variation and numerous single nucleotide polymorphisms (SNPs some of which reflect the wild-type sequence of the CYP2D7 pseudogene. The aim of this study was to identify the mechanism causing false positive CYP2D6*15 calls and remediate those by redesigning and validating alternative TaqMan genotype assays. Among 13,866 DNA samples genotyped by the CompanionDx® lab on the OpenArray platform, 70 samples were identified as heterozygotes for 137Tins, the key SNP of CYP2D6*15. However, only 15 samples were confirmed when tested with the Luminex xTAG CYP2D6 Kit and sequencing of CYP2D6-specific long range (XL-PCR products. Genotype and gene resequencing of CYP2D6 and CYP2D7-specific XL-PCR products revealed a CC>GT dinucleotide SNP in exon 1 of CYP2D7 that reverts the sequence to CYP2D6 and allows a TaqMan assay PCR primer to bind. Because CYP2D7 also carries a Tins, a false-positive mutation signal is generated. This CYP2D7 SNP was also responsible for generating false-positive signals for rs769258 (CYP2D6*35 which is also located in exon 1. Although alternative CYP2D6*15 and *35 assays resolved the issue, we discovered a novel CYP2D6*15 subvariant in one sample that carries additional SNPs preventing detection with the alternate assay. The frequency of CYP2D6*15 was 0.1% in this ethnically diverse U.S. population sample. In addition, we also discovered linkage between the CYP2D7 CC>GT dinucleotide SNP and the 77G>A (rs28371696 SNP of CYP2D6*43. The frequency of this tentatively functional allele was 0.2%. Taken together, these findings emphasize that regardless of how careful genotyping assays are designed and evaluated before being commercially marketed, rare or unknown SNPs underneath primer and/or probe

  2. Database Description - TMFunction | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available sidue (or mutant) in a protein. The experimental data are collected from the literature both by searching th...the sequence database, UniProt, structural database, PDB, and literature database

  3. Database Description - RMG | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ase Description General information of database Database name RMG Alternative name ...raki 305-8602, Japan National Institute of Agrobiological Sciences E-mail : Database... classification Nucleotide Sequence Databases Organism Taxonomy Name: Oryza sativa Japonica Group Taxonomy ID: 39947 Database...rnal: Mol Genet Genomics (2002) 268: 434–445 External Links: Original website information Database...available URL of Web services - Need for user registration Not available About This Database Database Descri

  4. Natural selection in a population of Drosophila melanogaster explained by changes in gene expression caused by sequence variation in core promoter regions.

    Science.gov (United States)

    Sato, Mitsuhiko P; Makino, Takashi; Kawata, Masakado

    2016-02-09

    Understanding the evolutionary forces that influence variation in gene regulatory regions in natural populations is an important challenge for evolutionary biology because natural selection for such variations could promote adaptive phenotypic evolution. Recently, whole-genome sequence analyses have identified regulatory regions subject to natural selection. However, these studies could not identify the relationship between sequence variation in the detected regions and change in gene expression levels. We analyzed sequence variations in core promoter regions, which are critical regions for gene regulation in higher eukaryotes, in a natural population of Drosophila melanogaster, and identified core promoter sequence variations associated with differences in gene expression levels subjected to natural selection. Among the core promoter regions whose sequence variation could change transcription factor binding sites and explain differences in expression levels, three core promoter regions were detected as candidates associated with purifying selection or selective sweep and seven as candidates associated with balancing selection, excluding the possibility of linkage between these regions and core promoter regions. CHKov1, which confers resistance to the sigma virus and related insecticides, was identified as core promoter regions that has been subject to selective sweep, although it could not be denied that selection for variation in core promoter regions was due to linked single nucleotide polymorphisms in the regulatory region outside core promoter regions. Nucleotide changes in core promoter regions of CHKov1 caused the loss of two basal transcription factor binding sites and acquisition of one transcription factor binding site, resulting in decreased gene expression levels. Of nine core promoter regions regions associated with balancing selection, brat, and CG9044 are associated with neuromuscular junction development, and Nmda1 are associated with learning

  5. Variation in ribosomal and mitochondrial DNA sequences demonstrates the existence of intraspecific groups in Paramecium multimicronucleatum (Ciliophora, Oligohymenophorea).

    Science.gov (United States)

    Tarcz, Sebastian; Potekhin, Alexey; Rautian, Maria; Przyboś, Ewa

    2012-05-01

    This is the first phylogenetic study of the intraspecific variability within Paramecium multimicronucleatum with the application of two-loci analysis (ITS1-5.8S-ITS2-5'LSU rDNA and COI mtDNA) carried out on numerous strains originated from different continents. The species has been shown to have a complex structure of several sibling species within taxonomic species. Our analysis revealed the existence of 10 haplotypes for the rDNA fragment and 15 haplotypes for the COI fragment in the studied material. The mean distance for all of the studied P. multimicronucleatum sequence pairs was p=0.025/0.082 (rDNA/COI). Despite the greater variation of the COI fragment, the COI-derived tree topology is similar to the tree topology constructed on the basis of the rDNA fragment. P. multimicronucleatum strains are divided into three main clades. The tree based on COI fragment analysis presents a greater resolution of the studied P. multimicronucleatum strains. Our results indicate that the strains of P. multimicronucleatum that appear in different clades on the trees could belong to different syngens. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    Directory of Open Access Journals (Sweden)

    Kei-ichi Morita

    Full Text Available Gorlin syndrome (GS is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are often unidentifiable in some patients; the failure to detect mutations is presumably because of mutations occurred in other causative genes or outside of analyzed regions of PTCH1, or copy number alterations (CNAs. In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals, whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through a targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  7. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    Science.gov (United States)

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are often unidentifiable in some patients; the failure to detect mutations is presumably because of mutations occurred in other causative genes or outside of analyzed regions of PTCH1, or copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through a targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  8. RNA Sequencing Analysis Reveals Transcriptomic Variations in Tobacco (Nicotiana tabacum Leaves Affected by Climate, Soil, and Tillage Factors

    Directory of Open Access Journals (Sweden)

    Bo Lei

    2014-04-01

    Full Text Available The growth and development of plants are sensitive to their surroundings. Although numerous studies have analyzed plant transcriptomic variation, few have quantified the effect of combinations of factors or identified factor-specific effects. In this study, we performed RNA sequencing (RNA-seq analysis on tobacco leaves derived from 10 treatment combinations of three groups of ecological factors, i.e., climate factors (CFs, soil factors (SFs, and tillage factors (TFs. We detected 4980, 2916, and 1605 differentially expressed genes (DEGs that were affected by CFs, SFs, and TFs, which included 2703, 768, and 507 specific and 703 common DEGs (simultaneously regulated by CFs, SFs, and TFs, respectively. GO and KEGG enrichment analyses showed that genes involved in abiotic stress responses and secondary metabolic pathways were overrepresented in the common and CF-specific DEGs. In addition, we noted enrichment in CF-specific DEGs related to the circadian rhythm, SF-specific DEGs involved in mineral nutrient absorption and transport, and SF- and TF-specific DEGs associated with photosynthesis. Based on these results, we propose a model that explains how plants adapt to various ecological factors at the transcriptomic level. Additionally, the identified DEGs lay the foundation for future investigations of stress resistance, circadian rhythm and photosynthesis in tobacco.

  9. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    Directory of Open Access Journals (Sweden)

    Charles Richard Bradshaw

    Full Text Available Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10, a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in

  10. Ensembl variation resources

    Directory of Open Access Journals (Sweden)

    Marin-Garcia Pablo

    2010-05-01

    Full Text Available Abstract Background The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.

  11. Variation in Perfusion Strategies for Neonatal and Infant Aortic Arch Repair: Contemporary Practice in the STS Congenital Heart Surgery Database.

    Science.gov (United States)

    Meyer, David B; Jacobs, Jeffrey P; Hill, Kevin; Wallace, Amelia S; Bateson, Brian; Jacobs, Marshall L

    2016-09-01

    Regional cerebral perfusion (RCP) is used as an adjunct or alternative to deep hypothermic circulatory arrest (DHCA) for neonates and infants undergoing aortic arch repair. Clinical studies have not demonstrated clear superiority of either strategy, and multicenter data regarding current use of these strategies are lacking. We sought to describe the variability in contemporary practice patterns for use of these techniques. The Society of Thoracic Surgeons Congenital Heart Surgery Database (2010-2013) was queried to identify neonates and infants whose index operation involved aortic arch repair with cardiopulmonary bypass. Perfusion strategy was classified as isolated DHCA, RCP (with less than or equal to ten minutes of DHCA), or mixed (RCP with more than ten minutes of DHCA). Data were analyzed for the entire cohort and stratified by operation subgroups. Overall, 4,523 patients (105 centers) were identified; median age seven days (interquartile range: 5.0-13.0). The most prevalent perfusion strategy was RCP (43%). Deep hypothermic circulatory arrest and mixed perfusion accounted for 32% and 16% of cases, respectively. In all, 59% of operations involved some period of RCP. Regional cerebral perfusion was the most prevalent perfusion strategy for each operation subgroup. Neither age nor weight was associated with perfusion strategy, but reoperations were less likely to use RCP (31% vs 45%, P RCP and DHCA in the RCP group was longer than the DHCA time in the DHCA group (45 vs 36 minutes, P neonates and infants. In contemporary practice, RCP is the most prevalent perfusion strategy for these procedures. Use of DHCA is also common. Further investigation is warranted to ascertain possible relative merits of the various perfusion techniques. © The Author(s) 2016.

  12. Sequence variations in C9orf72 downstream of the hexanucleotide repeat region and its effect on repeat-primed PCR interpretation

    DEFF Research Database (Denmark)

    Nordin, Angelica; Akimoto, Chizuru; Wuolikainen, Anna

    2017-01-01

    A large GGGGCC-repeat expansion mutation (HREM) in C9orf72 is the most common known cause of ALS and FTD in European populations. Sequence variations immediately downstream of the HREM region have previously been observed and have been suggested to be one reason for difficulties in interpreting R...

  13. Maternal Clinical Diagnoses and Hospital Variation in the Risk of Cesarean Delivery: Analyses of a National US Hospital Discharge Database

    Science.gov (United States)

    Kozhimannil, Katy B.; Arcaya, Mariana C.; Subramanian, S. V.

    2014-01-01

    Background Cesarean delivery is the most common inpatient surgery in the United States, where 1.3 million cesarean sections occur annually, and rates vary widely by hospital. Identifying sources of variation in cesarean use is crucial to improving the consistency and quality of obstetric care. We used hospital discharge records to examine the extent to which variability in the likelihood of cesarean section across US hospitals was attributable to individual women's clinical diagnoses. Methods and Findings Using data from the 2009 and 2010 Nationwide Inpatient Sample from the Healthcare Cost and Utilization Project—a 20% sample of US hospitals—we analyzed data for 1,475,457 births in 1,373 hospitals. We fitted multilevel logistic regression models (patients nested in hospitals). The outcome was cesarean (versus vaginal) delivery. Covariates included diagnosis of diabetes in pregnancy, hypertension in pregnancy, hemorrhage during pregnancy or placental complications, fetal distress, and fetal disproportion or obstructed labor; maternal age, race/ethnicity, and insurance status; and hospital size and location/teaching status. The cesarean section prevalence was 22.0% (95% confidence interval 22.0% to 22.1%) among women with no prior cesareans. In unadjusted models, the between-hospital variation in the individual risk of primary cesarean section was 0.14 (95% credible interval 0.12 to 0.15). The difference in the probability of having a cesarean delivery between hospitals was 25 percentage points. Hospital variability did not decrease after adjusting for patient diagnoses, socio-demographics, and hospital characteristics (0.16 [95% credible interval 0.14 to 0.18]). A limitation is that these data, while nationally representative, did not contain information on parity or gestational age. Conclusions Variability across hospitals in the individual risk of cesarean section is not decreased by accounting for differences in maternal diagnoses. These findings highlight

  14. Sequencing of a patient with balanced chromosome abnormalities and neurodevelopmental disease identifies disruption of multiple high risk loci by structural variation.

    Directory of Open Access Journals (Sweden)

    Jonathon Blake

    Full Text Available Balanced chromosome abnormalities (BCAs occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14 that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to reveal an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated a high accuracy of mate-pair library sequencing to detect structural variants larger than 10 kB, proposing that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception.

  15. Sequencing of a Patient with Balanced Chromosome Abnormalities and Neurodevelopmental Disease Identifies Disruption of Multiple High Risk Loci by Structural Variation

    Science.gov (United States)

    Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko

    2014-01-01

    Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to reveal an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated a high accuracy of mate-pair library sequencing to detect structural variants larger than 10 kB, proposing that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750

  16. Identification of mitochondrial DNA sequence variation and development of single nucleotide polymorphic markers for CMS-D8 in cotton.

    Science.gov (United States)

    Suzuki, Hideaki; Yu, Jiwen; Wang, Fei; Zhang, Jinfa

    2013-06-01

    Cytoplasmic male sterility (CMS), which is a maternally inherited trait and controlled by novel chimeric genes in the mitochondrial genome, plays a pivotal role in the production of hybrid seed. In cotton, no PCR-based marker has been developed to discriminate CMS-D8 (from Gossypium trilobum) from its normal Upland cotton (AD1, Gossypium hirsutum) cytoplasm. The objective of the current study was to develop PCR-based single nucleotide polymorphic (SNP) markers from mitochondrial genes for the CMS-D8 cytoplasm. DNA sequence variation in mitochondrial genes involved in the oxidative phosphorylation chain including ATP synthase subunit 1, 4, 6, 8 and 9, and cytochrome c oxidase 1, 2 and 3 subunits were identified by comparing CMS-D8, its isogenic maintainer and restorer lines on the same nuclear genetic background. An allelic specific PCR (AS-PCR) was utilized for SNP typing by incorporating artificial mismatched nucleotides into the third or fourth base from the 3' terminus in both the specific and nonspecific primers. The result indicated that the method modifying allele-specific primers was successful in obtaining eight SNP markers out of eight SNPs using eight primer pairs to discriminate two alleles between AD1 and CMS-D8 cytoplasms. Two of the SNPs for atp1 and cox1 could also be used in combination to discriminate between CMS-D8 and CMS-D2 cytoplasms. Additionally, a PCR-based marker from a nine nucleotide insertion-deletion (InDel) sequence (AATTGTTTT) at the 59-67 bp positions from the start codon of atp6, which is present in the CMS and restorer lines with the D8 cytoplasm but absent in the maintainer line with the AD1 cytoplasm, was also developed. A SNP marker for two nucleotide substitutions (AA in AD1 cytoplasm to CT in CMS-D8 cytoplasm) in the intron (1,506 bp) of cox2 gene was also developed. These PCR-based SNP markers should be useful in discriminating CMS-D8 and AD1 cytoplasms, or those with CMS-D2 cytoplasm as a rapid, simple, inexpensive, and

  17. Analysis of nucleotide sequence variations in herpes simplex virus types 1 and 2, and varicella-zoster virus

    International Nuclear Information System (INIS)

    Chiba, A.; Suzutani, T.; Koyano, S.; Azuma, M.; Saijo, M.

    1998-01-01

    To analyze the difference in the degree of divergence between genes from identical herpes virus species, we examined the nucleotide sequence of genes from the herpes simplex virus type 1 (HSV-l ) strains VR-3 and 17 encoding thymidine kinase (TK), deoxyribonuclease (DNase), protein kinase (PK; UL13) and virion-associated host shut off (vhs) protein (UL41). The frequency of nucleotide substitutions per 1 kb in TK gene was 2.5 to 4.3 times higher than those in the other three genes. To prove that the polymorphism of HSV-1 TK gene is common characteristic of herpes virus TK genes, we compared the diversity of TK genes among eight HSV-l , six herpes simplex virus type 2 (HSV-2) and seven varicella-zoster virus (VZV) strains. The average frequency of nucleotide substitutions per 1 kb in the TK gene of HSV-l strains was 4-fold higher than that in the TK gene of HSV-2 strains. The VZV TK gene was highly conserved and only two nucleotide changes were evident in VZV strains. However, the rate of non-synonymous substitutions in total nucleotide substitutions was similar among the TK genes of the three viruses. This result indicated that the mutational rates differed, but there were no significant differences in selective pressure. We conclude that HSV-l TK gene is highly diverged and analysis of variations in the gene is a useful approach for understanding the molecular evolution of HSV-l in a short period. (authors)

  18. Whole genome re-sequencing reveals genome-wide variations among parental lines of 16 mapping populations in chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Thudi, Mahendar; Khan, Aamir W; Kumar, Vinay; Gaur, Pooran M; Katta, Krishnamohan; Garg, Vanika; Roorkiwal, Manish; Samineni, Srinivasan; Varshney, Rajeev K

    2016-01-27

    Chickpea (Cicer arietinum L.) is the second most important grain legume cultivated by resource poor farmers in South Asia and Sub-Saharan Africa. In order to harness the untapped genetic potential available for chickpea improvement, we re-sequenced 35 chickpea genotypes representing parental lines of 16 mapping populations segregating for abiotic (drought, heat, salinity), biotic stresses (Fusarium wilt, Ascochyta blight, Botrytis grey mould, Helicoverpa armigera) and nutritionally important (protein content) traits using whole genome re-sequencing approach. A total of 192.19 Gb data, generated on 35 genotypes of chickpea, comprising 973.13 million reads, with an average sequencing depth of ~10 X for each line. On an average 92.18 % reads from each genotype were aligned to the chickpea reference genome with 82.17 % coverage. A total of 2,058,566 unique single nucleotide polymorphisms (SNPs) and 292,588 Indels were detected while comparing with the reference chickpea genome. Highest number of SNPs were identified on the Ca4 pseudomolecule. In addition, copy number variations (CNVs) such as gene deletions and duplications were identified across the chickpea parental genotypes, which were minimum in PI 489777 (1 gene deletion) and maximum in JG 74 (1,497). A total of 164,856 line specific variations (144,888 SNPs and 19,968 Indels) with the highest percentage were identified in coding regions in ICC 1496 (21 %) followed by ICCV 97105 (12 %). Of 539 miscellaneous variations, 339, 138 and 62 were inter-chromosomal variations (CTX), intra-chromosomal variations (ITX) and inversions (INV) respectively. Genome-wide SNPs, Indels, CNVs, PAVs, and miscellaneous variations identified in different mapping populations are a valuable resource in genetic research and helpful in locating genes/genomic segments responsible for economically important traits. Further, the genome-wide variations identified in the present study can be used for developing high density SNP arrays for

  19. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

    DEFF Research Database (Denmark)

    Liu, Siyang; Huang, Shujia; Rao, Junhua

    2015-01-01

    present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome......) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We...... assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...

  20. Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening

    Directory of Open Access Journals (Sweden)

    Richardson Annette C

    2008-07-01

    Full Text Available Abstract Background Kiwifruit (Actinidia spp. are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs. Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons. Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases and pathways (terpenoid biosynthesis is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrates the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.

  1. Melanocortin 1 receptor (MC1R) gene sequence variation and melanism in the gray (Sciurus carolinensis), fox (Sciurus niger), and red (Sciurus vulgaris) squirrel.

    Science.gov (United States)

    McRobie, Helen R; King, Linda M; Fanutti, Cristina; Coussons, Peter J; Moncrief, Nancy D; Thomas, Alison P M

    2014-01-01

    Sequence variations in the melanocortin 1 receptor (MC1R) gene are associated with melanism in many different species of mammals, birds, and reptiles. The gray squirrel (Sciurus carolinensis), found in the British Isles, was introduced from North America in the late 19th century. Melanism in the British gray squirrel is associated with a 24-bp deletion in the MC1R. To investigate the origin of this mutation, we sequenced the MC1R of 95 individuals including 44 melanic gray squirrels from both the British Isles and North America. Melanic gray squirrels of both populations had the same 24-bp deletion associated with melanism. Given the significant deletion associated with melanism in the gray squirrel, we sequenced the MC1R of both wild-type and melanic fox squirrels (Sciurus niger) (9 individuals) and red squirrels (Sciurus vulgaris) (39 individuals). Unlike the gray squirrel, no association between sequence variation in the MC1R and melanism was found in these 2 species. We conclude that the melanic gray squirrel found in the British Isles originated from one or more introductions of melanic gray squirrels from North America. We also conclude that variations in the MC1R are not associated with melanism in the fox and red squirrels.

  2. Development of a genotype-by-sequencing immunogenetic assay as exemplified by screening for variation in red fox with and without endemic rabies exposure.

    Science.gov (United States)

    Donaldson, Michael E; Rico, Yessica; Hueffer, Karsten; Rando, Halie M; Kukekova, Anna V; Kyle, Christopher J

    2018-01-01

    Pathogens are recognized as major drivers of local adaptation in wildlife systems. By determining which gene variants are favored in local interactions among populations with and without disease, spatially explicit adaptive responses to pathogens can be elucidated. Much of our current understanding of host responses to disease comes from a small number of genes associated with an immune response. High-throughput sequencing (HTS) technologies, such as genotype-by-sequencing (GBS), facilitate expanded explorations of genomic variation among populations. Hybridization-based GBS techniques can be leveraged in systems not well characterized for specific variants associated with disease outcome to "capture" specific genes and regulatory regions known to influence expression and disease outcome. We developed a multiplexed, sequence capture assay for red foxes to simultaneously assess ~300-kbp of genomic sequence from 116 adaptive, intrinsic, and innate immunity genes of predicted adaptive significance and their putative upstream regulatory regions along with 23 neutral microsatellite regions to control for demographic effects. The assay was applied to 45 fox DNA samples from Alaska, where three arctic rabies strains are geographically restricted and endemic to coastal tundra regions, yet absent from the boreal interior. The assay provided 61.5% on-target enrichment with relatively even sequence coverage across all targeted loci and samples (mean = 50×), which allowed us to elucidate genetic variation across introns, exons, and potential regulatory regions (4,819 SNPs). Challenges remained in accurately describing microsatellite variation using this technique; however, longer-read HTS technologies should overcome these issues. We used these data to conduct preliminary analyses and detected genetic structure in a subset of red fox immune-related genes between regions with and without endemic arctic rabies. This assay provides a template to assess immunogenetic variation

  3. Single-strand conformation polymorphism (SSCP)-based mutation scanning approaches to fingerprint sequence variation in ribosomal DNA of ascaridoid nematodes.

    Science.gov (United States)

    Zhu, X Q; Gasser, R B

    1998-06-01

    In this study, we assessed single-strand conformation polymorphism (SSCP)-based approaches for their capacity to fingerprint sequence variation in ribosomal DNA (rDNA) of ascaridoid nematodes of veterinary and/or human health significance. The second internal transcribed spacer region (ITS-2) of rDNA was utilised as the target region because it is known to provide species-specific markers for this group of parasites. ITS-2 was amplified by PCR from genomic DNA derived from individual parasites and subjected to analysis. Direct SSCP analysis of amplicons from seven taxa (Toxocara vitulorum, Toxocara cati, Toxocara canis, Toxascaris leonina, Baylisascaris procyonis, Ascaris suum and Parascaris equorum) showed that the single-strand (ss) ITS-2 patterns produced allowed their unequivocal identification to species. While no variation in SSCP patterns was detected in the ITS-2 within four species for which multiple samples were available, the method allowed the direct display of four distinct sequence types of ITS-2 among individual worms of T. cati. Comparison of SSCP/sequencing with the methods of dideoxy fingerprinting (ddF) and restriction endonuclease fingerprinting (REF) revealed that also ddF allowed the definition of the four sequence types, whereas REF displayed three of four. The findings indicate the usefulness of the SSCP-based approaches for the identification of ascaridoid nematodes to species, the direct display of sequence variation in rDNA and the detection of population variation. The ability to fingerprint microheterogeneity in ITS-2 rDNA using such approaches also has implications for studying fundamental aspects relating to mutational change in rDNA.

  4. Intraspecific variation between the ITS sequences of Toxocara canis, Toxocara cati and Toxascaris leonina from different host species in south-western Poland.

    Science.gov (United States)

    Fogt-Wyrwas, R; Mizgajska-Wiktor, H; Pacoń, J; Jarosz, W

    2013-12-01

    Some parasitic nematodes can inhabit different definitive hosts, which raises the question of the intraspecific variability of the nematode genotype affecting their preferences to choose particular species as hosts. Additionally, the issue of a possible intraspecific DNA microheterogeneity in specimens from different parts of the world seems to be interesting, especially from the evolutionary point of view. The problem was analysed in three related species - Toxocara canis, Toxocara cati and Toxascaris leonina - specimens originating from Central Europe (Poland). Using specific primers for species identification, internal transcribed spacer (ITS)-1 and ITS-2 regions were amplified and then sequenced. The sequences obtained were compared with sequences previously described for specimens originating from other geographical locations. No differences in nucleotide sequences were established in T. canis isolated from two different hosts (dogs and foxes). A comparison of ITS sequences of T. canis from Poland with sequences deposited in GenBank showed that the scope of intraspecific variability of the species did not exceed 0.4%, while in T. cati the differences did not exceed 2%. Significant differences were found in T. leonina, where ITS-1 differed by 3% and ITS-2 by as much as 7.4% in specimens collected from foxes in Poland and dogs in Australia. Such scope of differences in the nucleotide sequence seems to exceed the intraspecific variation of the species.

  5. Genetic Variation in Cardiomyopathy and Cardiovascular Disorders.

    Science.gov (United States)

    McNally, Elizabeth M; Puckelwartz, Megan J

    2015-01-01

    With the wider deployment of massively-parallel, next-generation sequencing, it is now possible to survey human genome data for research and clinical purposes. The reduced cost of producing short-read sequencing has now shifted the burden to data analysis. Analysis of genome sequencing remains challenged by the complexity of the human genome, including redundancy and the repetitive nature of genome elements and the large amount of variation in individual genomes. Public databases of human genome sequences greatly facilitate interpretation of common and rare genetic variation, although linking database sequence information to detailed clinical information is limited by privacy and practical issues. Genetic variation is a rich source of knowledge for cardiovascular disease because many, if not all, cardiovascular disorders are highly heritable. The role of rare genetic variation in predicting risk and complications of cardiovascular diseases has been well established for hypertrophic and dilated cardiomyopathy, where the number of genes that are linked to these disorders is growing. Bolstered by family data, where genetic variants segregate with disease, rare variation can be linked to specific genetic variation that offers profound diagnostic information. Understanding genetic variation in cardiomyopathy is likely to help stratify forms of heart failure and guide therapy. Ultimately, genetic variation may be amenable to gene correction and gene editing strategies.

  6. Sequence variation of bovine mitochondrial ND-5 between haplotypes of composite and Hereford Breeds of beef cattle

    Directory of Open Access Journals (Sweden)

    SUTARNO

    2002-07-01

    Full Text Available The aims of the study were to: Investigate polymorphisms in the ND-5 region of bovine mitochondrial DNA in the composite and purebred Hereford herds from the Wokalup selection experiment, sequencing and compare the sequences between haplotypes and published sequence from Genebank. A total of 194 Hereford and 235 composite breed cattle from Wokalup Research Station were used in this study. The mitochondrial DNA was extracted using Wizard genomic DNA purification system from Promega. ND-5 fragment of mitochondrial DNA was amplified using PCR and continued with RFLP. Each haplotypes were sequenced. PCR products of each haplotype were cloned into pCR II, transformed, colonies selection, plasmid DNA extraction continued with cycle sequencing. Polymorphisms were found in both breeds of cattle in ND-5 region of mitochondrial DNA by PCR-RFLP analysis. Sequencing analysis confirmed the RFLPs data.

  7. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R)

    Science.gov (United States)

    2013-01-01

    Background Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitution but none of them were associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680

  8. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  9. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  10. A Comprehensive Survey of Sequence Variation in the ABCA4 (ABCR) Gene in Stargardt Disease and Age-Related Macular Degeneration

    OpenAIRE

    Rivera, Andrea; White, Karen; Stöhr, Heidi; Steiner, Klaus; Hemmrich, Nadine; Grimm, Timo; Jurklies, Bernhard; Lorenz, Birgit; Scholl, Hendrik P. N.; Apfelstedt-Sylla, Eckhart; Weber, Bernhard H. F.

    2000-01-01

    Stargardt disease (STGD) is a common autosomal recessive maculopathy of early and young-adult onset and is caused by alterations in the gene encoding the photoreceptor-specific ATP-binding cassette (ABC) transporter (ABCA4). We have studied 144 patients with STGD and 220 unaffected individuals ascertained from the German population, to complete a comprehensive, population-specific survey of the sequence variation in the ABCA4 gene. In addition, we have assessed the proposed role for ABCA4 in ...

  11. Improving taxonomic accuracy for fungi in public sequence databases: applying ‘one name one species’ in well-defined genera with Trichoderma/Hypocrea as a test case

    Science.gov (United States)

    Strope, Pooja K; Chaverri, Priscila; Gazis, Romina; Ciufo, Stacy; Domrachev, Michael; Schoch, Conrad L

    2017-01-01

    Abstract The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort together with numerous fungal taxonomy experts attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of proposed names and type material were recently proposed and published. In this case study the recent taxonomic information was applied to do a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross referencing of data from single loci and genomes we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353 PMID:29220466

  12. Update on Pneumocystis carinii f. sp. hominis Typing Based on Nucleotide Sequence Variations in Internal Transcribed Spacer Regions of rRNA Genes

    Science.gov (United States)

    Lee, Chao-Hung; Helweg-Larsen, Jannik; Tang, Xing; Jin, Shaoling; Li, Baozheng; Bartlett, Marilyn S.; Lu, Jang-Jih; Lundgren, Bettina; Lundgren, Jens D.; Olsson, Mats; Lucas, Sebastian B.; Roux, Patricia; Cargnel, Antonietta; Atzori, Chiara; Matos, Olga; Smith, James W.

    1998-01-01

    Pneumocystis carinii f. sp. hominis isolates from 207 clinical specimens from nine countries were typed based on nucleotide sequence variations in the internal transcribed spacer regions I and II (ITS1 and ITS2, respectively) of rRNA genes. The number of ITS1 nucleotides has been revised from the previously reported 157 bp to 161 bp. Likewise, the number of ITS2 nucleotides has been changed from 177 to 192 bp. The number of ITS1 sequence types has increased from 2 to 15, and that of ITS2 has increased from 3 to 14. The 15 ITS1 sequence types are designated types A through O, and the 14 ITS2 types are named types a through n. A total of 59 types of P. carinii f. sp. hominis were found in this study. PMID:9508304

  13. Regional variations of basal cell carcinoma incidence in the U.K. using The Health Improvement Network database (2004-10).

    Science.gov (United States)

    Musah, A; Gibson, J E; Leonardi-Bee, J; Cave, M R; Ander, E L; Bath-Hextall, F

    2013-11-01

    Basal cell carcinoma (BCC) is one of the most common types of nonmelanoma skin cancer affecting the white population; however, little is known about how the incidence varies across the U.K. To determine the variation in BCC throughout the U.K. Data from 2004 to 2010 were obtained from The Health Improvement Network database. European and world age-standardized incidence rates (EASRs and WASRs, respectively) were obtained for country-level estimates and levels of socioeconomic deprivation, while strategic health-authority-level estimates were directly age and sex standardized to the U.K. standard population. Incidence-rate ratios were estimated using multivariable Poisson regression models. The overall EASR and WASR of BCC in the U.K. were 98.6 per 100,000 person-years and 66.9 per 100,000 person-years, respectively. Regional-level incidence rates indicated a significant geographical variation in the distribution of BCC, which was more pronounced in the southern parts of the country. The South East Coast had the highest BCC rate followed by South Central, Wales and the South West. Incidence rates were substantially higher in the least deprived groups and we observed a trend of decreasing incidence with increasing levels of deprivation (P < 0.001). Finally, in terms of age groups, the largest annual increase was observed among those aged 30-49 years. Basal cell carcinoma is an increasing health problem in the U.K.; the southern regions of the U.K. and those in the least deprived groups had a higher incidence of BCC. Our findings indicate an increased incidence of BCC for younger age groups below 49 years. © 2013 British Association of Dermatologists.

  14. Genetic Diversity and Sequence Variations at Growth Hormone Loci among Composite and Hereford Populations of Beef Cattle

    Directory of Open Access Journals (Sweden)

    ALAN J. LYMBERY

    2000-07-01

    Full Text Available A total of 194 Hereford and 235 composite breed cattle from Wokalup Research Station were used in this study. The aims of the study were to: Investigate polymorphisms in the growth hormone gene in the composite and purebred Hereford herds from the Wokalup selection experiment, compare genetic diversity in the growth hormone gene of the breeds, sequencing and compare the sequences of growth hormone loci between composite and purebred Hereford herds with published sequence from Genebank. The genomic DNA was extracted using Wizard genomic DNA purification system from Promega. Two fragments of growth hormone gene were amplified using PCR and continued with RFLP. Each genotype in both loci was sequenced. PCR products of each genotypes were cloned into PCR II, transformed, colonies selection, plasmid DNA extraction continued with cycle sequencing. Polymorphisms were found in both breeds of cattle in both loci of GH-L1 and GH-L2 of the growth hormone gene by PCR-RFLP analysis. Sequencing analysis confirmed the RFLPs data, polymorphism detected using AluI at GH-L1 is due to substitution between leusin/ valine at position 127, while polymorphism at the MspI restriction site was caused by transition of C to T at +837 position.

  15. Sequence variation does not confound the measurement of plasma PfHRP2 concentration in African children presenting with severe malaria

    Directory of Open Access Journals (Sweden)

    Ramutton Thiranut

    2012-08-01

    Full Text Available Abstract Background Plasmodium falciparum histidine-rich protein PFHRP2 measurement is used widely for diagnosis, and more recently for severity assessment in falciparum malaria. The Pfhrp2 gene is highly polymorphic, with deletion of the entire gene reported in both laboratory and field isolates. These issues potentially confound the interpretation of PFHRP2 measurements. Methods Studies designed to detect deletion of Pfhrp2 and its paralog Pfhrp3 were undertaken with samples from patients in seven countries contributing to the largest hospital-based severe malaria trial (AQUAMAT. The quantitative relationship between sequence polymorphism and PFHRP2 plasma concentration was examined in samples from selected sites in Mozambique and Tanzania. Results There was no evidence for deletion of either Pfhrp2 or Pfhrp3 in the 77 samples with lowest PFHRP2 plasma concentrations across the seven countries. Pfhrp2 sequence diversity was very high with no haplotypes shared among 66 samples sequenced. There was no correlation between Pfhrp2 sequence length or repeat type and PFHRP2 plasma concentration. Conclusions These findings indicate that sequence polymorphism is not a significant cause of variation in PFHRP2 concentration in plasma samples from African children. This justifies the further development of plasma PFHRP2 concentration as a method for assessing African children who may have severe falciparum malaria. The data also add to the existing evidence base supporting the use of rapid diagnostic tests based on PFHRP2 detection.

  16. Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK motorcycle database.

    Science.gov (United States)

    Ikeda, Shun; Abe, Takashi; Nakamura, Yukiko; Kibinge, Nelson; Hirai Morita, Aki; Nakatani, Atsushi; Ono, Naoaki; Ikemura, Toshimichi; Nakamura, Kensuke; Altaf-Ul-Amin, Md; Kanaya, Shigehiko

    2013-05-01

    Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.

  17. Genetic control of environmental variation of two quantitative traits of Drosophila melanogaster revealed by whole-genome sequencing

    DEFF Research Database (Denmark)

    Sørensen, Peter; de los Campos, Gustavo; Morgante, Fabio

    2015-01-01

    and others more volatile performance. Understanding the mechanisms responsible for environmental variability not only informs medical questions but is relevant in evolution and in agricultural science. In this work fully sequenced inbred lines of Drosophila melanogaster were analyzed to study the nature...... of genetic control of environmental variance for two quantitative traits: starvation resistance (SR) and startle response (SL). The evidence for genetic control of environmental variance is compelling for both traits. Sequence information is incorporated in random regression models to study the underlying...... genetic signals, which are shown to be different in the two traits. Genomic variance in sexual dimorphism was found for SR but not for SL. Indeed, the proportion of variance captured by sequence information and the contribution to this variance from four chromosome segments differ between sexes in SR...

  18. Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ~2,000 Sardinians Using Tailored Sequencing Analysis Tools.

    Directory of Open Access Journals (Sweden)

    Jun Ding

    2015-07-01

    Full Text Available DNA sequencing identifies common and rare genetic variants for association studies, but studies typically focus on variants in nuclear DNA and ignore the mitochondrial genome. In fact, analyzing variants in mitochondrial DNA (mtDNA sequences presents special problems, which we resolve here with a general solution for the analysis of mtDNA in next-generation sequencing studies. The new program package comprises 1 an algorithm designed to identify mtDNA variants (i.e., homoplasmies and heteroplasmies, incorporating sequencing error rates at each base in a likelihood calculation and allowing allele fractions at a variant site to differ across individuals; and 2 an estimation of mtDNA copy number in a cell directly from whole-genome sequencing data. We also apply the methods to DNA sequence from lymphocytes of ~2,000 SardiNIA Project participants. As expected, mothers and offspring share all homoplasmies but a lesser proportion of heteroplasmies. Both homoplasmies and heteroplasmies show 5-fold higher transition/transversion ratios than variants in nuclear DNA. Also, heteroplasmy increases with age, though on average only ~1 heteroplasmy reaches the 4% level between ages 20 and 90. In addition, we find that mtDNA copy number averages ~110 copies/lymphocyte and is ~54% heritable, implying substantial genetic regulation of the level of mtDNA. Copy numbers also decrease modestly but significantly with age, and females on average have significantly more copies than males. The mtDNA copy numbers are significantly associated with waist circumference (p-value = 0.0031 and waist-hip ratio (p-value = 2.4×10-5, but not with body mass index, indicating an association with central fat distribution. To our knowledge, this is the largest population analysis to date of mtDNA dynamics, revealing the age-imposed increase in heteroplasmy, the relatively high heritability of copy number, and the association of copy number with metabolic traits.

  19. Numerical Analysis of Residual Stress and Distortion Use Finite Element Method on Inner Bottom Construction of Geomarin IV Survey Ship with Welding Sequence Variations

    Science.gov (United States)

    Syahroni, N.; Hartono, A. B. W.; Murtedjo, M.

    2018-03-01

    In the ship fabrication industry, welding is the most critical stage. If the quality of welding on ship fabrication is not good, then it will affect the strength and overall appearance of the structure. One of the factors that affect the quality of welding is residual stress and distortion. In this research welding simulation is performed on the inner bottom construction of Geomarin IV Ship Survey using shell element and has variation to welding sequence. In this study, welding simulations produced peak temperatures at 2490 K at variation 4. While the lowest peak temperature was produced by variation 2 with a temperature of 2339 K. After welding simulation, it continued simulating residual stresses and distortion. The smallest maximum tensile residual stress found in the inner bottom construction is 375.23 MPa, and the maximum tensile pressure is -20.18 MPa. The residual stress is obtained from variation 3. The distortion occurring in the inner bottom construction for X=720 mm is 4.2 mm and for X=-720 mm, the distortion is 4.92 mm. The distortion is obtained from the variation 3. Near the welding area, distortion value reaches its minimum point. This is because the stiffeners in the form of frames serves as anchoring.

  20. DISSECTING THE RED SEQUENCE. III. MASS-TO-LIGHT VARIATIONS IN THREE-DIMENSIONAL FUNDAMENTAL PLANE SPACE

    International Nuclear Information System (INIS)

    Graves, Genevieve J.; Faber, S. M.

    2010-01-01

    The fundamental plane (FP) of early-type galaxies is observed to have finite thickness and to be tilted from the virial relation. Both of these represent departures from the simple assumption that dynamical mass-to-light ratios (M dyn /L) are constant for all early-type galaxies. We use a sample of 16,000 quiescent galaxies from the Sloan Digital Sky Survey to map out the variations in M dyn /L throughout the three-dimensional FP space defined by velocity dispersion (σ), effective radius (R e ), and effective surface brightness (I e ). Dividing M dyn /L into multiple components allows us to separately consider the contribution to the observed M dyn /L variation due to stellar population effects, initial mass function (IMF) variations, and variations in the dark matter fraction within one R e . Along the FP, we find that the stellar population contribution given some constant IMF (M *,IMF /L) scales with σ such that M *,IMF /L ∝ f(σ). Meanwhile, the dark matter and/or IMF contribution (M dyn /M *,IMF ) scales with M dyn such that M dyn /M *,IMF ∝ g(M dyn ). This means that the two contributions to the tilt of the FP rotate the plane around different axes in the three-dimensional space. The observed tilt of the FP requires contributions from both, with dark matter/IMF variations likely comprising the dominant contribution. Looking at M dyn /L variations through the thickness of the FP, we find that M dyn /L variations must be dominated either by IMF variations or by real differences in the dark matter fraction with R e . This means that the finite thickness of the FP is due to variations in the stellar mass surface density within R e (Σ *,IMF ), not the fading of passive stellar populations. It therefore represents genuine structural differences between early-type galaxies. These structural variations are correlated with galaxy star formation histories such that galaxies with higher M dyn /M *,IMF have higher [Mg/Fe], lower metallicities, and older mean

  1. Mitochondrial cytochrome b sequence variations and population structure of Siberian chipmunk (Tamias sibiricus) in Northeastern Asia and population substructure in South Korea.

    Science.gov (United States)

    Lee, Mu-Yeong; Lissovsky, Andrey A; Park, Sun-Kyung; Obolenskaya, Ekaterina V; Dokuchaev, Nikolay E; Zhang, Ya-Ping; Yu, Li; Kim, Young-Jun; Voloshina, Inna; Myslenkov, Alexander; Choi, Tae-Young; Min, Mi-Sook; Lee, Hang

    2008-12-31

    Twenty-five chipmunk species occur in the world, of which only the Siberian chipmunk, Tamias sibiricus, inhabits Asia. To investigate mitochondrial cytochrome b sequence variations and population structure of the Siberian chipmunk in northeastern Asia, we examined mitochondrial cytochrome b sequences (1140 bp) from 3 countries. Analyses of 41 individuals from South Korea and 33 individuals from Russia and northeast China resulted in 37 haplotypes and 27 haplotypes, respectively. There were no shared haplotypes between South Korea and Russia--northeast China. Phylogenetic trees and network analysis showed 2 major maternal lineages for haplotypes, referred to as the S and R lineages. Haplotype grouping in each cluster was nearly coincident with its geographic affinity. In particular, 3 distinct groups were found that mostly clustered in the northern, central and southern parts of South Korea. Nucleotide diversity of the S lineage was twice that of lineage R. The divergence between S and R lineages was estimated to be 2.98-0.98 Myr. During the ice age, there may have been at least 2 refuges in South Korea and Russia--northeast China. The sequence variation between the S and R lineages was 11.3% (K2P), which is indicative of specific recognition in rodents. These results suggest that T. sibiricus from South Korea could be considered a separate species. However, additional information, such as details of distribution, nuclear genes data or morphology, is required to strengthen this hypothesis.

  2. Variation of amino acid sequences of serum amyloid a (SAA) and immunohistochemical analysis of amyloid a (AA) in Japanese domestic cats.

    Science.gov (United States)

    Tei, Meina; Uchida, Kazuyuki; Chambers, James K; Watanabe, Ken-Ichi; Tamamoto, Takashi; Ohno, Koichi; Nakayama, Hiroyuki

    2018-02-02

    Amyloid A (AA) amyloidosis, a fatal systemic amyloid disease, occurs secondary to chronic inflammatory conditions in humans. Although persistently elevated serum amyloid A (SAA) levels are required for its pathogenesis, not all individuals with chronic inflammation necessarily develop AA amyloidosis. Furthermore, many diseases in cats are associated with the elevated production of SAA, whereas only a small number actually develop AA amyloidosis. We hypothesized that a genetic mutation in the SAA gene may strongly contribute to the pathogenesis of feline AA amyloidosis. In the present study, genomic DNA from four Japanese domestic cats (JDCs) with AA amyloidosis and from five without amyloidosis was analyzed using polymerase chain reaction (PCR) amplification and direct sequencing. We identified the novel variation combination of 45R-51A in the deduced amino acid sequences of four JDCs with amyloidosis and five without. However, there was no relationship between amino acid variations and the distribution of AA amyloid deposits, indicating that differences in SAA sequences do not contribute to the pathogenesis of AA amyloidosis. Immunohistochemical analysis using antisera against the three different parts of the feline SAA protein-i.e., the N-terminal, central, and C-terminal regions-revealed that feline AA contained the C-terminus, unlike human AA. These results indicate that the cleavage and degradation of the C-terminus are not essential for amyloid fibril formation in JDCs.

  3. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L. is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representative 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR, which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes express express preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  4. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    Science.gov (United States)

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment; where 92 (82.1%) human and 204 (99%) animal strains were assigned to their respective cluster. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.

  5. Survey Sequencing Reveals Elevated DNA Transposon Activity, Novel Elements, and Variation in Repetitive Landscapes among Vesper Bats

    Czech Academy of Sciences Publication Activity Database

    Pagán, H.J.T.; Macas, Jiří; Novák, Petr; McCulloch, E.S.; Stevens, R.D.; Ray, D.A.

    2012-01-01

    Roč. 4, č. 4 (2012), s. 575-585 ISSN 1759-6653 Institutional research plan: CEZ:AV0Z50510513 Keywords : HORIZONTAL TRANSFER * AUTOMATED CLASSIFICATION * GENOME SEQUENCE Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 4.759, year: 2012

  6. Genetic variation within clonal lineages of Phytophthora infestans revealed through genotyping-by-sequencing, and implications for late blight epidemiology

    Science.gov (United States)

    Genotyping-by-sequencing (GBS) was performed on 257 Phytophthora infestans isolates belonging to four clonal lineages to study within-lineage diversity. The four lineages used in the study included US-8 (n=28), US-11 (n=27), US-23 (n=166), and US-24 (n=36), with isolates originating from 23 of the U...

  7. Genome sequence variation in the constricta strain dramatically alters the protein interaction and localization map of Potato yellow dwarf virus

    Science.gov (United States)

    The genome sequence of the constricta strain of Potato yellow dwarf virus (CYDV) was determined to be 12,792 nucleotides long and organized into seven open reading frames with the gene order 3’-N-X-P-Y-M-G-L-5’, which encodes the nucleocapsid, phosphoprotein, movement, matrix, glycoprotein and RNA-d...

  8. Overview of errors in the reference sequence and annotation of Mycobacterium tuberculosis H37Rv, and variation amongst its isolates

    KAUST Repository

    Kö ser, Claudio U.; Niemann, Stefan; Summers, David K.; Archer, John A.C.

    2012-01-01

    Since its publication in 1998, the genome sequence of the Mycobacterium tuberculosis H37Rv laboratory strain has acted as the cornerstone for the study of tuberculosis. In this review we address some of the practical aspects that have come to light

  9. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    Science.gov (United States)

    Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

    2014-07-04

    Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. Although large portion of sequence context was

  10. The Pisa pre-main sequence tracks and isochrones. A database covering a wide range of Z, Y, mass, and age values

    Science.gov (United States)

    Tognelli, E.; Prada Moroni, P. G.; Degl'Innocenti, S.

    2011-09-01

    Context. In recent years new observations of pre-main sequence stars (pre-MS) with Z ≤ Z⊙ have been made available. To take full advantage of the continuously growing amount of data of pre-MS stars in different environments, we need to develop updated pre-MS models for a wide range of metallicity to assign reliable ages and masses to the observed stars. Aims: We present updated evolutionary pre-MS models and isochrones for a fine grid of mass, age, metallicity, and helium values. Methods: We use a standard and well-tested stellar evolutionary code (i.e. FRANEC), that adopts outer boundary conditions from detailed and realistic atmosphere models. In this code, we incorporate additional improvements to the physical inputs related to the equation of state and the low temperature radiative opacities essential to computing low-mass stellar models. Results: We make available via internet a large database of pre-MS tracks and isochrones for a wide range of chemical compositions (Z = 0.0002-0.03), masses (M = 0.2-7.0 M⊙), and ages (1-100 Myr) for a solar-calibrated mixing length parameter α (i.e. 1.68). For each chemical composition, additional models were computed with two different mixing length values, namely α = 1.2 and 1.9. Moreover, for Z ≥ 0.008, we also provided models with two different initial deuterium abundances. The characteristics of the models have been discussed in detail and compared with other work in the literature. The main uncertainties affecting theoretical predictions have been critically discussed. Comparisons with selected data indicate that there is close agreement between theory and observation. Tracks and isochrones are available on the web at the http://astro.df.unipi.it/stellar-models/Tracks and isochrones are also available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/533/A109

  11. Sequence stratigraphy in the middle Ordovician shale successions, mid-east Korea: Stratigraphic variations and preservation potential of organic matter within a sequence stratigraphic framework

    Science.gov (United States)

    Byun, Uk Hwan; Lee, Hyun Suk; Kwon, Yi Kyun

    2018-02-01

    The Jigunsan Formation is the middle Ordovician shale-dominated transgressive succession in the Taebaeksan Basin, located in the eastern margin of the North China platform. The total organic carbon (TOC) content and some geochemical properties of the succession exhibit a stratigraphically distinct distribution pattern. The pattern was closely associated with the redox conditions related to decomposition, bulk sedimentation rate (dilution), and productivity. To explain the distinct distribution pattern, this study attempted to construct a high-resolution sequence stratigraphic framework for the Jigunsan Formation. The shale-dominated Jigunsan Formation comprises a lower layer of dark gray shale, deposited during transgression, and an upper layer of greenish gray siltstone, deposited during highstand and falling stage systems tracts. The concept of a back-stepped carbonate platform is adopted to distinguish early and late transgressive systems tracts (early and late TST) in this study, whereas the highstand systems tracts and falling stage systems tracts can be divided by changes in stacking patterns from aggradation to progradation. The late TST would be initiated on a rapidly back-stepping surface of sediments and, just above the surface, exhibits a high peak in TOC content, followed by a gradually upward decrease. This trend of TOC distribution in the late TST continues to the maximum flooding surface (MFS). The perplexing TOC distribution pattern within the late TST most likely resulted from both a gradual reduction in productivity during the late TST and a gradual increase in dilution effect near the MFS interval. The reduced production of organic matter primarily incurred decreasing TOC content toward the MFS when the productivity was mainly governed by benthic biota because planktonic organisms were not widespread in the Ordovician. Results of this study will help improve the understanding of the source rock distribution in mixed carbonate

  12. Variation in the Kozak sequence of WNT16 results in an increased translation and is associated with osteoporosis related parameters

    DEFF Research Database (Denmark)

    Hendrickx, Gretl; Boudin, Eveline; Fijałkowski, Igor

    2014-01-01

    on osteoporosis related parameters. Hereto, we performed a WNT16 candidate gene association study in a population of healthy Caucasian men from the Odense Androgen Study (OAS). Using HapMap, five tagSNPs and one multimarker test were selected for genotyping to cover most of the common genetic variation...

  13. HeurAA: accurate and fast detection of genetic variations with a novel heuristic amplicon aligner program for next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Lőrinc S Pongor

    Full Text Available Next generation sequencing (NGS of PCR amplicons is a standard approach to detect genetic variations in personalized medicine such as cancer diagnostics. Computer programs used in the NGS community often miss insertions and deletions (indels that constitute a large part of known human mutations. We have developed HeurAA, an open source, heuristic amplicon aligner program. We tested the program on simulated datasets as well as experimental data from multiplex sequencing of 40 amplicons in 12 oncogenes collected on a 454 Genome Sequencer from lung cancer cell lines. We found that HeurAA can accurately detect all indels, and is more than an order of magnitude faster than previous programs. HeurAA can compare reads and reference sequences up to several thousand base pairs in length, and it can evaluate data from complex mixtures containing reads of different gene-segments from different samples. HeurAA is written in C and Perl for Linux operating systems, the code and the documentation are available for research applications at http://sourceforge.net/projects/heuraa/

  14. Sequence variation of functional HTLV-II tax alleles among isolates from an endemic population: lack of evidence for oncogenic determinant in tax.

    Science.gov (United States)

    Hjelle, B; Chaney, R

    1992-02-01

    Human T-cell leukemia-lymphoma virus type II (HTLV-II) has been isolated from patients with hairy cell leukemia (HCL). We previously described a population with longstanding endemic HTLV-II infection, and showed that there is no increased risk for HCL in the affected groups. We thus have direct evidence that the endemic form(s) of HTLV-II cause HCL infrequently, if at all. By comparison, there is reason to suspect that the viruses isolated from patients with HCL had an etiologic role in the disease in those patients. One way to reconcile these conflicting observations is to consider that isolates of HTLV-II might differ in oncogenic potential. To determine whether the structure of the putative oncogenic determinant of HTLV-II, tax2, might differ in the new isolates compared to the tax of the prototype HCL isolate, MO, four new functional tax cDNAs were cloned from new isolates. Sequence analysis showed only minor (0.9-2.0%) amino acid variation compared to the published sequence of MO tax2. Some codons were consistently different from published sequences of the MO virus, but in most cases, such variations were also found in each of two tax2 clones we isolated from the MO T-cell line. These variations rendered the new clones more similar to the tax1 of the pathogenic virus HTLV-I. Thus we find no evidence that pathologic determinants of HTLV-II can be assigned to the tax gene.

  15. Inferring contemporary levels of gene flow and demographic history in a local population of the leaf beetle Gonioctena olivacea from mitochondrial DNA sequence variation.

    Science.gov (United States)

    Mardulyn, Patrick; Milinkovitch, Michel C

    2005-05-01

    We have studied mitochondrial DNA variation in a local population of the leaf beetle species Gonioctena olivacea, to check whether its apparent low dispersal behaviour affects its pattern of genetic variation at a small geographical scale. We have sampled 10 populations of G. olivacea within a rectangle of 5 x 2 km in the Belgian Ardennes, as well as five populations located approximately along a straight line of 30 km and separated by distances of 3-12 km. For each sampled individual (8-19 per population), a fragment of the mtDNA control region was polymerase chain reaction-amplified and sequenced. Sequence data were analysed to test whether significant genetic differentiation could be detected among populations separated by such relatively short distances. The reconstructed genealogy of the mitochondrial haplotypes was also used to investigate the demographic history of these populations. Computer simulations of the evolution of populations were conducted to assess the minimum amount of gene flow that is necessary to explain the observed pattern of variation in the samples. Results show that migration among populations included in the rectangle of 5 x 2 km is substantial, and probably involves the occurrence of dispersal flights. This appears difficult to reconcile with the results of a previous ecological field study that concluded that most of this species dispersal occurs by walking. While sufficient migration to homogenize genetic diversity occurs among populations separated by distances of a few hundred metres to a few kilometres, distances greater than 5 km results in contrast in strong differentiation among populations, suggesting that migration is drastically reduced on such distances. Finally, the results of coalescent simulations suggest that the star-like genealogy inferred from the mtDNA sequence data is fully compatible with a past demographic expansion. However, a metapopulation structure alone (without the need to invoke a population expansion

  16. Comprehensive two-dimensional gel protein databases offer a global approach to the analysis of human cells: the transformed amnion cells (AMA) master database and its link to genome DNA sequence data

    DEFF Research Database (Denmark)

    Celis, J E; Gesser, B; Rasmussen, H H

    1990-01-01

    , mitochondria, Golgi, ribosomes, intermediate filaments, microfilaments and microtubules), levels in fetal human tissues, partial protein sequences (containing information on 48 human proteins microsequenced so far), cell cycle-regulated proteins, proteins sensitive to interferons alpha, beta, and gamma, heat...

  17. Dictionary as Database.

    Science.gov (United States)

    Painter, Derrick

    1996-01-01

    Discussion of dictionaries as databases focuses on the digitizing of The Oxford English dictionary (OED) and the use of Standard Generalized Mark-Up Language (SGML). Topics include the creation of a consortium to digitize the OED, document structure, relational databases, text forms, sequence, and discourse. (LRW)

  18. Variation of b and p values from aftershocks sequences along the Mexican subduction zone and their relation to plate characteristics

    Science.gov (United States)

    Ávila-Barrientos, L.; Zúñiga, F. R.; Rodríguez-Pérez, Q.; Guzmán-Speziale, M.

    2015-11-01

    Aftershock sequences along the Mexican subduction margin (between coordinates 110ºW and 91ºW) were analyzed by means of the p value from the Omori-Utsu relation and the b value from the Gutenberg-Richter relation. We focused on recent medium to large (Mw > 5.6) events considered susceptible of generating aftershock sequences suitable for analysis. The main goal was to try to find a possible correlation between aftershock parameters and plate characteristics, such as displacement rate, age and segmentation. The subduction regime of Mexico is one of the most active regions of the world with a high frequency of occurrence of medium to large events and plate characteristics change along the subduction margin. Previous studies have observed differences in seismic source characteristics at the subduction regime, which may indicate a difference in rheology and possible segmentation. The results of the analysis of the aftershock sequences indicate a slight tendency for p values to decrease from west to east with increasing of plate age although a statistical significance is undermined by the small number of aftershocks in the sequences, a particular feature distinctive of the region as compared to other world subduction regimes. The b values show an opposite, increasing trend towards the east even though the statistical significance is not enough to warrant the validation of such a trend. A linear regression between both parameters provides additional support for the inverse relation. Moreover, we calculated the seismic coupling coefficient, showing a direct relation with the p and b values. While we cannot undoubtedly confirm the hypothesis that aftershock generation depends on certain tectonic characteristics (age, thickness, temperature), our results do not reject it thus encouraging further study into this question.

  19. Maternal phylogenetic relationships and genetic variation among Arabian horse populations using whole mitochondrial DNA D-loop sequencing.

    Science.gov (United States)

    Khanshour, Anas M; Cothran, Ernest Gus

    2013-09-13

    Maternal inheritance is an essential point in Arabian horse population genetics and strains classification. The mitochondrial DNA (mtDNA) sequencing is a highly informative tool to investigate maternal lineages. We sequenced the whole mtDNA D-loop of 251 Arabian horses to study the genetic diversity and phylogenetic relationships of Arabian populations and to examine the traditional strain classification system that depends on maternal family lines using native Arabian horses from the Middle East. The variability in the upstream region of the D-loop revealed additional differences among the haplotypes that had identical sequences in the hypervariable region 1 (HVR1). While the American-Arabians showed relatively low diversity, the Syrian population was the most variable and contained a very rare and old haplogroup. The Middle Eastern horses had major genetic contributions to the Western horses and there was no clear pattern of differentiation among all tested populations. Our results also showed that several individuals from different strains shared a single haplotype, and individuals from a single strain were represented in clearly separated haplogroups. The whole mtDNA D-loop sequence was more powerful for analysis of the maternal genetic diversity in the Arabian horses than using just the HVR1. Native populations from the Middle East, such as Syrians, could be suggested as a hot spot of genetic diversity and may help in understanding the evolution history of the Arabian horse breed. Most importantly, there was no evidence that the Arabian horse breed has clear subdivisions depending on the traditional maternal based strain classification system.

  20. Insights into Korean red fox (Vulpes vulpes) based on mitochondrial cytochrome b sequence variation in East Asia.

    Science.gov (United States)

    Yu, Jeong-Nam; Han, Sang-Hoon; Kim, Bang-Hwan; Kryukov, Alexey P; Kim, Soonok; Lee, Byoung-Yoon; Kwak, Myounghai

    2012-11-01

    The red fox (Vulpes vulpes) is the most widely distributed terrestrial carnivore in the world, occurring throughout most of North America, Europe, Asia, and North Africa. In South Korea, however, this species has been drastically reduced due to habitat loss and poaching. Consequently, it is classified as an endangered species in Korea. As a first step of a planned red fox restoration project, preserved red fox museum specimens were used to determine the genetic status of red foxes that had previously inhabited South Korea against red foxes from neighboring countries. Total eighty three mtDNA cytochrome b sequences, including 22 newly obtained East Asian red fox sequences and worldwide red fox sequences from NCBI, were clustered into three clades (i.e., I, II, and III) based on haplotype network and neighbor-joining trees. The mean genetic distance between clades was 2.0%. Clade III contained South Korean and other East Asian samples in addition to Eurasian and North Pacific individuals. In clade III, South Korean individuals were separated into two lineages of Eurasian and North Pacific groups, showing unclear phylogeographic structuring and admixture. This suggests that South Korean red fox populations may have been composed of individuals from these two different genetic lineages.

  1. The Saccharomyces Genome Database Variant Viewer.

    Science.gov (United States)

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Candidate genes revealed by a genome scan for mosquito resistance to a bacterial insecticide: sequence and gene expression variations

    Directory of Open Access Journals (Sweden)

    David Jean-Philippe

    2009-11-01

    Full Text Available Abstract Background Genome scans are becoming an increasingly popular approach to study the genetic basis of adaptation and speciation, but on their own, they are often helpless at identifying the specific gene(s or mutation(s targeted by selection. This shortcoming is hopefully bound to disappear in the near future, thanks to the wealth of new genomic resources that are currently being developed for many species. In this article, we provide a foretaste of this exciting new era by conducting a genome scan in the mosquito Aedes aegypti with the aim to look for candidate genes involved in resistance to Bacillus thuringiensis subsp. israelensis (Bti insecticidal toxins. Results The genome of a Bti-resistant and a Bti-susceptible strains was surveyed using about 500 MITE-based molecular markers, and the loci showing the highest inter-strain genetic differentiation were sequenced and mapped on the Aedes aegypti genome sequence. Several good candidate genes for Bti-resistance were identified in the vicinity of these highly differentiated markers. Two of them, coding for a cadherin and a leucine aminopeptidase, were further examined at the sequence and gene expression levels. In the resistant strain, the cadherin gene displayed patterns of nucleotide polymorphisms consistent with the action of positive selection (e.g. an excess of high compared to intermediate frequency mutations, as well as a significant under-expression compared to the susceptible strain. Conclusion Both sequence and gene expression analyses agree to suggest a role for positive selection in the evolution of this cadherin gene in the resistant strain. However, it is unlikely that resistance to Bti is conferred by this gene alone, and further investigation will be needed to characterize other genes significantly associated with Bti resistance in Ae. aegypti. Beyond these results, this article illustrates how genome scans can build on the body of new genomic information (here, full

  3. Variation in the number of nucleoli and incomplete homogenization of 18S ribosomal DNA sequences in leaf cells of the cultivated Oriental ginseng (Panax ginseng Meyer).

    Science.gov (United States)

    Chelomina, Galina N; Rozhkovan, Konstantin V; Voronova, Anastasia N; Burundukova, Olga L; Muzarok, Tamara I; Zhuravlev, Yuri N

    2016-04-01

    Wild ginseng, Panax ginseng Meyer, is an endangered species of medicinal plants. In the present study, we analyzed variations within the ribosomal DNA (rDNA) cluster to gain insight into the genetic diversity of the Oriental ginseng, P. ginseng, at artificial plant cultivation. The roots of wild P. ginseng plants were sampled from a nonprotected natural population of the Russian Far East. The slides were prepared from leaf tissues using the squash technique for cytogenetic analysis. The 18S rDNA sequences were cloned and sequenced. The distribution of nucleotide diversity, recombination events, and interspecific phylogenies for the total 18S rDNA sequence data set was also examined. In mesophyll cells, mononucleolar nuclei were estimated to be dominant (75.7%), while the remaining nuclei contained two to four nucleoli. Among the analyzed 18S rDNA clones, 20% were identical to the 18S rDNA sequence of P. ginseng from Japan, and other clones differed in one to six substitutions. The nucleotide polymorphism was more expressed at the positions 440-640 bp, and distributed in variable regions, expansion segments, and conservative elements of core structure. The phylogenetic analysis confirmed conspecificity of ginseng plants cultivated in different regions, with two fixed mutations between P. ginseng and other species. This study identified the evidences of the intragenomic nucleotide polymorphism in the 18S rDNA sequences of P. ginseng. These data suggest that, in cultivated plants, the observed genome instability may influence the synthesis of biologically active compounds, which are widely used in traditional medicine.

  4. Population structure of African buffalo inferred from mtDNA sequences and microsatellite loci: high variation but low differentiation

    DEFF Research Database (Denmark)

    Simonsen, Bo Thisted; Siegismund, H R; Arctander, P

    1998-01-01

    The African buffalo (Syncerus caffer) is widespread throughout sub-Saharan Africa and is found in most major vegetation types, wherever permanent sources of water are available, making it physically able to disperse through a wide range of habitats. Despite this, the buffalo has been assumed...... and analysis of variation at six microsatellite loci among 11 localities in eastern and southern Africa. High levels of genetic variability were found, suggesting that reported severe population bottlenecks due to outbreak of rinderpest during the last century did not strongly reduce the genetic variability...

  5. Characterization of expressed sequence tags from developing fibers of Gossypium barbadense and evaluation of insertion-deletion variation in tetraploid cultivated cotton species.

    Science.gov (United States)

    Lv, Yuanda; Zhao, Liang; Xu, Xiaoyang; Wang, Lei; Wang, Cheng; Zhang, Tianzhen; Guo, Wangzhen

    2013-03-13

    Cotton is the leading fiber crop worldwide. Gossypium barbadense is an important species of cotton because of its extra-long staple fibers with superior luster and silkiness. However, a systematic analysis and utilization of cDNA sequences from G. barbadense fiber development remains understudied. A total of 21,079 high quality sequences were generated from two non-normalized cDNA libraries prepared by using a mixture of G. barbadense Hai7124 fibers and ovules. After assembly processing, a set of 8,653 unigenes were obtained. Of those, 7,786 were matched to known proteins and 7,316 were assigned to functional categories. The molecular functions of these unigenes were mostly related to binding and catalytic activity, and carbohydrate, amino acid, and energy metabolisms were major contributors among the subsets of metabolism. Sequences comparison between G. barbadense and G. hirsutum revealed that 8,245 unigenes from G. barbadense were detected the similarity with those released publicly in G. hirsutum, however, the remaining 408 sequences had no hits against G. hirsutum unigenes database. Furthermore, 13,275 putative ESTs InDels loci involved in the orthologous and/or homoeologous differences between/within G. barbadense and G. hirsutum were discovered by in silico analyses, and 2,160 InDel markers were developed by ESTs with more than five insertions or deletions. By gel electrophoresis combined with sequencing verification, 71.11% candidate InDel loci were reconfirmed orthologous and/or homoeologous loci polymorphisms using G. hirsutum acc TM-1 and G. barbadense cv Hai7124. Blastx result showed among 2,160 InDel loci, 81 with significant function similarity with known genes associated with secondary wall synthesis process, indicating the important roles in fiber quality in tetraploid cultivated cotton species. Sequence comparisons and InDel markers development will lay the groundwork for promoting the identification of genes related to superior agronomic traits

  6. Genetic variation and phylogenetic relationship analysis of Jatropha curcas L. inferred from nrDNA ITS sequences.

    Science.gov (United States)

    Guo, Guo-Ye; Chen, Fang; Shi, Xiao-Dong; Tian, Yin-Shuai; Yu, Mao-Qun; Han, Xue-Qin; Yuan, Li-Chun; Zhang, Ying

    2016-01-01

    Genetic variation and phylogenetic relationships among 102 Jatropha curcas accessions from Asia, Africa, and the Americas were assessed using the internal transcribed spacer region of nuclear ribosomal DNA (nrDNA ITS). The average G+C content (65.04%) was considerably higher than the A+T (34.96%) content. The estimated genetic diversity revealed moderate genetic variation. The pairwise genetic divergences (GD) between haplotypes were evaluated and ranged from 0.000 to 0.017, suggesting a higher level of genetic differentiation in Mexican accessions than those of other regions. Phylogenetic relationships and intraspecific divergence were inferred by Bayesian inference (BI), maximum parsimony (MP), and median joining (MJ) network analysis and were generally resolved. The J. curcas accessions were consistently divided into three lineages, groups A, B, and C, which demonstrated distant geographical isolation and genetic divergence between American accessions and those from other regions. The MJ network analysis confirmed that Central America was the possible center of origin. The putative migration route suggested that J. curcas was distributed from Mexico or Brazil, via Cape Verde and then split into two routes. One route was dispersed to Spain, then migrated to China, eventually spreading to southeastern Asia, while the other route was dispersed to Africa, via Madagascar and migrated to China, later spreading to southeastern Asia. Copyright © 2016 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  7. tRNA sequence data, annotation data and curation data - tRNADB-CE | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us tRNAD... tRNA sequence data, annotation data and curation data - tRNADB-CE | LSDB Archive ...

  8. Identifying Rare Variation in Cases of Schizophrenia in the Isolated Population of the Faroe Islands using Whole-genome Sequencing

    DEFF Research Database (Denmark)

    Als, Thomas Damm; Lescai, Francesco; Dahl, Hans

    to map risk variants involved in complex traits. We aim at utilizing samples of cases and controls of the isolated population of the Faroe Islands to conduct whole-genome-sequence analysis in order to identify rare genetic variants associated with schizophrenia. We will search for rare genetic variants...... of developing SZ. However, these studies are designed to examining only “the common variant” proportion of the genomic landscape of SZ. Due to increased genetic drift during founding and potential bottlenecks, followed by population expansion, isolated populations may be particularly useful in identifying rare...... disease variants, that may appear at higher frequencies and/or within a more clearly distinct haplotype structure compared to outbred populations. Small isolated populations also typically show reduced phenotypic, genetic and environmental heterogeneity, thus making them advantageous in studies aiming...

  9. Multilocus Sequence Typing Reveals Relevant Genetic Variation and Different Evolutionary Dynamics among Strains of Xanthomonas arboricola pv. juglandis

    Directory of Open Access Journals (Sweden)

    Marco Scortichini

    2010-11-01

    Full Text Available Forty-five Xanthomonas arboricola pv. juglandis (Xaj strains originating from Juglans regia cultivation in different countries were molecularly typed by means of MultiLocus Sequence Typing (MLST, using acnB, gapA, gyrB and rpoD gene fragments. A total of 2.5 kilobases was used to infer the phylogenetic relationship among the strains and possible recombination events. Haplotype diversity, linkage disequilibrium analysis, selection tests, gene flow estimates and codon adaptation index were also assessed. The dendrograms built by maximum likelihood with concatenated nucleotide and amino acid sequences revealed two major and two minor phylotypes. The same haplotype was found in strains originating from different continents, and different haplotypes were found in strains isolated in the same year from the same location. A recombination breakpoint was detected within the rpoD gene fragment. At the pathovar level, the Xaj populations studied here are clonal and under neutral selection. However, four Xaj strains isolated from walnut fruits with apical necrosis are under diversifying selection, suggesting a possible new adaptation. Gene flow estimates do not support the hypothesis of geographic isolation of the strains, even though the genetic diversity between the strains increases as the geographic distance between them increases. A triplet deletion, causing the absence of valine, was found in the rpoD fragment of all 45 Xaj strains when compared with X. axonopodis pv. citri strain 306. The codon adaptation index was high in all four genes studied, indicating a relevant metabolic activity.

  10. Lactobacillus kefiri shows inter-strain variations in the amino acid sequence of the S-layer proteins.

    Science.gov (United States)

    Malamud, Mariano; Carasi, Paula; Bronsoms, Sílvia; Trejo, Sebastián A; Serradell, María de Los Angeles

    2017-04-01

    The S-layer is a proteinaceous envelope constituted by subunits that self-assemble to form a two-dimensional lattice that covers the surface of different species of Bacteria and Archaea, and it could be involved in cell recognition of microbes among other several distinct functions. In this work, both proteomic and genomic approaches were used to gain knowledge about the sequences of the S-layer protein (SLPs) encoding genes expressed by six aggregative and sixteen non-aggregative strains of potentially probiotic Lactobacillus kefiri. Peptide mass fingerprint (PMF) analysis confirmed the identity of SLPs extracted from L. kefiri, and based on the homology with phylogenetically related species, primers located outside and inside the SLP-genes were employed to amplify genomic DNA. The O-glycosylation site SASSAS was found in all L. kefiri SLPs. Ten strains were selected for sequencing of the complete genes. The total length of the mature proteins varies from 492 to 576 amino acids, and all SLPs have a calculated pI between 9.37 and 9.60. The N-terminal region is relatively conserved and shows a high percentage of positively charged amino acids. Major differences among strains are found in the C-terminal region. Different groups could be distinguished regarding the mature SLPs and the similarities observed in the PMF spectra. Interestingly, SLPs of the aggregative strains are 100% homologous, although these strains were isolated from different kefir grains. This knowledge provides relevant data for better understanding of the mechanisms involved in SLPs functionality and could contribute to the development of products of biotechnological interest from potentially probiotic bacteria.

  11. Analysis of genetic variation and phylogeny of the predatory bug, Pilophorus typicus, in Japan using mitochondrial gene sequences.

    Science.gov (United States)

    Ito, Katsura; Nishikawa, Hiroshi; Shimada, Takuji; Ogawa, Kohei; Minamiya, Yukio; Tomoda, Masafumi; Nakahira, Kengo; Kodama, Rika; Fukuda, Tatsuya; Arakawa, Ryo

    2011-01-01

    Pilophorus typicus (Distant) (Heteroptera: Miridae) is a predatory bug occurring in East, Southeast, and South Asia. Because the active stages of P. typicus prey on various agricultural pest insects and mites, this species is a candidate insect as an indigenous natural enemy for use in biological control programs. However, the mass releasing of introduced natural enemies into agricultural fields may incur the risk of affecting the genetic integrity of species through hybridization with a local population. To clarify the genetic characteristics of the Japanese populations of P. typicus two portions of the mitochondrial DNA, the cytochrome oxidase subunit I (COI) (534 bp) and the cytochrome B (cytB) (217 bp) genes, were sequenced for 64 individuals collected from 55 localities in a wide range of Japan. Totals of 18 and 10 haplotypes were identified for the COI and cytB sequences, respectively (25 haplotypes over regions). Phylogenetic analysis using the maximum likelihood method revealed the existence of two genetically distinct groups in P. typicus in Japan. These groups were distributed in different geographic ranges: one occurred mainly from the Pacific coastal areas of the Kii Peninsula, the Shikoku Island, and the Ryukyu Islands; whereas the other occurred from the northern Kyushu district to the Kanto and Hokuriku districts of mainland Japan. However, both haplotypes were found in a single locality of the southern coast of the Shikoku Island. COI phylogeny incorporating other Pilophorus species revealed that these groups were only recently differentiated. Therefore, use of a certain population of P. typicus across its distribution range should be done with caution because genetic hybridization may occur.

  12. Sequence variation in the melanocortin-1 receptor (MC1R pigmentation gene and its role in the cryptic coloration of two South American sand lizards

    Directory of Open Access Journals (Sweden)

    Josmael Corso

    2012-01-01

    Full Text Available In reptiles, dorsal body darkness often varies with substrate color or temperature environment, and is generally presumed to be an adaptation for crypsis or thermoregulation. However, the genetic basis of pigmentation is poorly known in this group. In this study we analyzed the coding region of the melanocortin-1-receptor (MC1R gene, and therefore its role underlying the dorsal color variation in two sympatric species of sand lizards (Liolaemus that inhabit the southeastern coast of South America: L. occipitalis and L. arambarensis. The first is light-colored and occupies aeolic pale sand dunes, while the second is brownish and lives in a darker sandy habitat. We sequenced 630 base pairs of MC1R in both species. In total, 12 nucleotide polymorphisms were observed, and four amino acid replacement sites, but none of them could be associated with a color pattern. Comparative analysis indicated that these taxa are monomorphic for amino acid sites that were previously identified as functionally important in other reptiles. Thus, our results indicate that MC1R is not involved in the pigmentation pattern observed in Liolaemus lizards. Therefore, structural differences in other genes, such as ASIP, or variation in regulatory regions of MC1R may be responsible for this variation. Alternatively, the phenotypic differences observed might be a consequence of non-genetic factors, such as thermoregulatory mechanisms.

  13. Variations in clinicopathologic characteristics of thyroid cancer among racial ethnic groups: analysis of a large public city hospital and the SEER database.

    Science.gov (United States)

    Moo-Young, Tricia A; Panergo, Jessel; Wang, Chih E; Patel, Subhash; Duh, Hong Yan; Winchester, David J; Prinz, Richard A; Fogelfeld, Leon

    2013-11-01

    Clinicopathologic variables influence the treatment and prognosis of patients with thyroid cancer. A retrospective analysis of public hospital thyroid cancer database and the Surveillance, Epidemiology and End Results 17 database was conducted. Demographic, clinical, and pathologic data were compared across ethnic groups. Within the public hospital database, Hispanics versus non-Hispanic whites were younger and had more lymph node involvement (34% vs 17%, P ethnic groups. Similar findings were demonstrated within the Surveillance, Epidemiology and End Results database. African Americans aged ethnic groups. Such disparities persist within an equal-access health care system. These findings suggest that factors beyond socioeconomics may contribute to such differences. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Database Description - KAIKOcDNA | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us KAIKOcDNA Database Description General information of database Database name KAIKOcDNA Alter...National Institute of Agrobiological Sciences Akiya Jouraku E-mail : Database cla...ssification Nucleotide Sequence Databases Organism Taxonomy Name: Bombyx mori Taxonomy ID: 7091 Database des...rnal: G3 (Bethesda) / 2013, Sep / vol.9 External Links: Original website information Database maintenance si...available URL of Web services - Need for user registration Not available About This Database Database

  15. Sequencing of DC-SIGN promoter indicates an association between promoter variation and risk of nasopharyngeal carcinoma in cantonese

    Directory of Open Access Journals (Sweden)

    Liu Wen-Sheng

    2010-11-01

    Full Text Available Abstract Background The dendritic cell-specific intercellular adhesion molecule 3 grabbing non-integrin (DC-SIGN is an important pathogen recognition receptor of the innate immune system. DC-SIGN promoter variants play important role in the susceptibility to various infectious diseases. Nasopharyngeal carcinoma (NPC is a malignancy that is common in southern China and whether DC-SIGN promoter variants have effects on susceptibility to NPC is still unknown. The aim of this study is to ascertain the potential involvement of DC-SIGN promoter single nucleotide polymorphisms (SNPs in NPC susceptibility. Methods We conducted a case control study based on Cantonese population including 444 NPC patients and 464 controls matched on age and sex. The 1041 bp of DC-SIGN promoter region was directly sequenced for all samples. Sequence alignment and SNP search were inspected using DNAStar analysis programs and haplotype frequencies were estimated in Haploview V 4.0. The associations between the SNPs and the risk of NPC were analyzed using chi-square test and non-conditional logistic regression analysis with SPSS 13.0 software. Results A total of six variants were observed in the DC-SIGN promoter region and DC-SIGN -139 GG and -939 AA were significantly associated with NPC risk with adjusted Odds Ratios (ORs of 2.10 (95% confidence interval [CI] = 1.23-3.59; P = 0.006 and 2.52 (1.29-4.93; P = 0.007 respectively and subjects carrying the risk allele DC-SIGN -871 G had 1.47-fold (95% CI = 1.14-1.90 increased risks of developing NPC (P = 0.003. Haplotype analysis revealed that h1 'AAAG' was significantly associated with protection against NPC (OR = 0.69; P = 0.0002 and the association was still significant when using 1000 permutation test runs (P = 0.001. Conclusions Our study indicated that DC-SIGN promoter variants appear to be involved in the susceptibility to NPC and the detailed mechanism of this effect need further studies.

  16. [The LMP1 oncogene sequence variations in patients with oral tumours associated or not associated with the Epstein Barr].

    Science.gov (United States)

    Gurtsevitch, V E; Iakovleva, L S; Shcherbak, L N; Goncharova, E V; Smirnova, K V; Diduk, S V; Kondratova, V N; Maksimovich, D M; Lichtenstein, A V; Seniuta, N B

    2013-01-01

    The role of the Epstein-Barr virus (EBV), a ubiquitous lymphotropic human herpesvirus type 4, in the etiology of nasopharyngeal carcinoma (NPC) is not fully understood. The mechanism of NPC carcinogenesis, associated with the virus, is also not clear. The objective of present investigation was to carry out comparative analysis of the structure of an LMP1 oncogene of EBV in viral isolates obtained from patients with two types of tumors of the oral cavity: (a) associated (i.e., NPC) and (b) not associated (other tumors of the same anatomical region, OTOC) with EBV. Comparative analysis of C-terminal regions of LMP1 variants that was based on a sequence analysis of LMP1 from tumor, blood and throat washing samples of NPC and OTOC patients showed that all structural characteristics of LMP1 in both groups of patients were genetically similar, and differences found between compared parameters were statistically insignificant. Thus, for the first time it has been revealed that in NPC and OTOC patients in Russia genetically related EBV strains with structurally similar LMP1 variants are persisting that are likely to reflect a polymorphism of the virus circulating in population. The findings allow us to suggest that in non-NPC-endemic regions of the world, which include Russia, the risk of NPC development does not depend on the EBVstrain and its variant of LMP1 so much, but mostly from the genetic predisposition of infected persons to the disease and the exposure to other, as yet unknown agents.

  17. A new paradigm in toxicology and teratology: altering gene activity in the absence of DNA sequence variation.

    Science.gov (United States)

    Reamon-Buettner, Stella Marie; Borlak, Jürgen

    2007-07-01

    'Epigenetics' is a heritable phenomenon without change in primary DNA sequence. In recent years, this field has attracted much attention as more epigenetic controls of gene activities are being discovered. Such epigenetic controls ensue from an interplay of DNA methylation, histone modifications, and RNA-mediated pathways from non-coding RNAs, notably silencing RNA (siRNA) and microRNA (miRNA). Although epigenetic regulation is inherent to normal development and differentiation, this can be misdirected leading to a number of diseases including cancer. All the same, many of the processes can be reversed offering a hope for epigenetic therapies such as inhibitors of enzymes controlling epigenetic modifications, specifically DNA methyltransferases, histone deacetylases, and RNAi therapeutics. 'In utero' or early life exposures to dietary and environmental exposures can have a profound effect on our epigenetic code, the so-called 'epigenome', resulting in birth defects and diseases developed later in life. Indeed, examples are accumulating in which environmental exposures can be attributed to epigenetic causes, an encouraging edge towards greater understanding of the contribution of epigenetic influences of environmental exposures. Routine analysis of epigenetic modifications as part of the mechanisms of action of environmental contaminants is in order. There is, however, an explosion of research in the field of epigenetics and to keep abreast of these developments could be a challenge. In this paper, we provide an overview of epigenetic mechanisms focusing on recent reviews and studies to serve as an entry point into the realm of 'environmental epigenetics'.

  18. 15N/14N variations in Cretaceous Atlantic sedimentary sequences: implication for past changes in marine nitrogen biogeochemistry

    Science.gov (United States)

    Rau, G.H.; Arthur, M.A.; Dean, W.E.

    1987-01-01

    At two locations in the Atlantic Ocean (DSDP Sites 367 and 530) early to middle Cretaceous organic-carbon-rich beds ("black shales") were found to have significantly lower ??15N values (lower 15N/14N ratios) than adjacent organic-carbon-poor beds (white limestones or green claystones). While these lithologies are of marine origin, the black strata in particular have ??15N values that are significantly lower than those previously found in the marine sediment record and most contemporary marine nitrogen pools. In contrast, black, organic-carbon-rich beds at a third site (DSDP Site 603) contain predominantly terrestrial organic matter and have C- and N-isotopic compositions similar to organic matter of modern terrestrial origin. The recurring 15N depletion in the marine-derived Cretaceous sequences prove that the nitrogen they contain is the end result of an episodic and atypical biogeochemistry. Existing isotopic and other data indicate that the low 15N relative abundance is the consequence of pelagic rather than post-depositional processes. Reduced ocean circulation, increased denitrification, and, hence, reduced euphotic zone nitrate availability may have led to Cretaceous phytoplankton assemblages that were periodically dominated by N2-fixing blue-green algae, a possible source of this sediment 15N-depletion. Lack of parallel isotopic shifts in Cretaceous terrestrially-derived nitrogen (Site 603) argues that the above change in nitrogen cycling during this period did not extend beyond the marine environment. ?? 1987.

  19. Reproductive biology and variation of nuclear ribosomal ITS and ETS sequences in the Calligonum mongolicum complex (Polygonaceae

    Directory of Open Access Journals (Sweden)

    Wei Shi

    2017-01-01

    Full Text Available To explore the biosystematics of the Calligonum mongolicum complex (Polygonaceae, the flowering phenological period, breeding and pollination characters and seed set of the complex (C. Mongolicum Turze, C. chinense A. Los., C. gobicum A. Los., C. pumilum A. Los. and C. zaidamense A. Los. were documented in the Turpan Eremophyte Botanical Garden, China. The sequences of the nuclear ribosomal ITS and ETS region were employed to differentiate the C. mongolicum complex and other species in sect. Medusae. The results showed species of the C. mongolicum complex occupied overlapping flowering periods and had consistent pollination agents. Their breeding systems are all self-compatible, tend to be out-crossing and they interbreed amongst each other (out-crossing index, OCI = 4.The crosses within and amongst species had high seed sets (44 - 65%. Phylogenetic analyses of Calligonum sect. Medusae and the network analysis of nrDNA (ITS and ETS in the complex suggest interbreeding amongst “species” within the complex and provide evidence for taxonomically merging the five species in the complex. The detected hybridisation, occurring within the complex, suggests the need to improve traditional methods of ex situ plant conservation in botanical gardens for maintaining genetic diversity of Calligonum within and amongst species from different geographic areas.

  20. Phylogeographical structure inferred from cpDNA sequence variation of Zygophyllum xanthoxylon across north-west China.

    Science.gov (United States)

    Shi, Xiao-Jun; Zhang, Ming-Li

    2015-03-01

    Zygophyllum xanthoxylon, a desert species, displaying a broad east-west continuous distribution pattern in arid Northwestern China, can be considered as a model species to investigate the biogeographical history of this region. We sequenced two chloroplast DNA spacers (psbK-psbI and rpl32-trnL) in 226 individuals from 31 populations to explore the phylogeographical structure. Median-joining network was constructed and analysis of AMOVA, SMOVA, neutrality tests and distribution analysis were used to examine genetic structure and potential range expansion. Using species distribution modeling, the geographical distribution of Z. xanthoxylon was modeled during the present and at the Last Glacial Maximum (LGM). Among 26 haplotypes, one was widely distributed, but most was restricted to either the eastern or western region. The populations with the highest levels of haplotype diversity were found in the Tianshan Mountains and its surroundings in the west, and the Helan Mountains and Alxa Plateau in the east. AMOVA and SAMOVA showed that over all populations, the species lacks phylogeographical structure, which is speculated to be the result of its specific biology. Neutrality tests and mismatch distribution analysis support past range expansions of the species. Comparing the current distribution to those cold and dry conditions in LGM, Z. xanthoxylon had a shrunken and more fragmented range during LGM. Based on the evidences from phylogeographical patterns, distribution of genetic variability, and paleodistribution modeling, Z. xanthoxylon is speculated most likely to have originated from the east and migrated westward via the Hexi Corridor.

  1. Use of next-generation sequencing to detect LDLR gene copy number variation in familial hypercholesterolemia[S

    Science.gov (United States)

    Iacocca, Michael A.; Wang, Jian; Dron, Jacqueline S.; Robinson, John F.; McIntyre, Adam D.; Cao, Henian

    2017-01-01

    Familial hypercholesterolemia (FH) is a heritable condition of severely elevated LDL cholesterol, caused predominantly by autosomal codominant mutations in the LDL receptor gene (LDLR). In providing a molecular diagnosis for FH, the current procedure often includes targeted next-generation sequencing (NGS) panels for the detection of small-scale DNA variants, followed by multiplex ligation-dependent probe amplification (MLPA) in LDLR for the detection of whole-exon copy number variants (CNVs). The latter is essential because ∼10% of FH cases are attributed to CNVs in LDLR; accounting for them decreases false negative findings. Here, we determined the potential of replacing MLPA with bioinformatic analysis applied to NGS data, which uses depth-of-coverage analysis as its principal method to identify whole-exon CNV events. In analysis of 388 FH patient samples, there was 100% concordance in LDLR CNV detection between these two methods: 38 reported CNVs identified by MLPA were also successfully detected by our NGS method, while 350 samples negative for CNVs by MLPA were also negative by NGS. This result suggests that MLPA can be removed from the routine diagnostic screening for FH, significantly reducing associated costs, resources, and analysis time, while promoting more widespread assessment of this important class of mutations across diagnostic laboratories. PMID:28874442

  2. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate.

    Science.gov (United States)

    Klambauer, Günter; Schwarzbauer, Karin; Mayr, Andreas; Clevert, Djork-Arné; Mitterecker, Andreas; Bodenhofer, Ulrich; Hochreiter, Sepp

    2012-05-01

    Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.

  3. Genetic variation and differentiation of Gekko gecko from different populations based on mitochondrial cytochrome b gene sequences and karyotypes.

    Science.gov (United States)

    Qin, Xin-Min; Li, Hui-Min; Zeng, Zhen-Hua; Zeng, De-Long; Guan, Qing-Xin

    2012-06-01

    Black-spotted and red-spotted tokay geckos are distributed in different regions and have significant differences in morphological appearance, but have been regarded as the same species, Gekko gecko, in taxonomy. To determine whether black-spotted and red-spotted tokay geckos are genetically differentiated, we sequenced the entire mitochondrial cytochrome b gene (1147 bp) from 110 individuals of Gekko gecko collected in 11 areas including Guangxi China, Yunnan China, Vietnam, and Laos. In addition, we performed karyotypic analyses of black-spotted tokay geckos from Guangxi China and red-spotted tokay geckos from Laos. These phylogenetic analyses showed that black-spotted and red-spotted tokay geckos are divided into two branches in molecular phylogenetic trees. The average genetic distances are as follows: 0.12-0.47% among six haplotypes in the black-spotted tokay gecko group, 0.12-1.66% among five haplotypes in the red-spotted tokay gecko group, and 8.76-9.18% between the black-spotted and red-spotted tokay geckos, respectively. The karyotypic analyses showed that the karyotype formula is 2n = 38 = 8m + 2sm + 2st + 26t in red-spotted tokay geckos from Laos compared with 2n = 38 = 8m + 2sm + 28t in black-spotted tokay geckos from Guangxi China. The differences in these two kinds of karyotypes were detected on the 15th chromosome. The clear differences in genetic levels between black-spotted and red-spotted tokay geckos suggest a significant level of genetic differentiation between the two.

  4. Genetic variation in Rhodomyrtus tomentosa (Kemunting) populations from Malaysia as revealed by inter-simple sequence repeat markers.

    Science.gov (United States)

    Hue, T S; Abdullah, T L; Abdullah, N A P; Sinniah, U R

    2015-12-14

    Kemunting (Rhodomyrtus tomentosa) from the Myrtaceae family, is native to Malaysia. It is widely used in traditional medicine to treat various illnesses and possesses significant antibacterial properties. In addition, it has great potential as ornamental in landscape design. Genetic variability studies are important for the rational management and conservation of genetic material. In the present study, inter-simple sequence repeat markers were used to assess the genetic diversity of 18 R. tomentosa populations collected from ten states of Peninsular Malaysia. The 11 primers selected generated 173 bands that ranged in size from 1.6 kb to 130 bp, which corresponded to an average of 15.73 bands per primer. Of these bands, 97.69% (169 in total) were polymorphic. High genetic diversity was documented at the species level (H(T) = 0.2705; I = 0.3973; PPB = 97.69%) but there was a low diversity at population level (H(S) = 0.0073; I = 0 .1085; PPB = 20.14%). The high level of genetic differentiation revealed by G(ST) (73%) and analysis of molecular variance (63%), together with the limited gene flow among population (N(m) = 0.1851), suggests that the populations examined are isolated. Results from an unweighted pair group method with arithmetic mean dendrogram and principal coordinate analysis clearly grouped the populations into two geographic groups. This clear grouping can also be demonstrated by the significant Mantel test (r = 0.581, P = 0.001). We recommend that all the R. tomentosa populations be preserved in conservation program.

  5. Associations of prodynorphin sequence variation with alcohol dependence and related traits are phenotype-specific and sex-dependent.

    Science.gov (United States)

    Winham, Stacey J; Preuss, Ulrich W; Geske, Jennifer R; Zill, Peter; Heit, John A; Bakalkin, Georgy; Biernacka, Joanna M; Karpyak, Victor M

    2015-10-27

    We previously demonstrated that prodynorphin (PDYN) haplotypes and single nucleotide polymorphism (SNP) rs2281285 are associated with alcohol dependence and the propensity to drink in negative emotional states, and recent studies suggest that PDYN gene effects on substance dependence risk may be sex-related. We examined sex-dependent associations of PDYN variation with alcohol dependence and related phenotypes, including negative craving, time until relapse after treatment and the length of sobriety episodes before seeking treatment, in discovery and validation cohorts of European ancestry. We found a significant haplotype-by-sex interaction (p  =  0.03), suggesting association with alcohol dependence in males (p = 1E-4) but not females. The rs2281285 G allele increased risk for alcohol dependence in males in the discovery cohort (OR = 1.49, p = 0.002), with a similar trend in the validation cohort (OR = 1.35, p = 0.086). However, rs2281285 showed a trend towards association with increased negative craving in females in both the discovery (beta = 10.16, p = 0.045) and validation samples (OR = 7.11, p = 0.066). In the discovery cohort, rs2281285 was associated with time until relapse after treatment in females (HR = 1.72, p = 0.037); in the validation cohort, it was associated with increased length of sobriety episodes before treatment in males (beta = 13.49, p = 0.001). Our findings suggest that sex-dependent effects of PDYN variants in alcohol dependence are phenotype-specific.

  6. Inferring Genetic Variation and Demographic History of Michelia yunnanensis Franch. (Magnoliaceae from Chloroplast DNA Sequences and Microsatellite Markers

    Directory of Open Access Journals (Sweden)

    Shikang Shen

    2017-04-01

    Full Text Available Michelia yunnanensis Franch., is a traditional ornamental, aromatic, and medicinal shrub that endemic to Yunnan Province in southwest China. Although the species has a large distribution pattern and is abundant in Yunnan Province, the populations are dramatically declining because of overexploitation and habitat destruction. Studies on the genetic variation and demography of endemic species are necessary to develop effective conservation and management strategies. To generate such knowledge, we used 3 pairs of universal cpDNA markers and 10 pairs of microsatellite markers to assess the genetic diversity, genetic structure, and demographic history of 7 M. yunnanensis populations. We calculated a total of 88 alleles for 10 polymorphic loci and 10 haplotypes for a combined 2,089 bp of cpDNA. M. yunnanensis populations showed high genetic diversity (Ho = 0.551 for nuclear markers and Hd = 0.471 for cpDNA markers and low genetic differentiation (FST = 0.058. Geographical structure was not found among M. yunnanensis populations. Genetic distance and geographic distance were not correlated (P > 0.05, which indicated that geographic isolation is not the primary cause of the low genetic differentiation of M. yunnanensis. Additionally, M. yunnanensis populations contracted ~20,000–30,000 years ago, and no recent expansion occurred in current populations. Results indicated that the high genetic diversity of the species and within its populations holds promise for effective genetic resource management and sustainable utilization. Thus, we suggest that the conservation and management of M. yunnanensis should address exotic overexploitation and habitat destruction.

  7. Exercise-Induced Rhabdomyolysis and Stress-Induced Malignant Hyperthermia Events, Association with Malignant Hyperthermia Susceptibility, and RYR1 Gene Sequence Variations

    Directory of Open Access Journals (Sweden)

    Antonella Carsana

    2013-01-01

    Full Text Available Exertional rhabdomyolysis (ER and stress-induced malignant hyperthermia (MH events are syndromes that primarily afflict military recruits in basic training and athletes. Events similar to those occurring in ER and in stress-induced MH events are triggered after exposure to anesthetic agents in MH-susceptible (MHS patients. MH is an autosomal dominant hypermetabolic condition that occurs in genetically predisposed subjects during general anesthesia, induced by commonly used volatile anesthetics and/or the neuromuscular blocking agent succinylcholine. Triggering agents cause an altered intracellular calcium regulation. Mutations in RYR1 gene have been found in about 70% of MH families. The RYR1 gene encodes the skeletal muscle calcium release channel of the sarcoplasmic reticulum, commonly known as ryanodine receptor type 1 (RYR1. The present work reviews the documented cases of ER or of stress-induced MH events in which RYR1 sequence variations, associated or possibly associated to MHS status, have been identified.

  8. Genomewide variation in an introgression line of rice-Zizania revealed by whole-genome re-sequencing.

    Directory of Open Access Journals (Sweden)

    Zhen-Hui Wang

    Full Text Available BACKGROUND: Hybridization between genetically diverged organisms is known as an important avenue that drives plant genome evolution. The possible outcomes of hybridization would be the occurrences of genetic instabilities in the resultant hybrids. It remained under-investigated however whether pollination by alien pollens of a closely related but sexually "incompatible" species could evoke genomic changes and to what extent it may result in phenotypic novelties in the derived progenies. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we have re-sequenced the genomes of Oryza sativa ssp. japonica cv. Matsumae and one of its derived introgressant RZ35 that was obtained from an introgressive hybridization between Matsumae and Zizanialatifolia Griseb. in general, 131 millions 90 base pair (bp paired-end reads were generated which covered 13.2 and 21.9 folds of the Matsumae and RZ35 genomes, respectively. Relative to Matsumae, a total of 41,724 homozygous single nucleotide polymorphisms (SNPs and 17,839 homozygous insertions/deletions (indels were identified in RZ35, of which 3,797 SNPs were nonsynonymous mutations. Furthermore, rampant mobilization of transposable elements (TEs was found in the RZ35 genome. The results of pathogen inoculation revealed that RZ35 exhibited enhanced resistance to blast relative to Matsumae. Notably, one nonsynonymous mutation was found in the known blast resistance gene Pid3/Pi25 and real-time quantitative (q RT-PCR analysis revealed constitutive up-regulation of its expression, suggesting both altered function and expression of Pid3/Pi25 may be responsible for the enhanced resistance to rice blast by RZ35. CONCLUSIONS/SIGNIFICANCE: Our results demonstrate that introgressive hybridization by Zizania has provoked genomewide, extensive genomic changes in the rice genome, and some of which have resulted in important phenotypic novelties. These findings suggest that introgressive hybridization by alien pollens of even a

  9. Association between sequence variations of the Mediterranean fever gene and the risk of migraine: a case–control study

    Directory of Open Access Journals (Sweden)

    Coşkun S

    2016-08-01

    Full Text Available Salih Coşkun,1 Sefer Varol,2 Hasan H Özdemir,2 Sercan Bulut Çelik,3 Metin Balduz,4 Mehmet Akif Camkurt,5 Abdullah Çim,1 Demet Arslan,2 Mehmet Uğur Çevik2 1Department of Medical Genetics, 2Department of Neurology, Faculty of Medicine, Dicle University, Diyarbakir, 3Family Health Center, Batman, 4Department of Neurology, Şanlıurfa Education and Research Hospital, Şanlıurfa, 5Department of Psychiatry, Afsin State Hospital, Kahramanmaraş, Turkey Abstract: Migraine pathogenesis involves a complex interaction between hormones, neurotransmitters, and inflammatory pathways, which also influence the migraine phenotype. The Mediterranean fever gene (MEFV encodes the pyrin protein. The major role of pyrin appears to be in the regulation of inflammation activity and the processing of the cytokine pro-interleukin-1β, and this cytokine plays a part in migraine pathogenesis. This study included 220 migraine patients and 228 healthy controls. Eight common missense mutations of the MEFV gene, known as M694V, M694I, M680I, V726A, R761H, K695R, P369S, and E148Q, were genotyped using real-time polymerase chain reaction with 5' nuclease assays, which include sequence specific primers, and probes with a reporter dye. When mutations were evaluated separately among the patient and control groups, only the heterozygote E148Q carrier was found to be significantly higher in the control group than in the patient group (P=0.029, odds ratio [95% confidence interval] =0.45 [0.21–0.94]. In addition, the frequency of the homozygote and the compound heterozygote genotype carrier was found to be significantly higher in patients (n=8, 3.6% than in the control group (n=1, 0.4% (P=0.016, odds ratio [95% confidence interval] =8.57 [1.06–69.07]. However, there was no statistically significant difference in the allele frequencies of MEFV mutations between the patients and the healthy control group (P=0.964. In conclusion, the results of the present study suggest that

  10. Sequence variations in DNA repair gene XPC is associated with lung cancer risk in a Chinese population: a case-control study

    International Nuclear Information System (INIS)

    Bai, Yun; Ma, Hongxia; Wang, Ying; Liu, Hongliang; Chen, Weihong; Yang, Lin; Jing, Guangfu; Huo, Xiang; Chen, Feng; Liu, Yanhong; Jin, Li; Xu, Liang; Wei, Qingyi; Huang, Wei; Shen, Hongbing; Lu, Daru; Wu, Tangchun; Yang, Xiaobo; Hu, Zhibin; Yuan, Jing; Wang, Feng; Shao, Minhua; Yuan, Wentao; Qian, Ji

    2007-01-01

    The nucleotide excision repair (NER) protein, xeroderma pigmentosum C (XPC), participates in recognizing DNA lesions and initiating DNA repair in response to DNA damage. Because mutations in XPC cause a high risk of cancer in XP patients, we hypothesized that inherited sequence variations in XPC may alter DNA repair and thus susceptibility to cancer. In this hospital-based case-control study, we investigated five XPC tagging, common single nucleotide polymorphisms (tagging SNPs) in 1,010 patients with newly diagnosed lung cancer and 1,011 matched cancer free controls in a Chinese population. In individual tagging SNP analysis, we found that rs3731055AG+AA variant genotypes were associated with a significantly decreased risk of lung adenocarcinoma [adjusted odds ratio (OR), 0.71; 95% confidence interval (CI), 0.56–0.90] but an increased risk of small cell carcinomas [adjusted OR, 1.79; 95% CI, 1.05–3.07]. Furthermore, we found that haplotype ACCCA was associated with a decreased risk of lung adenocarcinoma [OR, 0.78; 95% CI, 0.62–0.97] but an increased risk of small cell carcinomas [OR, 1.68; 95% CI, 1.04–2.71], which reflected the presence of rs3731055A allele in this haplotype. Further stratified analysis revealed that the protective effect of rs3731055AG+AA on risk of lung adenocarcinoma was more evident among young subjects (age ≤ 60) and never smokers. These results suggest that inherited sequence variations in XPC may modulate risk of lung cancer, especially lung adenocarcinoma, in Chinese populations. However, these findings need to be verified in larger confirmatory studies with more comprehensively selected tagging SNPs

  11. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics.

    Science.gov (United States)

    Wang, Chen; Han, Jian; Liu, Chonghuai; Kibet, Korir Nicholas; Kayesh, Emrul; Shangguan, Lingfei; Li, Xiaoying; Fang, Jinggui

    2012-03-29

    MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved mi

  12. Identification of microRNAs from Amur grape (vitis amurensis Rupr. by deep sequencing and analysis of microRNA variations with bioinformatics

    Directory of Open Access Journals (Sweden)

    Wang Chen

    2012-03-01

    Full Text Available Abstract Background MicroRNA (miRNA is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr. is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. Results A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. Conclusions Deep sequencing of short RNAs from Amur grape flowers and berries identified 72

  13. Allelic sequence variations in the hypervariable region of a T-cell receptor β chain: Correlation with restriction fragment length polymorphism in human families and populations

    International Nuclear Information System (INIS)

    Robinson, M.A.

    1989-01-01

    Direct sequence analysis of the human T-cell antigen receptor (TCR) V β1 variable gene identified a single base-pair allelic variation (C/G) located within the coding region. This change results in substitution of a histidine (CAC) for a glutamine (CAG) at position 48 of the TCR β chain, a position predicted to be in the TCR antigen binding site. The V β1 polymorphism was found by DNA sequence analysis of V β1 genes from seven unrelated individuals; V β1 genes were amplified by the polymerase chain reaction, the amplified fragments were cloned into M13 phage vectors, and sequences were determined. To determined the inheritance patterns of the V β1 substitution and to test correlation with V β1 restriction fragment length polymorphism detected with Pvu II and Taq I, allele-specific oligonucleotides were constructed and used to characterize amplified DNA samples. Seventy unrelated individuals and six families were tested for both restriction fragment length polymorphism and for the V β1 substitution. The correlation was also tested using amplified, size-selected, Pvu II- and Taq I-digested DNA samples from heterozygotes. Pvu II allele 1 (61/70) and Taq I allele 1 (66/70) were found to be correlated with the substitution giving rise to a histidine at position 48. Because there are exceptions to the correlation, the use of specific probes to characterize allelic forms of TCR variable genes will provide important tools for studies of basic TCR genetics and disease associations

  14. Variations in the geomagnetic and gravitational background associated with two strong earthquakes of the May 2012 sequence in the Po Valley Plain (Italy).

    Science.gov (United States)

    Straser, Valentino

    2013-04-01

    Reawakening of seismic activity in the Emilian Po Valley Plain (Italy) resulted in 2,492 earthquakes over five and a half months: 2,270 with M= 7. The mainshock was recorded during the night of 20 May 2012, at 04:03:52 Italian time (02:03:52 UTC) with epicentre in Finale Emilia, at a depth of 6.3km, by the Italian National Institute of Geophysics and Vulcanology (INGV). A long sequence of telluric shocks occurred in the same seismic district in the areas between the provinces of Modena, Ferrara, Mantua, Reggio Emilia, Bologna and Rovigo. In addition to the general devastation plus damage to civil and industrial buildings and the historical heritage, the earthquakes resulted in a total of 27 victims. Concomitant with the two strongest quakes, recorded on 20 and 29 May 2012, respectively, as in the case of others, variations were noted in the geomagnetic background by the LTPA monitoring station in Rome (Italy). The geomagnetic background variations were associated with the appearance of radio-anomalies in a frequency range from 0.1 to 3.0Hz, as well as gravimetric variations found around 60km from the epicentre. The peak accelerations, detected in correspondence with the strongest shocks on 20 and 29 May 2012, were respectively 0.31g and 0.29g. The appearance of the radio-anomalies coincided, from a temporal point of view, with average gravimetric variations of approximately 30µGal around the epicentre areas, concurrent with the mainshock. In this study, both the appearance of radio-anomalies and the gravitational variations recorded before strong earthquakes were related to the dynamics of the fault and a progressive reduction in granulometry in the core of the fracture, until the point of dislocation was reached. The intense friction in the fault and the damping factors produced before the shock are hypothesized as being proportional to the number of radio-anomalies measured. The radio anomaly is an unknown radio emission that has no characteristics (duration

  15. GOBASE: an organelle genome database

    OpenAIRE

    O?Brien, Emmet A.; Zhang, Yue; Wang, Eric; Marie, Veronique; Badejoko, Wole; Lang, B. Franz; Burger, Gertraud

    2008-01-01

    The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (?913 000) and chloroplast-encoded sequences (?250 000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing...

  16. A search for pre-main sequence stars in the high-latitude molecular clouds. II - A survey of the Einstein database

    Science.gov (United States)

    Caillault, Jean-Pierre; Magnani, Loris

    1990-01-01

    The preliminary results are reported of a survey of every EINSTEIN image which overlaps any high-latitude molecular cloud in a search for X-ray emitting pre-main sequence stars. This survey, together with complementary KPNO and IRAS data, will allow the determination of how prevalent low mass star formation is in these clouds in general and, particularly, in the translucent molecular clouds.

  17. Relational databases

    CERN Document Server

    Bell, D A

    1986-01-01

    Relational Databases explores the major advances in relational databases and provides a balanced analysis of the state of the art in relational databases. Topics covered include capture and analysis of data placement requirements; distributed relational database systems; data dependency manipulation in database schemata; and relational database support for computer graphics and computer aided design. This book is divided into three sections and begins with an overview of the theory and practice of distributed systems, using the example of INGRES from Relational Technology as illustration. The

  18. Typing of Panton-Valentine Leukocidin-encoding Phages and lukSF-PV Gene Sequence Variation in Staphylococcus aureus from China

    Directory of Open Access Journals (Sweden)

    Huanqiang Zhao

    2016-08-01

    Full Text Available Panton-Valentine leucocidin (PVL, encoded by lukSF-PV genes, a bi-component and pore-forming toxin, is carried by different staphylococcal bacteriophages. The prevalence of PVL in Staphylococcus aureus (S. aureus have been reported around the globe. However, the data on PVL-encoding phage types, lukSF-PV gene variation and chromosomal phage insertion sites for PVL-positive S. aureus are limited, especially in China. In order to obtain a more complete understanding of the molecular epidemiology of PVL-positive S. aureus, an integrated and modified PCR-based scheme was applied to detect the PVL-encoding phage types. Phage insertion locus and the lukSF-PV variant were determined by PCR and sequencing. Meanwhile, the genetic background was characterized by staphylococcal cassette chromosome mec (SCCmec typing, staphylococcal protein A (spa gene polymorphisms typing, pulsed-field gel electrophoresis (PFGE typing, accessory gene regulator (agr locus typing and multilocus sequence typing (MLST. Seventy eight (78/1175, 6.6% isolates possessed the lukSF-PV genes and 59.0% (46/78 of PVL-positive strains belonged to CC59 lineage. Eight known different PVL-encoding phage types were detected, and Φ7247PVL/ΦST5967PVL (n=13 and ΦPVL (n=12 were the most prevalent among them. While 25 (25/78, 32.1% isolates, belonging to ST30 and ST59 clones, were unable to be typed by the modified PCR-based scheme. Single nucleotide polymorphisms (SNPs were identified at five locations in the lukSF-PV genes, two of which were non-synonymous. Maximum-likelihood tree analysis of attachment sites sequences detected six SNP profiles for attR and eight for attL, respectively. In conclusion, the PVL-positive S. aureus mainly harbored Φ7247PVL/ΦST5967PVL and ΦPVL in the regions studied. lukSF-PV gene sequences, PVL-encoding phages and phage insertion locus generally varied with lineages. Moreover, PVL-positive clones that have emerged worldwide likely carry distinct phages.

  19. Typing of Panton-Valentine Leukocidin-Encoding Phages and lukSF-PV Gene Sequence Variation in Staphylococcus aureus from China.

    Science.gov (United States)

    Zhao, Huanqiang; Hu, Fupin; Jin, Shu; Xu, Xiaogang; Zou, Yuhan; Ding, Baixing; He, Chunyan; Gong, Fang; Liu, Qingzhong

    2016-01-01

    Panton-Valentine leukocidin (PVL, encoded by lukSF-PV genes), a bi-component and pore-forming toxin, is carried by different staphylococcal bacteriophages. The prevalence of PVL in Staphylococcus aureus has been reported around the globe. However, the data on PVL-encoding phage types, lukSF-PV gene variation and chromosomal phage insertion sites for PVL-positive S. aureus are limited, especially in China. In order to obtain a more complete understanding of the molecular epidemiology of PVL-positive S. aureus, an integrated and modified PCR-based scheme was applied to detect the PVL-encoding phage types. Phage insertion locus and the lukSF-PV variant were determined by PCR and sequencing. Meanwhile, the genetic background was characterized by staphylococcal cassette chromosome mec (SCCmec) typing, staphylococcal protein A (spa) gene polymorphisms typing, pulsed-field gel electrophoresis (PFGE) typing, accessory gene regulator (agr) locus typing and multilocus sequence typing (MLST). Seventy eight (78/1175, 6.6%) isolates possessed the lukSF-PV genes and 59.0% (46/78) of PVL-positive strains belonged to CC59 lineage. Eight known different PVL-encoding phage types were detected, and Φ7247PVL/ΦST5967PVL (n = 13) and ΦPVL (n = 12) were the most prevalent among them. While 25 (25/78, 32.1%) isolates, belonging to ST30, and ST59 clones, were unable to be typed by the modified PCR-based scheme. Single nucleotide polymorphisms (SNPs) were identified at five locations in the lukSF-PV genes, two of which were non-synonymous. Maximum-likelihood tree analysis of attachment sites sequences detected six SNP profiles for attR and eight for attL, respectively. In conclusion, the PVL-positive S. aureus mainly harbored Φ7247PVL/ΦST5967PVL and ΦPVL in the regions studied. lukSF-PV gene sequences, PVL-encoding phages, and phage insertion locus generally varied with lineages. Moreover, PVL-positive clones that have emerged worldwide likely carry distinct phages.

  20. Biofuel Database

    Science.gov (United States)

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  1. Community Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This excel spreadsheet is the result of merging at the port level of several of the in-house fisheries databases in combination with other demographic databases such...

  2. Mycobacteriophage genome database.

    Science.gov (United States)

    Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

    2011-01-01

    Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No.1.0) comprises of 6086 genes from 64 mycobacteriophages classified into 72 families based on ACLAME database. Manual curation was aided by information available from public databases which was enriched further by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.

  3. Tank waste processing analysis: Database development, tank-by-tank processing requirements, and examples of pretreatment sequences and schedules as applied to Hanford Double-Shell Tank Supernatant Waste - FY 1993

    International Nuclear Information System (INIS)

    Colton, N.G.; Orth, R.J.; Aitken, E.A.

    1994-09-01

    This report gives the results of work conducted in FY 1993 by the Tank Waste Processing Analysis Task for the Underground Storage Tank Integrated Demonstration. The main purpose of this task, led by Pacific Northwest Laboratory, is to demonstrate a methodology to identify processing sequences, i.e., the order in which a tank should be processed. In turn, these sequences may be used to assist in the development of time-phased deployment schedules. Time-phased deployment is implementation of pretreatment technologies over a period of time as technologies are required and/or developed. The work discussed here illustrates how tank-by-tank databases and processing requirements have been used to generate processing sequences and time-phased deployment schedules. The processing sequences take into account requirements such as the amount and types of data available for the tanks, tank waste form and composition, required decontamination factors, and types of compact processing units (CPUS) required and technology availability. These sequences were developed from processing requirements for the tanks, which were determined from spreadsheet analyses. The spreadsheet analysis program was generated by this task in FY 1993. Efforts conducted for this task have focused on the processing requirements for Hanford double-shell tank (DST) supernatant wastes (pumpable liquid) because this waste type is easier to retrieve than the other types (saltcake and sludge), and more tank space would become available for future processing needs. The processing requirements were based on Class A criteria set by the U.S. Nuclear Regulatory Commission and Clean Option goals provided by Pacific Northwest Laboratory

  4. Database Administrator

    Science.gov (United States)

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  5. Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data.

    Science.gov (United States)

    He, Zihuai; Xu, Bin; Lee, Seunggeun; Ionita-Laza, Iuliana

    2017-09-07

    Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  6. Relationship between ureB Sequence Diversity, Urease Activity and Genotypic Variations of Different Helicobacter pylori Strains in Patients with Gastric Disorders.

    Science.gov (United States)

    Ghalehnoei, Hossein; Ahmadzadeh, Alireza; Farzi, Nastaran; Alebouyeh, Masoud; Aghdaei, Hamid Asadzadeh; Azimzadeh, Pendram; Molaei, Mahsa; Zali, Mohammad Reza

    2016-01-01

    Association of the severity of Helicobacter pylori induced diseases with virulence entity of the colonized strains was proven in some studies. Urease has been demonstrated as a potent virulence factor for H. pylori. The main aim of this study was investigation of the relationships of ureB sequence diversity, urease activity and virulence genotypes of different H. pylori strains with histopathological changes of gastric tissue in infected patients suffering from different gastric disorders. Analysis of the virulence genotypes in the isolated strains indicated significant associations between the presence of severe active gastritis and cagA+ (P = 0.039) or cagA/iceA1 genotypes (P = 0.026), and intestinal metaplasia and vacA m1 (P = 0.008) or vacA s1/m2 (P = 0.001) genotypes. Our results showed a 2.4-fold increased risk of peptic ulcer (95% CI: 0.483-11.93), compared with gastritis, in the infected patients who had dupA positive strains; however this association was not statistically significant. The results of urease activity showed a significant mean difference between the isolated strains from patients with PUD and NUD (P = 0.034). This activity was relatively higher among patients with intestinal metaplasia. Also a significant association was found between the lack of cagA and increased urease activity among the isolated strains (P = 0.036). While the greatest sequence variation of ureB was detected in a strain from a patient with intestinal metaplasia, the sole determined amino acid change in UreB sequence (Ala201Thr, 30%), showed no influence on urease activity. In conclusion, the supposed role of H. pylori urease to form peptic ulcer and advancing of intestinal metaplasia was postulated in this study. Higher urease activity in the colonizing H. pylori strains that present specific virulence factors was indicated as a risk factor for promotion of histopathological changes of gastric tissue that advance gastric malignancy.

  7. Prevalence and sequence variations of the genes encoding the five antigens included in the novel 5CVMB vaccine covering group B meningococcal disease.

    Science.gov (United States)

    Jacobsson, Susanne; Hedberg, Sara Thulin; Mölling, Paula; Unemo, Magnus; Comanducci, Maurizio; Rappuoli, Rino; Olcén, Per

    2009-03-04

    During the recent years, projects are in progress for designing broad-range non-capsular-based meningococcal vaccines, covering also serogroup B isolates. We have examined three genes encoding antigens (NadA, GNA1030 and GNA2091) included in a novel vaccine, i.e. the 5 Component Vaccine against Meningococcus B (5CVMB), in terms of gene prevalence and sequence variations. These data were combined with the results from a similar study, examining the two additional antigens included in the 5CVMB (fHbp and GNA2132). nadA and fHbp v. 1 were present in 38% (n=36), respectively 71% (n=67) of the isolates, whereas gna2132, gna1030 and gna2091 were present in all the Neisseria meningitidis isolates tested (n=95). The level of amino acid conservation was relatively high in GNA1030 (93%), GNA2091 (92%), and within the main variants of NadA and fHbp. GNA2132 (54% of the amino acids conserved) appeared to be the most diversified antigen. Consequently, the theoretical coverage of the 5CVMB antigens and the feasibility to use these in a broad-range meningococcal vaccine is appealing.

  8. Fast and secure retrieval of DNA sequences

    NARCIS (Netherlands)

    2014-01-01

    Sequence models are retrieved from a sequences index. The sequence models model DNA or RNA sequences stored in a database, and each comprises a finite memory tree source model and parameters for the finite memory tree source model. One or more DNA or RNA sequences stored in the database are

  9. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data.

    Science.gov (United States)

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-10-01

    Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest. Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects.

  10. Method and system for normalizing biometric variations to authenticate users from a public database and that ensures individual biometric data privacy

    Science.gov (United States)

    Strait, Robert S.; Pearson, Peter K.; Sengupta, Sailes K.

    2000-01-01

    A password system comprises a set of codewords spaced apart from one another by a Hamming distance (HD) that exceeds twice the variability that can be projected for a series of biometric measurements for a particular individual and that is less than the HD that can be encountered between two individuals. To enroll an individual, a biometric measurement is taken and exclusive-ORed with a random codeword to produce a "reference value." To verify the individual later, a biometric measurement is taken and exclusive-ORed with the reference value to reproduce the original random codeword or its approximation. If the reproduced value is not a codeword, the nearest codeword to it is found, and the bits that were corrected to produce the codeword to it is found, and the bits that were corrected to produce the codeword are also toggled in the biometric measurement taken and the codeword generated during enrollment. The correction scheme can be implemented by any conventional error correction code such as Reed-Muller code R(m,n). In the implementation using a hand geometry device an R(2,5) code has been used in this invention. Such codeword and biometric measurement can then be used to see if the individual is an authorized user. Conventional Diffie-Hellman public key encryption schemes and hashing procedures can then be used to secure the communications lines carrying the biometric information and to secure the database of authorized users.

  11. Method and system for normalizing biometric variations to authenticate users from a public database and that ensures individual biometric data privacy

    International Nuclear Information System (INIS)

    Strait, R.S.; Pearson, P.K.; Sengupta, S.K.

    2000-01-01

    A password system comprises a set of codewords spaced apart from one another by a Hamming distance (HD) that exceeds twice the variability that can be projected for a series of biometric measurements for a particular individual and that is less than the HD that can be encountered between two individuals. To enroll an individual, a biometric measurement is taken and exclusive-ORed with a random codeword to produce a reference value. To verify the individual later, a biometric measurement is taken and exclusive-ORed with the reference value to reproduce the original random codeword or its approximation. If the reproduced value is not a codeword, the nearest codeword to it is found, and the bits that were corrected to produce the codeword to it is found, and the bits that were corrected to produce the codeword are also toggled in the biometric measurement taken and the codeword generated during enrollment. The correction scheme can be implemented by any conventional error correction code such as Reed-Muller code R(m,n). In the implementation using a hand geometry device an R(2,5) code has been used in this invention. Such codeword and biometric measurement can then be used to see if the individual is an authorized user. Conventional Diffie-Hellman public key encryption schemes and hashing procedures can then be used to secure the communications lines carrying the biometric information and to secure the database of authorized users

  12. Federal databases

    International Nuclear Information System (INIS)

    Welch, M.J.; Welles, B.W.

    1988-01-01

    Accident statistics on all modes of transportation are available as risk assessment analytical tools through several federal agencies. This paper reports on the examination of the accident databases by personal contact with the federal staff responsible for administration of the database programs. This activity, sponsored by the Department of Energy through Sandia National Laboratories, is an overview of the national accident data on highway, rail, air, and marine shipping. For each mode, the definition or reporting requirements of an accident are determined and the method of entering the accident data into the database is established. Availability of the database to others, ease of access, costs, and who to contact were prime questions to each of the database program managers. Additionally, how the agency uses the accident data was of major interest

  13. Sequence variations and protein expression levels of the two immune evasion proteins Gpm1 and Pra1 influence virulence of clinical Candida albicans isolates.

    Science.gov (United States)

    Luo, Shanshan; Hipler, Uta-Christina; Münzberg, Christin; Skerka, Christine; Zipfel, Peter F

    2015-01-01

    Candida albicans, the important human fungal pathogen uses multiple evasion strategies to control, modulate and inhibit host complement and innate immune attack. Clinical C. albicans strains vary in pathogenicity and in serum resistance, in this work we analyzed sequence polymorphisms and variations in the expression levels of two central fungal complement evasion proteins, Gpm1 (phosphoglycerate mutase 1) and Pra1 (pH-regulated antigen 1) in thirteen clinical C. albicans isolates. Four nucleotide (nt) exchanges, all representing synonymous exchanges, were identified within the 747-nt long GPM1 gene. For the 900-nt long PRA1 gene, sixteen nucleotide exchanges were identified, which represented synonymous, as well as non-synonymous exchanges. All thirteen clinical isolates had a homozygous exchange (A to G) at position 73 of the PRA1 gene. Surface levels of Gpm1 varied by 8.2, and Pra1 levels by 3.3 fold in thirteen tested isolates and these differences influenced fungal immune fitness. The high Gpm1/Pra1 expressing candida strains bound the three human immune regulators more efficiently, than the low expression strains. The difference was 44% for Factor H binding, 51% for C4BP binding and 23% for plasminogen binding. This higher Gpm1/Pra1 expressing strains result in enhanced survival upon challenge with complement active, Factor H depleted human serum (difference 40%). In addition adhesion to and infection of human endothelial cells was increased (difference 60%), and C3b surface deposition was less effective (difference 27%). Thus, variable expression levels of central immune evasion protein influences immune fitness of the human fungal pathogen C. albicans and thus contribute to fungal virulence.

  14. A comprehensive survey of sequence variation in the ABCA4 (ABCR) gene in Stargardt disease and age-related macular degeneration.

    Science.gov (United States)

    Rivera, A; White, K; Stöhr, H; Steiner, K; Hemmrich, N; Grimm, T; Jurklies, B; Lorenz, B; Scholl, H P; Apfelstedt-Sylla, E; Weber, B H

    2000-10-01

    Stargardt disease (STGD) is a common autosomal recessive maculopathy of early and young-adult onset and is caused by alterations in the gene encoding the photoreceptor-specific ATP-binding cassette (ABC) transporter (ABCA4). We have studied 144 patients with STGD and 220 unaffected individuals ascertained from the German population, to complete a comprehensive, population-specific survey of the sequence variation in the ABCA4 gene. In addition, we have assessed the proposed role for ABCA4 in age-related macular degeneration (AMD), a common cause of late-onset blindness, by studying 200 affected individuals with late-stage disease. Using a screening strategy based primarily on denaturing gradient gel electrophoresis, we have identified in the three study groups a total of 127 unique alterations, of which 90 have not been previously reported, and have classified 72 as probable pathogenic mutations. Of the 288 STGD chromosomes studied, mutations were identified in 166, resulting in a detection rate of approximately 58%. Eight different alleles account for 61% of the identified disease alleles, and at least one of these, the L541P-A1038V complex allele, appears to be a founder mutation in the German population. When the group with AMD and the control group were analyzed with the same methodology, 18 patients with AMD and 12 controls were found to harbor possible disease-associated alterations. This represents no significant difference between the two groups; however, for detection of modest effects of rare alleles in complex diseases, the analysis of larger cohorts of patients may be required.

  15. Comparison of sequencing the D2 region of the large subunit ribosomal RNA gene (MicroSEQ®) versus the internal transcribed spacer (ITS) regions using two public databases for identification of common and uncommon clinically relevant fungal species.

    Science.gov (United States)

    Arbefeville, S; Harris, A; Ferrieri, P

    2017-09-01

    Fungal infections cause considerable morbidity and mortality in immunocompromised patients. Rapid and accurate identification of fungi is essential to guide accurately targeted antifungal therapy. With the advent of molecular methods, clinical laboratories can use new technologies to supplement traditional phenotypic identification of fungi. The aims of the study were to evaluate the sole commercially available MicroSEQ® D2 LSU rDNA Fungal Identification Kit compared to the in-house developed internal transcribed spacer (ITS) regions assay in identifying moulds, using two well-known online public databases to analyze sequenced data. 85 common and uncommon clinically relevant fungi isolated from clinical specimens were sequenced for the D2 region of the large subunit (LSU) of ribosomal RNA (rRNA) gene with the MicroSEQ® Kit and the ITS regions with the in house developed assay. The generated sequenced data were analyzed with the online GenBank and MycoBank public databases. The D2 region of the LSU rRNA gene identified 89.4% or 92.9% of the 85 isolates to the genus level and the full ITS region (f-ITS) 96.5% or 100%, using GenBank or MycoBank, respectively, when compared to the consensus ID. When comparing species-level designations to the consensus ID, D2 region of the LSU rRNA gene aligned with 44.7% (38/85) or 52.9% (45/85) of these isolates in GenBank or MycoBank, respectively. By comparison, f-ITS possessed greater specificity, followed by ITS1, then ITS2 regions using GenBank or MycoBank. Using GenBank or MycoBank, D2 region of the LSU rRNA gene outperformed phenotypic based ID at the genus level. Comparing rates of ID between D2 region of the LSU rRNA gene and the ITS regions in GenBank or MycoBank at the species level against the consensus ID, f-ITS and ITS2 exceeded performance of the D2 region of the LSU rRNA gene, but ITS1 had similar performance to the D2 region of the LSU rRNA gene using MycoBank. Our results indicated that the MicroSEQ® D2 LSU r

  16. Variation in cancer surgical outcomes associated with physician and nurse staffing: a retrospective observational study using the Japanese Diagnosis Procedure Combination Database

    Directory of Open Access Journals (Sweden)

    Yasunaga Hideo

    2012-05-01

    Full Text Available Abstract Background Little is known about the effects of professional staffing on cancer surgical outcomes. The present study aimed to investigate the association between cancer surgical outcomes and physician/nurse staffing in relation to hospital volume. Methods We analyzed 131,394 patients undergoing lung lobectomy, esophagectomy, gastrectomy, colorectal surgery, hepatectomy or pancreatectomy for cancer between July and December, 2007–2008, using the Japanese Diagnosis Procedure Combination database linked to the Survey of Medical Institutions data. Physician-to-bed ratio (PBR and nurse-to-bed ratio (NBR were determined for each hospital. Hospital volume was categorized into low, medium and high for each of six cancer surgeries. Failure to rescue (FTR was defined as a proportion of inhospital deaths among those with postoperative complications. Multi-level logistic regression analysis was performed to examine the association between physician/nurse staffing and FTR, adjusting for patient characteristics and hospital volume. Results Overall inhospital mortality was 1.8%, postoperative complication rate was 15.2%, and FTR rate was 11.9%. After adjustment for hospital volume, FTR rate in the group with high PBR (≥19.7 physicians per 100 beds and high NBR (≥77.0 nurses per 100 beds was significantly lower than that in the group with low PBR ( Conclusions Well-staffed hospitals confer a benefit for cancer surgical patients regarding reduced FTR, irrespective of hospital volume. These results suggest that consolidation of surgical centers linked with migration of medical professionals may improve the quality of cancer surgical management.

  17. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    Directory of Open Access Journals (Sweden)

    Dawn B. Goldsmith

    2015-06-01

    Full Text Available Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS site in the summer (September and winter (March of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years.

  18. Database Replication

    CERN Document Server

    Kemme, Bettina

    2010-01-01

    Database replication is widely used for fault-tolerance, scalability and performance. The failure of one database replica does not stop the system from working as available replicas can take over the tasks of the failed replica. Scalability can be achieved by distributing the load across all replicas, and adding new replicas should the load increase. Finally, database replication can provide fast local access, even if clients are geographically distributed clients, if data copies are located close to clients. Despite its advantages, replication is not a straightforward technique to apply, and

  19. Refactoring databases evolutionary database design

    CERN Document Server

    Ambler, Scott W

    2006-01-01

    Refactoring has proven its value in a wide range of development projects–helping software professionals improve system designs, maintainability, extensibility, and performance. Now, for the first time, leading agile methodologist Scott Ambler and renowned consultant Pramodkumar Sadalage introduce powerful refactoring techniques specifically designed for database systems. Ambler and Sadalage demonstrate how small changes to table structures, data, stored procedures, and triggers can significantly enhance virtually any database design–without changing semantics. You’ll learn how to evolve database schemas in step with source code–and become far more effective in projects relying on iterative, agile methodologies. This comprehensive guide and reference helps you overcome the practical obstacles to refactoring real-world databases by covering every fundamental concept underlying database refactoring. Using start-to-finish examples, the authors walk you through refactoring simple standalone databas...

  20. An Integrated Molecular Database on Indian Insects.

    Science.gov (United States)

    Pratheepa, Maria; Venkatesan, Thiruvengadam; Gracy, Gandhi; Jalali, Sushil Kumar; Rangheswaran, Rajagopal; Antony, Jomin Cruz; Rai, Anil

    2018-01-01

    MOlecular Database on Indian Insects (MODII) is an online database linking several databases like Insect Pest Info, Insect Barcode Information System (IBIn), Insect Whole Genome sequence, Other Genomic Resources of National Bureau of Agricultural Insect Resources (NBAIR), Whole Genome sequencing of Honey bee viruses, Insecticide resistance gene database and Genomic tools. This database was developed with a holistic approach for collecting information about phenomic and genomic information of agriculturally important insects. This insect resource database is available online for free at http://cib.res.in. http://cib.res.in/.

  1. Intrastrain heterogeneity of the mgpB gene in Mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences.

    Science.gov (United States)

    Iverson-Cabral, Stefanie L; Astete, Sabina G; Cohen, Craig R; Rocha, Eduardo P C; Totten, Patricia A

    2006-07-01

    Mycoplasma genitalium is associated with reproductive tract disease in women and may persist in the lower genital tract for months, potentially increasing the risk of upper tract infection and transmission to uninfected partners. Despite its exceptionally small genome (580 kb), approximately 4% is composed of repeated elements known as MgPar sequences (MgPa repeats) based on their homology to the mgpB gene that encodes the immunodominant MgPa adhesin protein. The presence of these MgPar sequences, as well as mgpB variability between M. genitalium strains, suggests that mgpB and MgPar sequences recombine to produce variant MgPa proteins. To examine the extent and generation of diversity within single strains of the organism, we examined mgpB variation within M. genitalium strain G-37 and observed sequence heterogeneity that could be explained by recombination between the mgpB expression site and putative donor MgPar sequences. Similarly, we analyzed mgpB sequences from cervical specimens from a persistently infected woman (21 months) and identified 17 different mgpB variants within a single infecting M. genitalium strain, confirming that mgpB heterogeneity occurs over the course of a natural infection. These observations support the hypothesis that recombination occurs between the mgpB gene and MgPar sequences and that the resulting antigenically distinct MgPa variants may contribute to immune evasion and persistence of infection.

  2. Intragenomic sequence variation at the ITS1 - ITS2 region and at the 18S and 28S nuclear ribosomal DNA genes of the New Zealand mud snail, Potamopyrgus antipodarum (Hydrobiidae: mollusca)

    Science.gov (United States)

    Hoy, Marshal S.; Rodriguez, Rusty J.

    2013-01-01

    Molecular genetic analysis was conducted on two populations of the invasive non-native New Zealand mud snail (Potamopyrgus antipodarum), one from a freshwater ecosystem in Devil's Lake (Oregon, USA) and the other from an ecosystem of higher salinity in the Columbia River estuary (Hammond Harbor, Oregon, USA). To elucidate potential genetic differences between the two populations, three segments of nuclear ribosomal DNA (rDNA), the ITS1-ITS2 regions and the 18S and 28S rDNA genes were cloned and sequenced. Variant sequences within each individual were found in all three rDNA segments. Folding models were utilized for secondary structure analysis and results indicated that there were many sequences which contained structure-altering polymorphisms, which suggests they could be nonfunctional pseudogenes. In addition, analysis of molecular variance (AMOVA) was used for hierarchical analysis of genetic variance to estimate variation within and among populations and within individuals. AMOVA revealed significant variation in the ITS region between the populations and among clones within individuals, while in the 5.8S rDNA significant variation was revealed among individuals within the two populations. High levels of intragenomic variation were found in the ITS regions, which are known to be highly variable in many organisms. More interestingly, intragenomic variation was also found in the 18S and 28S rDNA, which has rarely been observed in animals and is so far unreported in Mollusca. We postulate that in these P. antipodarum populations the effects of concerted evolution are diminished due to the fact that not all of the rDNA genes in their polyploid genome should be essential for sustaining cellular function. This could lead to a lessening of selection pressures, allowing mutations to accumulate in some copies, changing them into variant sequences.                   

  3. RDD Databases

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This database was established to oversee documents issued in support of fishery research activities including experimental fishing permits (EFP), letters of...

  4. Snowstorm Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Snowstorm Database is a collection of over 500 snowstorms dating back to 1900 and updated operationally. Only storms having large areas of heavy snowfall (10-20...

  5. Dealer Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The dealer reporting databases contain the primary data reported by federally permitted seafood dealers in the northeast. Electronic reporting was implemented May 1,...

  6. National database

    DEFF Research Database (Denmark)

    Kristensen, Helen Grundtvig; Stjernø, Henrik

    1995-01-01

    Artikel om national database for sygeplejeforskning oprettet på Dansk Institut for Sundheds- og Sygeplejeforskning. Det er målet med databasen at samle viden om forsknings- og udviklingsaktiviteter inden for sygeplejen.......Artikel om national database for sygeplejeforskning oprettet på Dansk Institut for Sundheds- og Sygeplejeforskning. Det er målet med databasen at samle viden om forsknings- og udviklingsaktiviteter inden for sygeplejen....

  7. Database Description - AcEST | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available abase Description General information of database Database name AcEST Alternative n...hi, Tokyo-to 192-0397 Tel: +81-42-677-1111(ext.3654) E-mail: Database classificat...eneris Taxonomy ID: 13818 Database description This is a database of EST sequences of Adiantum capillus-vene...(3): 223-227. External Links: Original website information Database maintenance site Plant Environmental Res...base Database Description Download License Update History of This Database Site Policy | Contact Us Database Description - AcEST | LSDB Archive ...

  8. Legume and Lotus japonicus Databases

    DEFF Research Database (Denmark)

    Hirakawa, Hideki; Mun, Terry; Sato, Shusei

    2014-01-01

    Since the genome sequence of Lotus japonicus, a model plant of family Fabaceae, was determined in 2008 (Sato et al. 2008), the genomes of other members of the Fabaceae family, soybean (Glycine max) (Schmutz et al. 2010) and Medicago truncatula (Young et al. 2011), have been sequenced. In this sec....... In this section, we introduce representative, publicly accessible online resources related to plant materials, integrated databases containing legume genome information, and databases for genome sequence and derived marker information of legume species including L. japonicus...

  9. Sequence and secondary structure of the mitochondrial small-subunit rRNA V4, V6, and V9 domains reveal highly species-specific variations within the genus Agrocybe.

    Science.gov (United States)

    Gonzalez, P; Labarère, J

    1998-11-01

    A comparative study of variable domains V4, V6, and V9 of the mitochondrial small-subunit (SSU) rRNA was carried out with the genus Agrocybe by PCR amplification of 42 wild isolates belonging to 10 species, Agrocybe aegerita, Agrocybe dura, Agrocybe chaxingu, Agrocybe erebia, Agrocybe firma, Agrocybe praecox, Agrocybe paludosa, Agrocybe pediades, Agrocybe alnetorum, and Agrocybe vervacti. Sequencing of the PCR products showed that the three domains in the isolates belonging to the same species were the same length and had the same sequence, while variations were found among the 10 species. Alignment of the sequences showed that nucleotide motifs encountered in the smallest sequence of each variable domain were also found in the largest sequence, indicating that the sequences evolved by insertion-deletion events. Determination of the secondary structure of each domain revealed that the insertion-deletion events commonly occurred in regions not directly involved in the secondary structure (i.e., the loops). Moreover, conserved sequences ranging from 4 to 25 nucleotides long were found at the beginning and end of each domain and could constitute genus-specific sequences. Comparisons of the V4, V6, and V9 secondary structures resulted in identification of the following four groups: (i) group I, which was characterized by the presence of additional P23-1 and P23-3 helices in the V4 domain and the lack of the P49-1 helix in V9 and included A. aegerita, A. chaxingu, and A. erebia; (ii) group II, which had the P23-3 helix in V4 and the P49-1 helix in V9 and included A. pediades; (iii) group III, which did not have additional helices in V4, had the P49-1 helix in V9 and included A. paludosa, A. firma, A. alnetorum, and A. praecox; and (iv) group IV, which lacked both the V4 additional helices and the P49-1 helix in V9 and included A. vervacti and A. dura. This grouping of species was supported by the structure of a consensus tree based on the variable domain sequences. The

  10. Along-strike Variations in the Himalayas Illuminated by the Aftershock Sequence of the 2015 Mw 7.8 Gorkha Earthquake Using the NAMASTE Local Seismic Network

    Science.gov (United States)

    Mendoza, M.; Ghosh, A.; Karplus, M. S.; Nabelek, J.; Sapkota, S. N.; Adhikari, L. B.; Klemperer, S. L.; Velasco, A. A.

    2016-12-01

    As a result of the 2015 Mw 7.8 Gorkha earthquake, more than 8,000 people were killed from a combination of infrastructure failure and triggered landslides. This earthquake produced 4 m of peak co-seismic slip as the fault ruptured 130 km east under densely populated cities, such as Kathmandu. To understand earthquake dynamics in this part of the Himalayas and help mitigate similar future calamities by the next destructive event, it is imperative to study earthquake activities in detail and improve our understanding of the source and structural complexities. In response to the Gorkha event, multiple institutions developed and deployed a 10-month long dense seismic network called NAMASTE. It blanketed a 27,650 km2 area, mainly covering the rupture area of the Gorkha earthquake, in order to capture the dynamic sequence of aftershock behavior. The network consisted of a mix of 45 broadband, short-period, and strong motion sensors, with an average spacing of 20 km. From the first 6 months of data, starting approximately 1.5 after the mainshock, we develop a robust catalog containing over 3,000 precise earthquake locations, and local magnitudes that range between 0.3 and 4.9. The catalog has a magnitude of completeness of 1.5, and an overall low b-value of 0.78. Using the HypoDD algorithm, we relocate earthquake hypocenters with high precision, and thus illustrate the fault geometry down to depths of 25 km where we infer the location of the gently-dipping Main Frontal Thrust (MFT). Above the MFT, the aftershocks illuminate complex structure produced by relatively steeply dipping faults. Interestingly, we observe sharp along-strike change in the seismicity pattern. The eastern part of the aftershock area is significantly more active than the western part. The change in seismicity may reflect structural and/or frictional lateral heterogeneity in this part of the Himalayan fault system. Such along-strike variations play an important role in rupture complexities and

  11. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution.

    Science.gov (United States)

    Modahl, Cassandra M; Mackessy, Stephen P

    2016-06-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides

  12. Variation and association to diabetes in 2000 full mtDNA sequences mined from an exome study in a Danish population

    DEFF Research Database (Denmark)

    Li, Shengting; Besenbacher, Soren; Li, Yingrui

    2014-01-01

    In this paper, we mine full mtDNA sequences from an exome capture data set of 2000 Danes, showing that it is possible to get high-quality full-genome sequences of the mitochondrion from this resource. The sample includes 1000 individuals with type 2 diabetes and 1000 controls. We characterise...

  13. Experiment Databases

    Science.gov (United States)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queriable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.

  14. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large...... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information....

  15. Sequence variation in the cytochrome oxidase subunit I and II genes of two commonly found blow fly species, Chrysomya megacephala (Fabricius) and Chrysomya rufifacies (Macquart) (Diptera: Calliphoridae) in Malaysia.

    Science.gov (United States)

    Tan, Siew Hwa; Aris, Edah Mohd; Surin, Johari; Omar, Baharudin; Kurahashi, Hiromu; Mohamed, Zulqarnain

    2009-08-01

    The mitochondiral DNA region encompassing the cytochrome oxidase subunit I (COI) and cytochrome oxidase subunit II (COII) genes of two Malaysian blow fly species, Chrysomya megacephala (Fabricius) and Chrysomya rufifacies (Macquart) were studied. This region, which spans 2303bp and includes the COI, tRNA leucine and partial COII was sequenced from adult fly and larval specimens, and compared. Intraspecific variations were observed at 0.26% for Ch. megacephala and 0.17% for Ch. rufifacies, while sequence divergence between the two species was recorded at a minimum of 141 out of 2303 sites (6.12%). Results obtained in this study are comparable to published data, and thus support the use of DNA sequence to facilitate and complement morphology-based species identification.

  16. Novel rare missense variations and risk of autism spectrum disorder: whole-exome sequencing in two families with affected siblings and a two-stage follow-up study in a Japanese population.

    Directory of Open Access Journals (Sweden)

    Jun Egawa

    Full Text Available Rare inherited variations in multiplex families with autism spectrum disorder (ASD are suggested to play a major role in the genetic etiology of ASD. To further investigate the role of rare inherited variations, we performed whole-exome sequencing (WES in two families, each with three affected siblings. We also performed a two-stage follow-up case-control study in a Japanese population. WES of the six affected siblings identified six novel rare missense variations. Among these variations, CLN8 R24H was inherited in one family by three affected siblings from an affected father and thus co-segregated with ASD. In the first stage of the follow-up study, we genotyped the six novel rare missense variations identified by WES in 241 patients and 667 controls (the Niigata sample. Only CLN8 R24H had higher mutant allele frequencies in patients (1/482 compared with controls (1/1334. In the second stage, this variation was further genotyped, yet was not detected in a sample of 309 patients and 350 controls (the Nagoya sample. In the combined Niigata and Nagoya samples, there was no significant association (odds ratio = 1.8, 95% confidence interval = 0.1-29.6. These results suggest that CLN8 R24H plays a role in the genetic etiology of ASD, at least in a subset of ASD patients.

  17. Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs

    DEFF Research Database (Denmark)

    Friis, Susanne L; Buchard, Anders; Rockenbauer, Eszter

    2016-01-01

    This work introduces the in-house developed Python application STRinNGS for analysis of STR sequence elements in BAM or FASTQ files. STRinNGS identifies sequence reads with STR loci by their flanking sequences, it analyses the STR sequence and the flanking regions, and generates a report with the......This work introduces the in-house developed Python application STRinNGS for analysis of STR sequence elements in BAM or FASTQ files. STRinNGS identifies sequence reads with STR loci by their flanking sequences, it analyses the STR sequence and the flanking regions, and generates a report...

  18. Nucleotide sequences from the genomes of diverse cowpea accessions for discovery of genetic variation as part of the Feed the Future Innovation Lab for Climate Resilient Cowpea

    Data.gov (United States)

    US Agency for International Development — Nucleotide sequences were generated from 37 cowpea (Vigna unguiculata L. Walp.) accessions relevant to Africa, China and the USA to discover at type of genetic...

  19. [Population genetic differentiation of Phrynocephalus axillaris in east of Xinjiang Uygur Autonomous Region based on sequence variation of mitochondrial ND4-tRNALeu gene].

    Science.gov (United States)

    Li, Jun; Guo, Xian-Guang; Wang, Yue-Zhao

    2010-08-01

    A 838 bp fragment of mtDNA ND4-tRNALeu gene was sequenced for 66 individuals from five populations (DB: Dabancheng, TU: Turpan, SS: Shanshan, HL: Liushuquan, HD: East district of Hami) of Phrynocephalus axillaris distributed in east of Xinjiang Uygur Autonomous Region. Seventeen haplotypes were identified from 29 nucleotide polymorphic sites in the aligned 838 bp sequence. Excluding DB, there were relatively high haplotype diversity [(0.600+/-0.113)oscillation since Pleistocene and genetic drift.

  20. Variation of Se, Zn, Co, Fe and Rb distribution in rats upon sequence of injection with SeO2 and glutathione

    International Nuclear Information System (INIS)

    Czauderna, M.; Samochocka, K.; Kwiathkowska, J.

    1984-01-01

    The contents of Se, Zn, Co, Fe and Rb in several organs of Wistar rats were determined by instrumental neutron activation analysis (INAA) after injections of SeO 2 and glutathione (GSH). Se was incorporated in all the examined organs, and the efficiency of incorporation does not depend upon the sequence of injection with SeO 2 and GSH. The sequence of these injections affects the contents of the other elements in all the examined organs. (author)

  1. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues.

    Science.gov (United States)

    Garrido-Martín, Diego; Pazos, Florencio

    2018-02-27

    The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.

  2. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/ content/sequence/NEWALIGN/align.html As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  3. High frequency of the IVS2-2A>G DNA sequence variation in SLC26A5, encoding the cochlear motor protein prestin, precludes its involvement in hereditary hearing loss

    Directory of Open Access Journals (Sweden)

    Pereira Fred A

    2005-08-01

    Full Text Available Abstract Background Cochlear outer hair cells change their length in response to variations in membrane potential. This capability, called electromotility, is believed to enable the sensitivity and frequency selectivity of the mammalian cochlea. Prestin is a transmembrane protein required for electromotility. Homozygous prestin knockout mice are profoundly hearing impaired. In humans, a single nucleotide change in SLC26A5, encoding prestin, has been reported in association with hearing loss. This DNA sequence variation, IVS2-2A>G, occurs in the exon 3 splice acceptor site and is expected to abolish splicing of exon 3. Methods To further explore the relationship between hearing loss and the IVS2-2A>G transition, and assess allele frequency, genomic DNA from hearing impaired and control subjects was analyzed by DNA sequencing. SLC26A5 genomic DNA sequences from human, chimp, rat, mouse, zebrafish and fruit fly were aligned and compared for evolutionary conservation of the exon 3 splice acceptor site. Alternative splice acceptor sites within intron 2 of human SLC26A5 were sought using a splice site prediction program from the Berkeley Drosophila Genome Project. Results The IVS2-2A>G variant was found in a heterozygous state in 4 of 74 hearing impaired subjects of Hispanic, Caucasian or uncertain ethnicity and 4 of 150 Hispanic or Caucasian controls (p = 0.45. The IVS2-2A>G variant was not found in 106 subjects of Asian or African American descent. No homozygous subjects were identified (n = 330. Sequence alignment of SLC26A5 orthologs demonstrated that the A nucleotide at position IVS2-2 is invariant among several eukaryotic species. Sequence analysis also revealed five potential alternative splice acceptor sites in intron 2 of human SLC26A5. Conclusion These data suggest that the IVS2-2A>G variant may not occur more frequently in hearing impaired subjects than in controls. The identification of five potential alternative splice acceptor sites in

  4. REDIdb: the RNA editing database.

    Science.gov (United States)

    Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

    2007-01-01

    The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.

  5. Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing.

    Science.gov (United States)

    Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar

    2014-03-04

    Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggest that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/9, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins; including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.

  6. Geographic patterns of genetic variation in a broadly distributed marine vertebrate: new insights into loggerhead turtle stock structure from expanded mitochondrial DNA sequences.

    Directory of Open Access Journals (Sweden)

    Brian M Shamblin

    Full Text Available Previous genetic studies have demonstrated that natal homing shapes the stock structure of marine turtle nesting populations. However, widespread sharing of common haplotypes based on short segments of the mitochondrial control region often limits resolution of the demographic connectivity of populations. Recent studies employing longer control region sequences to resolve haplotype sharing have focused on regional assessments of genetic structure and phylogeography. Here we synthesize available control region sequences for loggerhead turtles from the Mediterranean Sea, Atlantic, and western Indian Ocean basins. These data represent six of the nine globally significant regional management units (RMUs for the species and include novel sequence data from Brazil, Cape Verde, South Africa and Oman. Genetic tests of differentiation among 42 rookeries represented by short sequences (380 bp haplotypes from 3,486 samples and 40 rookeries represented by long sequences (∼800 bp haplotypes from 3,434 samples supported the distinction of the six RMUs analyzed as well as recognition of at least 18 demographically independent management units (MUs with respect to female natal homing. A total of 59 haplotypes were resolved. These haplotypes belonged to two highly divergent global lineages, with haplogroup I represented primarily by CC-A1, CC-A4, and CC-A11 variants and haplogroup II represented by CC-A2 and derived variants. Geographic distribution patterns of haplogroup II haplotypes and the nested position of CC-A11.6 from Oman among the Atlantic haplotypes invoke recent colonization of the Indian Ocean from the Atlantic for both global lineages. The haplotypes we confirmed for western Indian Ocean RMUs allow reinterpretation of previous mixed stock analysis and further suggest that contemporary migratory connectivity between the Indian and Atlantic Oceans occurs on a broader scale than previously hypothesized. This study represents a valuable model for

  7. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...

  8. Tracing outbreaks of Streptococcus equi infection (strangles) in horses using sequence variation in the seM gene and pulsed-field gel electrophoresis.

    Science.gov (United States)

    Lindahl, Susanne; Söderlund, Robert; Frosth, Sara; Pringle, John; Båverud, Viveca; Aspán, Anna

    2011-11-21

    Strangles is a serious respiratory disease in horses caused by Streptococcus equi subspecies equi (S. equi). Transmission of the disease occurs by direct contact with an infected horse or contaminated equipment. Genetically, S. equi strains are highly homogenous and differentiation of strains has proven difficult. However, the S. equi M-protein SeM contains a variable N-terminal region and has been proposed as a target gene to distinguish between different strains of S. equi and determine the source of an outbreak. In this study, strains of S. equi (n=60) from 32 strangles outbreaks in Sweden during 1998-2003 and 2008-2009 were genetically characterized by sequencing the SeM protein gene (seM), and by pulsed-field gel electrophoresis (PFGE). Swedish strains belonged to 10 different seM types, of which five have not previously been described. Most were identical or highly similar to allele types from strangles outbreaks in the UK. Outbreaks in 2008/2009 sharing the same seM type were associated by geographic location and/or type of usage of the horses (racing stables). Sequencing of the seM gene generally agreed with pulsed-field gel electrophoresis profiles. Our data suggest that seM sequencing as a epidemiological tool is supported by the agreement between seM and PFGE and that sequencing of the SeM protein gene is more sensitive than PFGE in discriminating strains of S. equi. Copyright © 2011 Elsevier B.V. All rights reserved.

  9. Molecular phylogeny and SNP variation of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) derived from genome sequences.

    Science.gov (United States)

    Cronin, Matthew A; Rincon, Gonzalo; Meredith, Robert W; MacNeil, Michael D; Islas-Trejo, Alma; Cánovas, Angela; Medrano, Juan F

    2014-01-01

    We assessed the relationships of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) with high throughput genomic sequencing data with an average coverage of 25× for each species. A total of 1.4 billion 100-bp paired-end reads were assembled using the polar bear and annotated giant panda (Ailuropoda melanoleuca) genome sequences as references. We identified 13.8 million single nucleotide polymorphisms (SNP) in the 3 species aligned to the polar bear genome. These data indicate that polar bears and brown bears share more SNP with each other than either does with black bears. Concatenation and coalescence-based analysis of consensus sequences of approximately 1 million base pairs of ultraconserved elements in the nuclear genome resulted in a phylogeny with black bears as the sister group to brown and polar bears, and all brown bears are in a separate clade from polar bears. Genotypes for 162 SNP loci of 336 bears from Alaska and Montana showed that the species are genetically differentiated and there is geographic population structure of brown and black bears but not polar bears.

  10. Chloroplast DNA analysis of Tunisian cork oak populations (Quercus suber L.): sequence variations and molecular evolution of the trnL (UAA)-trnF (GAA) region.

    Science.gov (United States)

    Abdessamad, A; Baraket, G; Sakka, H; Ammari, Y; Ksontini, M; Hannachi, A Salhi

    2016-10-24

    Sequences of the trnL-trnF spacer and combined trnL-trnF region in chloroplast DNA of cork oak (Quercus suber L.) were analyzed to detect polymorphisms and to elucidate molecular evolution and demographic history. The aligned sequences varied in length and nucleotide composition. The overall ratio of transition/transversion (ti/tv) of 0.724 for the intergenic spacer and 0.258 for the pooled sequences were estimated, and indicated that transversions are more frequent than transitions. The molecular evolution and demographic history of Q. suber were investigated. Neutrality tests (Tajima's D and Fu and Li) ruled out the null hypothesis of a strictly neutral model, and Fu's Fs and Ramos-Onsins and Rozas' R2 confirmed the recent expansion of cork oak trees, validating its persistency in North Africa since the last glaciation during the Quaternary. The observed uni-modal mismatch distribution and the Harpending's raggedness index confirmed the demographic history model for cork oak. A phylogenetic dendrogram showed that the distribution of Q. suber trees occurs independently of geographical origin, the relief of the population site, and the bioclimatic stages. The molecular history and cytoplasmic diversity suggest that in situ and ex situ conservation strategies can be recommended for preserving landscape value and facing predictable future climatic changes.

  11. Database Description - eSOL | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available base Description General information of database Database name eSOL Alternative nam...eator Affiliation: The Research and Development of Biological Databases Project, National Institute of Genet...nology 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8501 Japan Email: Tel.: +81-45-924-5785 Database... classification Protein sequence databases - Protein properties Organism Taxonomy Name: Escherichia coli Taxonomy ID: 562 Database...i U S A. 2009 Mar 17;106(11):4201-6. External Links: Original website information Database maintenance site

  12. A database of PCR primers for the chloroplast genomes of higher plants

    Science.gov (United States)

    Heinze, Berthold

    2007-01-01

    Background Chloroplast genomes evolve slowly and many primers for PCR amplification and analysis of chloroplast sequences can be used across a wide array of genera. In some cases 'universal' primers have been designed for the purpose of working across species boundaries. However, the essential information on these primer sequences is scattered throughout the literature. Results A database is presented here which assembles published primer information for chloroplast DNA. Additional primers were designed to fill gaps where little or no primer information could be found. Amplicons are either the genes themselves (typically useful in studies of sequence variation in higher-order phylogeny) or they are spacers, introns, and intergenic regions (for studies of phylogeographic patterns within and among species). The current list of 'generic' primers consists of more than 700 sequences. Wherever possible, we give the locations of the primers in the thirteen fully sequenced chloroplast genomes (Nicotiana tabacum, Atropa belladonna, Spinacia oleracea, Arabidopsis thaliana, Populus trichocarpa, Oryza sativa, Pinus thunbergii, Marchantia polymorpha, Zea mays, Oenothera elata, Acorus calamus, Eucalyptus globulus, Medicago trunculata). Conclusion The database described here is designed to serve as a resource for researchers who are venturing into the study of poorly described chloroplast genomes, whether for large- or small-scale DNA sequencing projects, to study molecular variation or to investigate chloroplast evolution. PMID:17326828

  13. A database of PCR primers for the chloroplast genomes of higher plants

    Directory of Open Access Journals (Sweden)

    Heinze Berthold

    2007-02-01

    Full Text Available Abstract Background Chloroplast genomes evolve slowly and many primers for PCR amplification and analysis of chloroplast sequences can be used across a wide array of genera. In some cases 'universal' primers have been designed for the purpose of working across species boundaries. However, the essential information on these primer sequences is scattered throughout the literature. Results A database is presented here which assembles published primer information for chloroplast DNA. Additional primers were designed to fill gaps where little or no primer information could be found. Amplicons are either the genes themselves (typically useful in studies of sequence variation in higher-order phylogeny or they are spacers, introns, and intergenic regions (for studies of phylogeographic patterns within and among species. The current list of 'generic' primers consists of more than 700 sequences. Wherever possible, we give the locations of the primers in the thirteen fully sequenced chloroplast genomes (Nicotiana tabacum, Atropa belladonna, Spinacia oleracea, Arabidopsis thaliana, Populus trichocarpa, Oryza sativa, Pinus thunbergii, Marchantia polymorpha, Zea mays, Oenothera elata, Acorus calamus, Eucalyptus globulus, Medicago trunculata. Conclusion The database described here is designed to serve as a resource for researchers who are venturing into the study of poorly described chloroplast genomes, whether for large- or small-scale DNA sequencing projects, to study molecular variation or to investigate chloroplast evolution.

  14. Stackfile Database

    Science.gov (United States)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage retrieval and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data [e.g., Gravity Recovery And Climate Experiment -- GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  15. Sequences for Student Investigation

    Science.gov (United States)

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  16. Sequence variation of the glycoprotein gene identifies three distinct lineages within field isolates of viral hemorrhagic septicemia virus, a fish rhabdovirus

    Science.gov (United States)

    Benmansour, A.; Bascuro, B.; Monnier, A.F.; Vende, P.; Winton, J.R.; de Kinkelin, P.

    1997-01-01

    To evaluate the genetic diversity of viral haemorrhagic septicaemia virus (VHSV), the sequence of the glycoprotein genes (G) of 11 North American and European isolates were determined. Comparison with the G protein of representative members of the family Rhabdoviridae suggested that VHSV was a different virus species from infectious haemorrhagic necrosis virus (IHNV) and Hirame rhabdovirus (HIRRV). At a higher taxonomic level, VHSV, IHNV and HIRRV formed a group which was genetically closest to the genus Lyssavirus. Compared with each other, the G genes of VHSV displayed a dissimilar overall genetic diversity which correlated with differences in geographical origin. The multiple sequence alignment of the complete G protein, showed that the divergent positions were not uniformly distributed along the sequence. A central region (amino acid position 245-300) accumulated substitutions and appeared to be highly variable. The genetic heterogeneity within a single isolate was high, with an apparent internal mutation frequency of 1.2 x 10(-3) per nucleotide site, attesting the quasispecies nature of the viral population. The phylogeny separated VHSV strains according to the major geographical area of isolation: genotype I for continental Europe, genotype II for the British Isles, and genotype III for North America. Isolates from continental Europe exhibited the highest genetic variability, with sub-groups correlated partially with the serological classification. Neither neutralizing polyclonal sera, nor monoclonal antibodies, were able to discriminate between the genotypes. The overall structure of the phylogenetic tree suggests that VHSV genetic diversity and evolution fit within the model of random change and positive selection operating on quasispecies.

  17. Genetic variation of the greenhouse whitefly, Trialeurodes vaporariorum (Hemiptera: Aleyrodidae), among populations from Serbia and neighbouring countries, as inferred from COI sequence variability.

    Science.gov (United States)

    Prijović, M; Skaljac, M; Drobnjaković, T; Zanić, K; Perić, P; Marčić, D; Puizina, J

    2014-06-01

    The greenhouse whitefly Trialeurodes vaporariorum Westwood, 1856 (Hemiptera: Aleyrodidae) is an invasive and highly polyphagous phloem-feeding pest of vegetables and ornamentals. Trialeurodes vaporariorum causes serious damage due to direct feeding and transmits several important plant viruses. Excessive use of insecticides has resulted in significantly reduced levels of susceptibility of various T. vaporariorum populations. To determine the genetic variability within and among populations of T. vaporariorum from Serbia and to explore their genetic relatedness with other T. vaporariorum populations, we analysed the mitochondrial cytochrome c oxidase I (COI) sequences of 16 populations from Serbia and six neighbouring countries: Montenegro (three populations), Macedonia (one population) and Croatia (two populations), for a total of 198 analysed specimens. A low overall level of sequence divergence and only five variable nucleotides and six haplotypes were found. The most frequent haplotype, H1, was identified in all Serbian populations and in all specimens from distant localities in Croatia and Macedonia. The COI sequence data that was retrieved from GenBank and the data from our study indicated that H1 is the most globally widespread T. vaporariorum haplotype. A lack of spatial genetic structure among the studied T. vaporariorum populations, as well as two demographic tests that we performed (Tajima's D value and Fu's Fs statistics), indicate a recent colonisation event and population growth. Phylogenetic analyses of the COI haplotypes in this study and other T. vaporariorum haplotypes that were retrieved from GenBank were performed using Bayesian inference and median-joining (MJ) network analysis. Two major haplogroups with only a single unique nucleotide difference were found: haplogroup 1 (containing the five Serbian haplotypes and those previously identified in India, China, the Netherlands, the United Kingdom, Morocco, Reunion and the USA) and haplogroup 3

  18. dBBQs: dataBase of Bacterial Quality scores

    OpenAIRE

    Wanchai, Visanu; Patumcharoenpol, Preecha; Nookaew, Intawat; Ussery, David

    2017-01-01

    Background: It is well-known that genome sequencing technologies are becoming significantly cheaper and faster. As a result of this, the exponential growth in sequencing data in public databases allows us to explore ever growing large collections of genome sequences. However, it is less known that the majority of available sequenced genome sequences in public databases are not complete, drafts of varying qualities. We have calculated quality scores for around 100,000 bacterial genomes from al...

  19. Deciphering the resistome of the widespread P. aeruginosa ST175 international high-risk clone through whole genome sequencing

    DEFF Research Database (Denmark)

    Cabot, Gabriel; López-Causapé, Carla; Ocampo-Sosa, Alain A.

    2016-01-01

    . Sequence variation was analyzed for 146 chromosomal genes related to antimicrobial resistance and horizontally-acquired genes were explored using online databases. The resistome of ST175 was mainly determined by mutational events, with resistance traits common to all or nearly all of the strains, including...

  20. Variation in extragenic repetitive DNA sequences in Pseudomonas syringae and potential use of modified REP primers in the identification of closely related isolates

    Directory of Open Access Journals (Sweden)

    Elif Çepni

    2012-01-01

    Full Text Available In this study, Pseudomonas syringe pathovars isolated from olive, tomato and bean were identified by species-specific PCR and their genetic diversity was assessed by repetitive extragenic palindromic (REP-PCR. Reverse universal primers for REP-PCR were designed by using the bases of A, T, G or C at the positions of 1, 4 and 11 to identify additional polymorphism in the banding patterns. Binding of the primers to different annealing sites in the genome revealed additional fingerprint patterns in eight isolates of P. savastanoi pv. savastanoi and two isolates of P. syringae pv. tomato. The use of four different bases in the primer sequences did not affect the PCR reproducibility and was very efficient in revealing intra-pathovar diversity, particularly in P. savastanoi pv. savastanoi. At the pathovar level, the primer BOX1AR yielded shared fragments, in addition to five bands that discriminated among the pathovars P. syringae pv. phaseolicola, P. savastanoi pv. savastanoi and P. syringae pv. tomato. REP-PCR with a modified primer containing C produced identical bands among the isolates in a pathovar but separated three pathovars more distinctly than four other primers. Although REP-and BOX-PCRs have been successfully used in the molecular identification of Pseudomonas isolates from Turkish flora, a PCR based on inter-enterobacterial repetitive intergenic concensus (ERIC sequences failed to produce clear banding patterns in this study.

  1. Open Geoscience Database

    Science.gov (United States)

    Bashev, A.

    2012-04-01

    treatment could be conducted in other programs after extraction the filtered data into *.csv file. It makes the database understandable for non-experts. The database employs open data format (*.csv) and wide spread tools: PHP as the program language, MySQL as database management system, JavaScript for interaction with GoogleMaps and JQueryUI for create user interface. The database is multilingual: there are association tables, which connect with elements of the database. In total the development required about 150 hours. The database still has several problems. The main problem is the reliability of the data. Actually it needs an expert system for estimation the reliability, but the elaboration of such a system would take more resources than the database itself. The second problem is the problem of stream selection - how to select the stations that are connected with each other (for example, belong to one water stream) and indicate their sequence. Currently the interface is English and Russian. However it can be easily translated to your language. But some problems we decided. For example problem "the problem of the same station" (sometimes the distance between stations is smaller, than the error of position): when you adding new station to the database our application automatically find station near this place. Also we decided problem of object and parameter type (how to regard "EC" and "electrical conductivity" as the same parameter). This problem has been solved using "associative tables". If you would like to see the interface on your language, just contact us. We should send you the list of terms and phrases for translation on your language. The main advantage of the database is that it is totally open: everybody can see, extract the data from the database and use them for non-commercial purposes with no charge. Registered users can contribute to the database without getting paid. We hope, that it will be widely used first of all for education purposes, but

  2. Extending Database Integration Technology

    National Research Council Canada - National Science Library

    Buneman, Peter

    1999-01-01

    Formal approaches to the semantics of databases and database languages can have immediate and practical consequences in extending database integration technologies to include a vastly greater range...

  3. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome

    Directory of Open Access Journals (Sweden)

    Santosh K. Tiwari

    2011-01-01

    Full Text Available The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods in time. Several phylogenetic studies involving mitochondrial DNA and Y chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, certain controversy exists, due to the support given to the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate pre-historic colonization of India by anatomically modern humans, using conserved stretches of five amino acid (EPIYA sequences in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs, in 32 H. pylori strains isolated from subjects with several forms of gastric diseases, was also explored. High resolution sequence analysis of the above described genes was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0 software for EPIYA. An MJ-Network was constructed for obtaining TPM haplotypes by using NETWORK (version 4.5 software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. No specific association of haplotypes with the outcome of disease was revealed through additional network analysis of TPMs.

  4. Sequence variation and linkage disequilibrium in the GABA transporter-1 gene (SLC6A1 in five populations: implications for pharmacogenetic research

    Directory of Open Access Journals (Sweden)

    Sughondhabirom Atapol

    2007-10-01

    Full Text Available Abstract Background GABA transporter-1 (GAT-1; genetic locus SLC6A1 is emerging as a novel target for treatment of neuropsychiatric disorders. To understand how population differences might influence strategies for pharmacogenetic studies, we identified patterns of genetic variation and linkage disequilibrium (LD in SLC6A1 in five populations representing three continental groups. Results We resequenced 12.4 kb of SLC6A1, including the promoters, exons and flanking intronic regions in African-American, Thai, Hmong, Finnish, and European-American subjects (total n = 40. LD in SLC6A1 was examined by genotyping 16 SNPs in larger samples. Sixty-three variants were identified through resequencing. Common population-specific variants were found in African-Americans, including a novel 21-bp promoter region variable number tandem repeat (VNTR, but no such variants were found in any of the other populations studied. Low levels of LD and the absence of major LD blocks were characteristic of all five populations. African-Americans had the highest genetic diversity. European-Americans and Finns did not differ in genetic diversity or LD patterns. Although the Hmong had the highest level of LD, our results suggest that a strategy based on the use of tag SNPs would not translate to a major improvement in genotyping efficiency. Conclusion Owing to the low level of LD and presence of recombination hotspots, SLC6A1 may be an example of a problematic gene for association and haplotype tagging-based genetic studies. The 21-bp promoter region VNTR polymorphism is a putatively functional candidate allele for studies focusing on variation in GAT-1 function in the African-American population.

  5. Analysis and functional characterization of sequence variations in ligand binding domain of thyroid hormone receptors in autism spectrum disorder (ASD) patients.

    Science.gov (United States)

    Kalikiri, Mahesh Kumar; Mamidala, Madhu Poornima; Rao, Ananth N; Rajesh, Vidya

    2017-12-01

    Autism spectrum disorder (ASD) is a neuro developmental disorder, reported to be on a rise in the past two decades. Thyroid hormone-T3 plays an important role in early embryonic and central nervous system development. T3 mediates its function by binding to thyroid hormone receptors, TRα and TRβ. Alterations in T3 levels and thyroid receptor mutations have been earlier implicated in neuropsychiatric disorders and have been linked to environmental toxins. Limited reports from earlier studies have shown the effectiveness of T3 treatment with promising results in children with ASD and that the thyroid hormone levels in these children was also normal. This necessitates the need to explore the genetic variations in the components of the thyroid hormone pathway in ASD children. To achieve this objective, we performed genetic analysis of ligand binding domain of THRA and THRB receptor genes in 30 ASD subjects and in age matched controls from India. Our study for the first time reports novel single nucleotide polymorphisms in the THRA and THRB receptor genes of ASD individuals. Autism Res 2017, 10: 1919-1928. ©2017 International Society for Autism Research, Wiley Periodicals, Inc. Thyroid hormone (T3) and thyroid receptors (TRα and TRβ) are the major components of the thyroid hormone pathway. The link between thyroid pathway and neuronal development is proven in clinical medicine. Since the thyroid hormone levels in Autistic children are normal, variations in their receptors needs to be explored. To achieve this objective, changes in THRA and THRB receptor genes was studied in 30 ASD and normal children from India. The impact of some of these mutations on receptor function was also studied. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.

  6. GRIP Database original data - GRIPDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us GRI...PDB GRIP Database original data Data detail Data name GRIP Database original data DOI 10....18908/lsdba.nbdc01665-006 Description of data contents GRIP Database original data It consists of data table...s and sequences. Data file File name: gripdb_original_data.zip File URL: ftp://ftp.biosciencedbc.jp/archive/gripdb/LATEST/gri...e Database Description Download License Update History of This Database Site Policy | Contact Us GRIP Database original data - GRIPDB | LSDB Archive ...

  7. RICD: A rice indica cDNA database resource for rice functional genomics

    Directory of Open Access Journals (Sweden)

    Zhang Qifa

    2008-11-01

    Full Text Available Abstract Background The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. Results Rice Indica cDNA Database (RICD is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. It also provides a series of information, including sequences, protein domain annotations, similarity search results, SNPs and InDels information, and hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB and The TIGR Rice Genome Annotation Resource, expression atlas in RiceGE and variation report in Gramene of each cDNA. Conclusion The online rice indica cDNA database provides cDNA resource with comprehensive information to researchers for functional analysis of indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.

  8. Simplified validation of borderline hits of database searches

    OpenAIRE

    Thomas, Henrik; Shevchenko, Andrej

    2008-01-01

    Along with unequivocal hits produced by matching multiple MS/MS spectra to database sequences, LC-MS/MS analysis often yields a large number of hits of borderline statistical confidence. To simplify their validation, we propose to use rapid de novo interpretation of all acquired MS/MS spectra and, with the help of a simple software tool, display the candidate sequences together with each database search hit. We demonstrate that comparing hit database sequences and independent de novo interpre...

  9. Artificial Radionuclides Database in the Pacific Ocean: HAM Database

    Directory of Open Access Journals (Sweden)

    Michio Aoyama

    2004-01-01

    Full Text Available The database “Historical Artificial Radionuclides in the Pacific Ocean and its Marginal Seas”, or HAM database, has been created. The database includes 90Sr, 137Cs, and 239,240Pu concentration data from the seawater of the Pacific Ocean and its marginal seas with some measurements from the sea surface to the bottom. The data in the HAM database were collected from about 90 literature citations, which include published papers; annual reports by the Hydrographic Department, Maritime Safety Agency, Japan; and unpublished data provided by individuals. The data of concentrations of 90Sr, 137Cs, and 239,240Pu have been accumulating since 1957–1998. The present HAM database includes 7737 records for 137Cs concentration data, 3972 records for 90Sr concentration data, and 2666 records for 239,240Pu concentration data. The spatial variation of sampling stations in the HAM database is heterogeneous, namely, more than 80% of the data for each radionuclide is from the Pacific Ocean and the Sea of Japan, while a relatively small portion of data is from the South Pacific. This HAM database will allow us to use these radionuclides as significant chemical tracers for oceanographic study as well as the assessment of environmental affects of anthropogenic radionuclides for these 5 decades. Furthermore, these radionuclides can be used to verify the oceanic general circulation models in the time scale of several decades.

  10. Cryptic variation in an ecological indicator organism: mitochondrial and nuclear DNA sequence data confirm distinct lineages of Baetis harrisoni Barnard (Ephemeroptera: Baetidae in southern Africa

    Directory of Open Access Journals (Sweden)

    Pereira-da-Conceicoa Lyndall L

    2012-02-01

    Full Text Available Abstract Background Baetis harrisoni Barnard is a mayfly frequently encountered in river studies across Africa, but the external morphological features used for identifying nymphs have been observed to vary subtly between different geographic locations. It has been associated with a wide range of ecological conditions, including pH extremes of pH 2.9–10.0 in polluted waters. We present a molecular study of the genetic variation within B. harrisoni across 21 rivers in its distribution range in southern Africa. Results Four gene regions were examined, two mitochondrial (cytochrome c oxidase subunit I [COI] and small subunit ribosomal 16S rDNA [16S] and two nuclear (elongation factor 1 alpha [EF1α] and phosphoenolpyruvate carboxykinase [PEPCK]. Bayesian and parsimony approaches to phylogeny reconstruction resulted in five well-supported major lineages, which were confirmed using a general mixed Yule-coalescent (GMYC model. Results from the EF1α gene were significantly incongruent with both mitochondrial and nuclear (PEPCK results, possibly due to incomplete lineage sorting of the EF1α gene. Mean between-clade distance estimated using the COI and PEPCK data was found to be an order of magnitude greater than the within-clade distance and comparable to that previously reported for other recognised Baetis species. Analysis of the Isolation by Distance (IBD between all samples showed a small but significant effect of IBD. Within each lineage the contribution of IBD was minimal. Tentative dating analyses using an uncorrelated log-normal relaxed clock and two published estimates of COI mutation rates suggest that diversification within the group occurred throughout the Pliocene and mid-Miocene (~2.4–11.5 mya. Conclusions The distinct lineages of B. harrisoni correspond to categorical environmental variation, with two lineages comprising samples from streams that flow through acidic Table Mountain Sandstone and three lineages with samples from

  11. The UCSC Genome Browser Database: 2008 update

    DEFF Research Database (Denmark)

    Karolchik, D; Kuhn, R M; Baertsch, R

    2007-01-01

    The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrat...

  12. Converging evidence that sequence variations in the novel candidate gene MAP2K7 (MKK7) are functionally associated with schizophrenia.

    Science.gov (United States)

    Winchester, Catherine L; Ohzeki, Hiromitsu; Vouyiouklis, Demetrius A; Thompson, Rhiannon; Penninger, Josef M; Yamagami, Keiji; Norrie, John D; Hunter, Robert; Pratt, Judith A; Morris, Brian J

    2012-11-15

    Schizophrenia is a debilitating psychiatric disease with a strong genetic contribution, potentially linked to altered glutamatergic function in brain regions such as the prefrontal cortex (PFC). Here, we report converging evidence to support a functional candidate gene for schizophrenia. In post-mortem PFC from patients with schizophrenia, we detected decreased expression of MKK7/MAP2K7-a kinase activated by glutamatergic activity. While mice lacking one copy of the Map2k7 gene were overtly normal in a variety of behavioural tests, these mice showed a schizophrenia-like cognitive phenotype of impaired working memory. Additional support for MAP2K7 as a candidate gene came from a genetic association study. A substantial effect size (odds ratios: ~1.9) was observed for a common variant in a cohort of case and control samples collected in the Glasgow area and also in a replication cohort of samples of Northern European descent (most significant P-value: 3 × 10(-4)). While some caution is warranted until these association data are further replicated, these results are the first to implicate the candidate gene MAP2K7 in genetic risk for schizophrenia. Complete sequencing of all MAP2K7 exons did not reveal any non-synonymous mutations. However, the MAP2K7 haplotype appeared to have functional effects, in that it influenced the level of expression of MAP2K7 mRNA in human PFC. Taken together, the results imply that reduced function of the MAP2K7-c-Jun N-terminal kinase (JNK) signalling cascade may underlie some of the neurochemical changes and core symptoms in schizophrenia.

  13. High sequence variations in the region containing genes encoding a cellular morphogenesis protein and the repressor of sexual development help to reveal origins of Aspergillus oryzae.

    Science.gov (United States)

    Chang, Perng-Kuang; Scharfenstein, Leslie L; Solorzano, Cesar D; Abbas, Hamed K; Hua, Sui-Sheng T; Jones, Walker A; Zablotowicz, Robert M

    2015-05-04

    Aspergillus oryzae and Aspergillus flavus are closely related fungal species. The A. flavus morphotype that produces numerous small sclerotia (S strain) and aflatoxin has a unique 1.5 kb deletion in the norB-cypA region of the aflatoxin gene cluster (i.e. the S genotype). Phylogenetic studies have indicated that an isolate of the nonaflatoxigenic A. flavus with the S genotype is the ancestor of A. oryzae. Genome sequence comparison between A. flavus NRRL3357, which produces large sclerotia (L strain), and S-strain A. flavus 70S identified a region (samA-rosA) that was highly variable in the two morphotypes. A third type of samA-rosA region was found in A. oryzae RIB40. The three samA-rosA types were later revealed to be commonly present in A. flavus L-strain populations. Of the 182 L-strain A. flavus field isolates examined, 46%, 15% and 39% had the samA-rosA type of NRRL3357, 70S and RIB40, respectively. The three types also were found in 18 S-strain A. flavus isolates with different proportions. For A. oryzae, however, the majority (80%) of the 16 strains examined had the RIB40 type and none had the NRRL3357 type. The results suggested that A. oryzae strains in the current culture collections were mostly derived from the samA-rosA/RIB40 lineage of the nonaflatoxigenic A. flavus with the S genotype. Published by Elsevier B.V.

  14. A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature.

    Science.gov (United States)

    Hart, Reece K; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A

    2015-01-15

    Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  15. Database development and management

    CERN Document Server

    Chao, Lee

    2006-01-01

    Introduction to Database Systems Functions of a DatabaseDatabase Management SystemDatabase ComponentsDatabase Development ProcessConceptual Design and Data Modeling Introduction to Database Design Process Understanding Business ProcessEntity-Relationship Data Model Representing Business Process with Entity-RelationshipModelTable Structure and NormalizationIntroduction to TablesTable NormalizationTransforming Data Models to Relational Databases .DBMS Selection Transforming Data Models to Relational DatabasesEnforcing ConstraintsCreating Database for Business ProcessPhysical Design and Database

  16. The Ensembl genome database project.

    Science.gov (United States)

    Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

    2002-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

  17. Efficient Disk-Based Techniques for Manipulating Very Large String Databases

    KAUST Repository

    Allam, Amin

    2017-01-01

    Indexing and processing strings are very important topics in database management. Strings can be database records, DNA sequences, protein sequences, or plain text. Various string operations are required for several application categories

  18. Sequence History Update Tool

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.

  19. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  20. The UCSC Genome Browser Database: update 2006

    DEFF Research Database (Denmark)

    Hinrichs, A S; Karolchik, D; Baertsch, R

    2006-01-01

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, ...

  1. Databases of the marine metagenomics

    KAUST Repository

    Mineta, Katsuhiko

    2015-10-28

    The metagenomic data obtained from marine environments is significantly useful for understanding marine microbial communities. In comparison with the conventional amplicon-based approach of metagenomics, the recent shotgun sequencing-based approach has become a powerful tool that provides an efficient way of grasping a diversity of the entire microbial community at a sampling point in the sea. However, this approach accelerates accumulation of the metagenome data as well as increase of data complexity. Moreover, when metagenomic approach is used for monitoring a time change of marine environments at multiple locations of the seawater, accumulation of metagenomics data will become tremendous with an enormous speed. Because this kind of situation has started becoming of reality at many marine research institutions and stations all over the world, it looks obvious that the data management and analysis will be confronted by the so-called Big Data issues such as how the database can be constructed in an efficient way and how useful knowledge should be extracted from a vast amount of the data. In this review, we summarize the outline of all the major databases of marine metagenome that are currently publically available, noting that database exclusively on marine metagenome is none but the number of metagenome databases including marine metagenome data are six, unexpectedly still small. We also extend our explanation to the databases, as reference database we call, that will be useful for constructing a marine metagenome database as well as complementing important information with the database. Then, we would point out a number of challenges to be conquered in constructing the marine metagenome database.

  2. Mathematics for Databases

    NARCIS (Netherlands)

    ir. Sander van Laar

    2007-01-01

    A formal description of a database consists of the description of the relations (tables) of the database together with the constraints that must hold on the database. Furthermore the contents of a database can be retrieved using queries. These constraints and queries for databases can very well be

  3. Databases and their application

    NARCIS (Netherlands)

    Grimm, E.C.; Bradshaw, R.H.W; Brewer, S.; Flantua, S.; Giesecke, T.; Lézine, A.M.; Takahara, H.; Williams, J.W.,Jr; Elias, S.A.; Mock, C.J.

    2013-01-01

    During the past 20 years, several pollen database cooperatives have been established. These databases are now constituent databases of the Neotoma Paleoecology Database, a public domain, multiproxy, relational database designed for Quaternary-Pliocene fossil data and modern surface samples. The

  4. DOT Online Database

    Science.gov (United States)

    Page Home Table of Contents Contents Search Database Search Login Login Databases Advisory Circulars accessed by clicking below: Full-Text WebSearch Databases Database Records Date Advisory Circulars 2092 5 data collection and distribution policies. Document Database Website provided by MicroSearch

  5. The STRING database in 2011

    DEFF Research Database (Denmark)

    Szklarczyk, Damian; Franceschini, Andrea; Kuhn, Michael

    2011-01-01

    present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score...... models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org....

  6. Database Resources of the BIG Data Center in 2018.

    Science.gov (United States)

    2018-01-04

    The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome vari