WorldWideScience

Sample records for nucleotide sequence motifs

  1. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing
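
    A degenerate consensus such as the one quoted in this abstract can be scanned for with a regular expression. The sketch below is illustrative only and is not the authors' pipeline. It assumes that "T/AGC/GAGGA/TG" reads as (T or A) G (C or G) A G G (A or T) G, written WGSAGGWG in IUPAC codes; the function names and the test sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif(sequence, consensus):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # Wrapping the pattern in a lookahead lets overlapping hits be reported.
    lookahead = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Assumed reading of the consensus in the abstract (an interpretation, not a quote).
CONSENSUS = "WGSAGGWG"
hits = find_motif("CCTGCAGGAGTTAGGAGGTGAA", CONSENSUS)
print(hits)
```

    Reporting start positions rather than matched strings keeps the result usable for checking which hits lie near a given CG dinucleotide.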

  2. Novel nucleotide sequence motifs that produce hotspots of meiotic recombination in Schizosaccharomyces pombe.

    Science.gov (United States)

    Steiner, Walter W; Steiner, Estelle M; Girvin, Angela R; Plewik, Lauren E

    2009-06-01

    In many organisms, including yeasts and humans, meiotic recombination is initiated preferentially at a limited number of sites in the genome referred to as recombination hotspots. Predicting precisely the location of most hotspots has remained elusive. In this study, we tested the hypothesis that hotspots can result from multiple different sequence motifs. We devised a method to rapidly screen many short random oligonucleotide sequences for hotspot activity in the fission yeast Schizosaccharomyces pombe and produced a library of approximately 500 unique 15- and 30-bp sequences containing hotspots. The frequency of hotspots found suggests that there may be a relatively large number of different sequence motifs that produce hotspots. Within our sequence library, we found many shorter 6- to 10-bp motifs that occurred multiple times, many of which produced hotspots when reconstructed in vivo. On the basis of sequence similarity, we were able to group those hotspots into five different sequence families. At least one of the novel hotspots we found appears to be a target for a transcription factor, as it requires that factor for its hotspot activity. We propose that many hotspots in S. pombe, and perhaps other organisms, result from simple sequence motifs, some of which are identified here.

  3. FastMotif: spectral sequence motif discovery.

    Science.gov (United States)

    Colombo, Nicoló; Vlassis, Nikos

    2015-08-15

    Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, most of the existing motif finding algorithms are computationally demanding, and they may not be able to support the increasingly large datasets produced by modern high-throughput sequencing technologies. We present FastMotif, a new motif discovery algorithm that is built on a recent machine learning technique referred to as Method of Moments. Based on spectral decompositions, our method is robust to model misspecifications and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. On HT-SELEX data, FastMotif extracts motif profiles that match those computed by various state-of-the-art algorithms, but one order of magnitude faster. We provide a theoretical and numerical analysis of the algorithm's robustness and discuss its sensitivity with respect to the free parameters. The Matlab code of FastMotif is available from http://lcsb-portal.uni.lu/bioinformatics. Contact: vlassis@adobe.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  4. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs, are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by
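
    The comparison against random sequences of matching nucleotide composition, background (iii) in this abstract, can be approximated by an observed/expected ratio per k-mer. The sketch below is a simplified stand-in, not the statistic used by the authors: it assumes independent positions, takes base frequencies from the input sequences themselves, and all function names are hypothetical.

```python
from collections import Counter
from math import prod


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def base_frequencies(sequences):
    """Relative frequency of A, C, G and T over all sequences."""
    counts = Counter("".join(sequences).upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def observed_over_expected(sequences, k):
    """Ratio of observed k-mer counts to counts expected from composition alone."""
    counts = kmer_counts(sequences, k)
    freqs = base_frequencies(sequences)
    # Number of k-mer windows available across all sequences.
    windows = sum(max(len(s) - k + 1, 0) for s in sequences)
    ratios = {}
    for kmer, observed in counts.items():
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            expected = windows * prod(freqs[b] for b in kmer)
            ratios[kmer] = observed / expected
    return ratios


ratios = observed_over_expected(["AAAATTTT"], 2)
print(ratios)
```

    A ratio above 1 marks a k-mer that occurs more often than composition alone predicts. In the toy input, the runs of identical nucleotides push AA and TT above 1 and AT below it.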

  5. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs... These features make Regmex well suited for a range of biological sequence analysis problems related to motif discovery, exemplified by microRNA seed enrichment, but also including enrichment problems involving complex motifs and combinations of motifs. We demonstrate a number of usage scenarios that take...

  6. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.

  7. rMotifGen: random motif generator for DNA and protein sequences

    Directory of Open Access Journals (Sweden)

    Hardin C Timothy

    2007-08-01

    Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
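
    The general idea of planting a motif instance in random background sequence can be sketched in a few lines. This is not rMotifGen's code or interface. It assumes a uniform DNA background, treats the matrix as per-position base probabilities rather than log-odds scores, and leaves out insertions and substitution-matrix mutations; all names are hypothetical.

```python
import random


def sample_from_matrix(matrix, rng):
    """Draw one motif instance from a list of {base: probability} columns."""
    return "".join(
        rng.choices(list(col.keys()), weights=list(col.values()))[0]
        for col in matrix
    )


def random_sequence_with_motif(length, matrix, rng):
    """Return (sequence, start, instance) with one motif instance planted."""
    background = [rng.choice("ACGT") for _ in range(length)]
    instance = sample_from_matrix(matrix, rng)
    start = rng.randrange(length - len(instance) + 1)
    # Overwrite a background window of the same width, so length is preserved.
    background[start:start + len(instance)] = instance
    return "".join(background), start, instance


# Hypothetical 4-column matrix: position 1 is always A, the others are degenerate.
MATRIX = [
    {"A": 1.0},
    {"C": 0.8, "G": 0.2},
    {"G": 0.5, "T": 0.5},
    {"T": 0.9, "A": 0.1},
]

rng = random.Random(0)  # fixed seed so a test set can be regenerated
seq, start, instance = random_sequence_with_motif(50, MATRIX, rng)
print(seq, start, instance)
```

    Returning the planted position with each sequence gives the ground truth needed to score a motif finder on the generated set.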

  8. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  9. The international nucleotide sequence database collaboration.

    Science.gov (United States)

    Karsch-Mizrachi, Ilene; Takagi, Toshihisa; Cochrane, Guy

    2018-01-04

    For more than 30 years, the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been committed to capturing, preserving and providing access to comprehensive public domain nucleotide sequence and associated metadata which enables discovery in biomedicine, biodiversity and biological sciences. Since 1987, the DNA Data Bank of Japan (DDBJ) at the National Institute for Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA have worked collaboratively to enable access to nucleotide sequence data in standardized formats for the worldwide scientific community. In this article, we reiterate the principles of the INSDC collaboration and briefly summarize the trends of the archival content. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

  10. MotifLab: a tools and data integration workbench for motif discovery and regulatory sequence analysis.

    Science.gov (United States)

    Klepper, Kjetil; Drabløs, Finn

    2013-01-16

    Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated a lot of data that has the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming. Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results. We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench has been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin. MotifLab is freely available at http://www.motiflab.org.

  11. The International Nucleotide Sequence Database Collaboration.

    Science.gov (United States)

    Cochrane, Guy; Karsch-Mizrachi, Ilene; Nakamura, Yasukazu

    2011-01-01

    Under the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), globally comprehensive public domain nucleotide sequence is captured, preserved and presented. The partners of this long-standing collaboration work closely together to provide data formats and conventions that enable consistent data submission to their databases and support regular data exchange around the globe. Clearly defined policy and governance in relation to free access to data and relationships with journal publishers have positioned INSDC databases as a key provider of the scientific record and a core foundation for the global bioinformatics data infrastructure. While growth in sequence data volumes comes no longer as a surprise to INSDC partners, the uptake of next-generation sequencing technology by mainstream science that we have witnessed in recent years brings a step-change to growth, necessarily making a clear mark on INSDC strategy. In this article, we introduce the INSDC, outline data growth patterns and comment on the challenges of increased growth.

  12. Expressed sequence tags (ESTs) and single nucleotide ...

    African Journals Online (AJOL)

    Expressed Sequence Tags (ESTs) and Single Nucleotide Polymorphisms (SNPs) are providing in depth knowledge in plant biology, breeding and biotechnology. The emergence of many novel molecular marker techniques are changing and accelerating the process of producing mutations in plant molecular biology ...

  13. Retrieval and Representation of Nucleotide Sequence of ...

    African Journals Online (AJOL)

    Nigerian Journal of Basic and Applied Science (March, 2013), 21(1): 27-32. DOI: http://dx.doi.org/10.4314/njbas.v21i1.4. ISSN 0794-5698. Retrieval and Representation of Nucleotide Sequence of Saccharomyces cerevisiae Cystathionine Gamma-Lyase (CYS3) Gene in Five Formats. *R. A. Umar, H. Abdullahi and N. Lawal.

  14. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of an RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs, RMfam, and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. A Novel Protein Interaction between Nucleotide Binding Domain of Hsp70 and p53 Motif

    Directory of Open Access Journals (Sweden)

    Asita Elengoe

    2015-01-01

    Currently, protein interaction of Homo sapiens nucleotide binding domain (NBD) of heat shock 70 kDa protein (PDB: 1HJO) with p53 motif remains to be elucidated. The NBD-p53 motif complex enhances the p53 stabilization, thereby increasing the tumor suppression activity in cancer treatment. Therefore, we identified the interaction between NBD and p53 using STRING version 9.1 program. Then, we modeled the three-dimensional structure of p53 motif through homology modeling and determined the binding affinity and stability of NBD-p53 motif complex structure via molecular docking and dynamics (MD) simulation. Human DNA binding domain of p53 motif (SCMGGMNR) retrieved from UniProt (UniProtKB: P04637) was docked with the NBD protein, using the Autodock version 4.2 program. The binding energy and intermolecular energy for the NBD-p53 motif complex were −0.44 kcal/mol and −9.90 kcal/mol, respectively. Moreover, RMSD, RMSF, hydrogen bonds, salt bridge, and secondary structure analyses revealed that the NBD protein had a strong bond with p53 motif and the protein-ligand complex was stable. Thus, the current data would be highly encouraging for designing Hsp70 structure-based drugs in cancer therapy.

  16. Perception Enhancement using Visual Attributes in Sequence Motif Visualization

    OpenAIRE

    Oon, Yin; Lee, Nung; Kok, Wei

    2016-01-01

    Sequence logo is a well-accepted scientific method to visualize the conservation characteristics of biological sequence motifs. Previous studies found that using sequence logo graphical representation for scientific evidence reports or arguments could seriously cause biases and misinterpretation by users. This study investigates on the visual attributes performance of a sequence logo in helping users to perceive and interpret the information based on preattentive theories and Gestalt principl...
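
    For readers unfamiliar with what a sequence logo encodes, the sketch below computes the standard per-position information content (in bits) and the letter heights derived from it. It is background for the abstract above, not part of the study. It assumes a DNA alphabet and aligned, gap-free sites, and it leaves out the small-sample correction that logo tools commonly apply; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_information(column, alphabet="ACGT"):
    """Information content of one alignment column: log2(|alphabet|) - entropy."""
    counts = Counter(column.upper())
    total = sum(counts[s] for s in alphabet)
    entropy = 0.0
    for s in alphabet:
        p = counts[s] / total
        if p > 0:
            entropy -= p * log2(p)
    return log2(len(alphabet)) - entropy


def logo_heights(aligned_sites, alphabet="ACGT"):
    """Per-position letter heights: relative frequency times column information."""
    heights = []
    for column in zip(*aligned_sites):
        col = "".join(column).upper()
        info = column_information(col, alphabet)
        counts = Counter(col)
        total = sum(counts[s] for s in alphabet)
        heights.append(
            {s: counts[s] / total * info for s in alphabet if counts[s] > 0}
        )
    return heights


sites = ["ACGT", "ACGA", "ACCT", "AGGT"]
heights = logo_heights(sites)
print(heights)
```

    A fully conserved DNA position reaches the maximum of 2 bits, and a position with all four bases equally frequent has height 0. The stack height therefore shows conservation, and the letter sizes within a stack show relative frequency.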

  17. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
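
    The (l, d) problem statement in this abstract can be made concrete with a brute-force reference solver. The sketch below is not the NFA-based algorithm the paper proposes. It enumerates all |alphabet|^l candidates and is therefore only usable for very small l. The function names and the toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some window of the sequence is within d substitutions of candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def ld_motif_search(sequences, l, d, q=None, alphabet="ACGT"):
    """Return every length-l string found, with at most d substitutions,
    in at least q of the sequences (q defaults to all of them)."""
    if q is None:
        q = len(sequences)
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        support = sum(occurs_within(candidate, s, d) for s in sequences)
        if support >= q:
            motifs.append(candidate)
    return motifs


# Toy input: variants ACGT, ACCT and AGGT of the motif ACGT are planted.
toy = ["TTACGTAA", "GGACCTCC", "CAAGGTTG"]
found = ld_motif_search(toy, l=4, d=1)
print(found)
```

    The candidate loop grows exponentially in l. That growth is why instances such as (26, 11) are considered hard, and why the paper turns to parallel automata hardware instead of enumeration.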

  18. The MutT motif family of nucleotide phosphohydrolases in man and human pathogens (review).

    Science.gov (United States)

    McLennan, A G

    1999-07-01

    Human cells express at least eight members of the MutT motif protein (or nudix hydrolase) family. These enzymes are believed to eliminate toxic nucleotide derivatives from the cell and regulate the levels of important signalling nucleotides and their metabolites. Six have been fully or partially characterized: i) hMTH1 is a nucleoside triphosphatase which restricts AT→CG transversions by specifically degrading the oxidized nucleotide 8-oxo-dGTP; ii) hAPAH1 preferentially degrades the signalling dinucleotide Ap4A; iii) DIPP is unusual in hydrolysing two seemingly unrelated signalling substrate groups - the dinucleotides Ap6A and Ap5A, and the diphosphoinositol polyphosphates; iv) DIPP2 is closely related to DIPP; v) hYSAH1 is an NDP-sugar hydrolase which prefers ADP-ribose, and vi) hGFG is a protein of unknown function encoded by the antisense transcript of the basic fibroblast growth factor gene. Although not yet associated with known hereditary or acquired disorders, the functional loss of any one of these hydrolases would be expected to be detrimental to cellular function. Furthermore, the ialA invasion gene of Bartonella bacilliformis and other invasive pathogens encodes a MutT motif Ap4A hydrolase while poxviruses express two MutT motif proteins, at least one of which is essential for infectivity. This protein family, therefore, occupies a position of some importance in controlling human health and disease.

  19. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf.

  20. Discovering sequence motifs in quantitative and qualitative peptide data

    DEFF Research Database (Denmark)

    Andreatta, Massimo

    the number of experimental tests needed to identify new epitopes. Taken as a whole, this thesis provides a valuable series of algorithms and tools for the analysis of peptide data, both from the point of view of characterization of sequence motifs and the prediction of protein-peptide interactions... and interpret such data. The first paper in this thesis presents a new, publicly available method based on artificial neural networks that allows custom analysis of quantitative peptide data. The online NNAlign web-server provides a simple yet powerful tool for the discovery of sequence motifs in large... with the presence of multiple motifs, due to the experimental setup or the actual poly-specificity of the receptor, in peptide data. A new algorithm, based on Gibbs sampling, identifies multiple specificities by performing two tasks simultaneously: alignment and clustering of peptide data. The method, available...

  1. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms... to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular...

  2. iTriplet, a rule-based nucleic acid sequence motif finder

    Directory of Open Access Journals (Sweden)

    Gunderson Samuel I

    2009-10-01

    Background: With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable making their complete identification a computationally challenging problem. Many attempts in using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs still remains an unmet challenge as high degeneracy will diminish statistical significance of biological signals and increasing motif size will cause combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing. Results: We have conducted a comprehensive assessment on the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore we have confirmed the utility of iTriplet by showing it accurately predicts polyA-site-related motifs using a dual Luciferase reporter assay. Conclusion: iTriplet is a novel rule-based combinatorial or enumerative motif finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.

  3. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Background: Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results: WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode

  4. Computing distribution of scale independent motifs in biological sequences.

    Science.gov (United States)

    Almeida, Jonas S; Vinga, Susana

    2006-10-18

    The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representation of sequences where its statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques.
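    The iterative map behind the Chaos Game Representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' kernel density method; it only shows the standard CGR recursion, in which each new point lies halfway between the previous point and the corner assigned to the current nucleotide. The corner assignment, the starting point, and the function name `cgr_points` are assumptions chosen for illustration.

```python
# Minimal Chaos Game Representation (CGR) sketch.
# Assumption: this corner layout is one common convention, not taken from the paper.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after each nucleotide of seq."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

    Because every step halves the distance to a corner, two sequences that share their last k symbols end within 2^-k of each other in each coordinate, which is the link between distance and shared suffix (motif) length that the abstract refers to.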

  5. Computing distribution of scale independent motifs in biological sequences

    Directory of Open Access Journals (Sweden)

    Vinga Susana

    2006-10-01

    The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representation of sequences where its statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques.

  6. A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation

    Energy Technology Data Exchange (ETDEWEB)

    Bucher, P. [Swiss Institute for Experimental Cancer Research, Lausanne (Switzerland)]; Bairoch, A. [Centre Medical Universitaire, Geneva (Switzerland)]

    1994-12-31

    A general syntax for expressing biomolecular sequence motifs is described, which will be used in future releases of the PROSITE data bank and in a similar collection of nucleic acid sequence motifs currently under development. The central part of the syntax is a regular structure which can be viewed as a generalization of the profiles introduced by Gribskov and coworkers. Accessory features implement specific motif search strategies and provide information helpful for the interpretation of predicted matches. Two contrasting examples, representing E. coli promoters and SH3 domains respectively, are shown to demonstrate the versatility of the syntax and its compatibility with diverse motif search methods. It is argued that a comprehensive machine-readable motif collection based on the new syntax, in conjunction with a standard search program, can serve as a general-purpose sequence interpretation and function prediction tool.

  7. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV-specific targeting using small-molecule drugs.

  8. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. 
Meanwhile, our method makes ~30% fewer error predictions relative to the other

  9. The nucleotide sequences of two leghemoglobin genes from soybean

    DEFF Research Database (Denmark)

    Wiborg, O; Hyldig-Nielsen, J J; Jensen, E O

    1982-01-01

    We present the complete nucleotide sequences of two leghemoglobin genes isolated from soybean DNA. Both genes contain three intervening sequences in identical positions. Comparison of the coding sequences with known amino-acid sequences of soybean leghemoglobins suggests that the two genes...

  10. Finding Common Sequence and Structure Motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.

    1997-01-01

    We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences......, and comparisons with other approaches, are provided. The solutions include finding consensus structure identical to published ones....

  11. Brief communication Complete nucleotide sequence analysis of ...

    Indian Academy of Sciences (India)

    PRAKASH KUMAR

    21.1, 28.9, 24.4 and 25.6, respectively (Frowd and Tremaine. 1977). Like most of the polyadenylated monopartite positive-strand RNA viruses, the open reading frame (ORF) coding for the viral coat protein (CP) is located at the 3´ end. (Chia et al 1992). The putative polyadenylation signal, the. AATAAA motif, is found in the ...

  12. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  13. Complete nucleotide sequence and organization of the mitogenome ...

    African Journals Online (AJOL)

    Phylogenetic reconstruction using the concatenated 13 amino acid and nucleotide sequences of the protein-coding genes (PCGs) consistently supported a close relationship between Bombycoidea and Geometroidea among six available lepidopteran superfamilies (Tortricoidea, Pyraloidea, Papilionoidea, Bombycoidea, ...

  14. Exact correspondence between walk in nucleotide and protein sequence spaces.

    Directory of Open Access Journals (Sweden)

    Dmitry N Ivankov

    Full Text Available In the course of evolution, genes traverse the nucleotide sequence space, which translates to a trajectory of changes in the protein sequence in protein sequence space. The correspondence between regions of the nucleotide and protein sequence spaces is understood in general but not in detail. One of the unexplored questions is how many sequences a protein can reach with a certain number of nucleotide substitutions in its gene sequence. Here I propose an algorithm to calculate the volume of protein sequence space accessible to a given protein sequence as a function of the number of nucleotide substitutions made in the protein-coding sequence. The algorithm utilizes the power of the dynamic programming approach, and makes all calculations within a couple of seconds on a desktop computer. I apply the algorithm to green fluorescence protein, and get the number of sequences four times higher than estimated before. However, taking into account the astronomically huge size of the protein sequence space, the previous estimate can be considered as acceptable as an order of magnitude estimation. The proposed algorithm has practical applications in the study of evolutionary trajectories in sequence space.

  15. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified...

  16. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......, focusing on oft encountered problems in data processing, such as quality assurance, mapping, normalization, visualization, and interpretation. Presented in the second part are scientific endeavors representing solutions to problems of two sub-genres of next generation sequencing. For the first flavor, RNA-sequencing...

  17. Expressed sequence tags (ESTs) and single nucleotide ...

    African Journals Online (AJOL)

    SERVER

    2008-02-19

    stranded DNA binding dyes or fluorophore-labelled ..... Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions. Plant Cell 12(7):. 1021-1029. Bertone P, Snyder M (2005). Prospects and ...

  18. [Tabular excel editor for analysis of aligned nucleotide sequences].

    Science.gov (United States)

    Demkin, V V

    2010-01-01

    The Excel platform was used to convert multiple alignments of nucleotide sequences obtained from the BLAST network service into a form suitable for visual analysis and editing. Two macros for MS Excel 2007 were written. An array of aligned sequences converted into an Excel table and processed with these macros is more convenient to analyze than the original HTML data.

  19. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...

  20. A 6-Nucleotide Regulatory Motif within the AbcR Small RNAs of Brucella abortus Mediates Host-Pathogen Interactions.

    Science.gov (United States)

    Sheehan, Lauren M; Caswell, Clayton C

    2017-06-06

    In Brucella abortus, two small RNAs (sRNAs), AbcR1 and AbcR2, are responsible for regulating transcripts encoding ABC-type transport systems. AbcR1 and AbcR2 are required for Brucella virulence, as a double chromosomal deletion of both sRNAs results in attenuation in mice. Although these sRNAs are responsible for targeting transcripts for degradation, the mechanism utilized by the AbcR sRNAs to regulate mRNA in Brucella has not been described. Here, two motifs (M1 and M2) were identified in AbcR1 and AbcR2, and complementary motif sequences were defined in AbcR-regulated transcripts. Site-directed mutagenesis of M1 or M2 or of both M1 and M2 in the sRNAs revealed transcripts to be targeted by one or both motifs. Electrophoretic mobility shift assays revealed direct, concentration-dependent binding of both AbcR sRNAs to a target mRNA sequence. These experiments genetically and biochemically characterized two indispensable motifs within the AbcR sRNAs that bind to and regulate transcripts. Additionally, cellular and animal models of infection demonstrated that only M2 in the AbcR sRNAs is required for Brucella virulence. Furthermore, one of the M2-regulated targets, BAB2_0612, was found to be critical for the virulence of B. abortus in a mouse model of infection. Although these sRNAs are highly conserved among Alphaproteobacteria, the present report displays how gene regulation mediated by the AbcR sRNAs has diverged to meet the intricate regulatory requirements of each particular organism and its unique biological niche. IMPORTANCE Small RNAs (sRNAs) are important components of bacterial regulation, allowing organisms to quickly adapt to changes in their environments. The AbcR sRNAs are highly conserved throughout the Alphaproteobacteria and negatively regulate myriad transcripts, many encoding ABC-type transport systems. 
In Brucella abortus, AbcR1 and AbcR2 are functionally redundant, as only a double abcR1 abcR2 (abcR1/2) deletion results in attenuation in

  1. Nucleotide sequence composition and method for detection of neisseria gonorrhoeae

    Energy Technology Data Exchange (ETDEWEB)

    Lo, A.; Yang, H.L.

    1990-02-13

    This patent describes a composition of matter that is specific for Neisseria gonorrhoeae. It comprises at least one nucleotide sequence for which the ratio of the amount of the sequence which hybridizes to chromosomal DNA of Neisseria gonorrhoeae to the amount of the sequence which hybridizes to chromosomal DNA of Neisseria meningitidis is greater than about five, the ratio being obtained by a method described.

  2. Vaccine-derived mutation in motif D of poliovirus RNA-dependent RNA polymerase lowers nucleotide incorporation fidelity.

    Science.gov (United States)

    Liu, Xinran; Yang, Xiaorong; Lee, Cheri A; Moustafa, Ibrahim M; Smidansky, Eric D; Lum, David; Arnold, Jamie J; Cameron, Craig E; Boehr, David D

    2013-11-08

    All viral RNA-dependent RNA polymerases (RdRps) have a conserved structural element termed motif D. Studies of the RdRp from poliovirus (PV) have shown that a conformational change of motif D leads to efficient and faithful nucleotide addition by bringing Lys-359 into the active site where it serves as a general acid. The RdRp of the Sabin I vaccine strain has Thr-362 changed to Ile. Such a drastic change so close to Lys-359 might alter RdRp function and contribute in some way to the attenuated phenotype of Sabin type I. Here we present our characterization of the T362I RdRp. We find that the T362I RdRp exhibits a mutator phenotype in biochemical experiments in vitro. Using NMR, we show that this change in nucleotide incorporation fidelity correlates with a change in the structural dynamics of motif D. A recombinant PV expressing the T362I RdRp exhibits normal growth properties in cell culture but expresses a mutator phenotype in cells. For example, the T362I-containing PV is more sensitive to the mutagenic activity of ribavirin than wild-type PV. Interestingly, the T362I change was sufficient to cause a statistically significant reduction in viral virulence. Collectively, these studies suggest that residues of motif D can be targeted when changes in nucleotide incorporation fidelity are desired. Given the observation that fidelity mutants can serve as vaccine candidates, it may be possible to use engineering of motif D for this purpose.

  3. Complete nucleotide sequence and gene rearrangement of the ...

    Indian Academy of Sciences (India)

    Journal of Genetics. Complete nucleotide sequence and gene rearrangement of the mitochondrial genome of Occidozyga martensii. En Li, Xiaoqiang Li, Xiaobing Wu, Ge Feng, Man Zhang, Haitao Shi, Lijun Wang, Jianping Jiang. Research Article, Volume 93, Issue 3, December 2014, pp 631-641 ...

  4. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified......, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other...

  5. Nucleotide Sequencing and Identification of Some Wild Mushrooms

    Directory of Open Access Journals (Sweden)

    Sudip Kumar Das

    2013-01-01

    The rDNA-ITS (Ribosomal DNA Internal Transcribed Spacers) fragment of the genomic DNA of 8 wild edible mushrooms (collected from Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (Internal Transcribed Spacers 1) and ITS2 primers and subjected to nucleotide sequence determination for identification of mushrooms as mentioned. The sequences were aligned using ClustalW software program. The aligned sequences revealed identity (homology percentage from GenBank data base) of Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although out of 8 mushrooms 4 could be identified up to species level, the nucleotide sequences of the rest may be relevant to further characterization. A phylogenetic tree is constructed using Neighbor-Joining method showing interrelationship between/among the mushrooms. The determined nucleotide sequences of the mushrooms may provide additional information enriching GenBank database aiding to molecular taxonomy and facilitating its domestication and characterization for human benefits.

  6. Nucleotide sequencing and identification of some wild mushrooms.

    Science.gov (United States)

    Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari

    2013-01-01

    The rDNA-ITS (Ribosomal DNA Internal Transcribed Spacers) fragment of the genomic DNA of 8 wild edible mushrooms (collected from Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (Internal Transcribed Spacers 1) and ITS2 primers and subjected to nucleotide sequence determination for identification of mushrooms as mentioned. The sequences were aligned using ClustalW software program. The aligned sequences revealed identity (homology percentage from GenBank data base) of Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although out of 8 mushrooms 4 could be identified up to species level, the nucleotide sequences of the rest may be relevant to further characterization. A phylogenetic tree is constructed using Neighbor-Joining method showing interrelationship between/among the mushrooms. The determined nucleotide sequences of the mushrooms may provide additional information enriching GenBank database aiding to molecular taxonomy and facilitating its domestication and characterization for human benefits.

  7. Nucleotide sequence of the triosephosphate isomerase gene from Macaca mulatta

    Energy Technology Data Exchange (ETDEWEB)

    Old, S.E.; Mohrenweiser, H.W. (Univ. of Michigan, Ann Arbor (USA))

    1988-09-26

    The triosephosphate isomerase gene from a rhesus monkey, Macaca mulatta, charon 34 library was sequenced. The human and chimpanzee enzymes differ from the rhesus enzyme at ASN 20 and GLU 198. The nucleotide sequence identity between rhesus and human is 97% in the coding region and >94% in the flanking regions. Comparison of the rhesus and chimp genes, including the intron and flanking sequences, does not suggest a mechanism for generating the two TPI peptides of proliferating cells from hominoids and a single peptide from the rhesus gene.

  8. HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

    Science.gov (United States)

    Le, Thanh; Altman, Tom; Gardiner, Katheleen

    2010-02-01

    Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.
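    The position weight matrix (PWM) that this abstract mentions as the motif model can be illustrated with a minimal sketch. It is not HIGEDA: there is no EM step, no genetic algorithm, and no gapped alignment. It only shows how an ungapped log-odds PWM can be built from aligned example sites and used to find the best-scoring window in a sequence. The function names, the uniform background, the pseudocount, and the example sites are all assumptions made for illustration.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds PWM from equal-length aligned sites.

    Assumptions: uniform background frequencies and an additive pseudocount.
    """
    background = 1.0 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            a: math.log2(((column.count(a) + pseudocount) / total) / background)
            for a in alphabet
        })
    return pwm


def best_match(pwm, seq):
    """Return (score, start) of the highest-scoring ungapped window in seq."""
    width = len(pwm)
    scored = [
        (sum(pwm[j][seq[i + j]] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scored)


if __name__ == "__main__":
    # Hypothetical example sites, not data from the paper.
    sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
    score, start = best_match(build_pwm(sites), "GGCTATAATGCC")
    print(f"best window starts at {start} with score {score:.2f}")
```

    A scan like this evaluates only one fixed matrix; the point made in the abstract is that the quality of the matrix itself depends on how the search for its parameters is initialized.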

  9. The Investigation of Promoter Sequences of Marseilleviruses Highlights a Remarkable Abundance of the AAATATTT Motif in Intergenic Regions.

    Science.gov (United States)

    Oliveira, Graziele Pereira; Lima, Maurício Teixeira; Arantes, Thalita Souza; Assis, Felipe Lopes; Rodrigues, Rodrigo Araújo Lima; da Fonseca, Flávio Guimarães; Bonjardim, Cláudio Antônio; Kroon, Erna Geessien; Colson, Philippe; La Scola, Bernard; Abrahão, Jônatas Santos

    2017-11-01

    Viruses display a wide range of genomic profiles and, consequently, a variety of gene expression strategies. Specific sequences associated with transcriptional processes have been described in viruses, and putative promoter motifs have been elucidated for some nucleocytoplasmic large DNA viruses (NCLDV). Among NCLDV, the Marseilleviridae is a well-recognized family because of its genomic mosaicism. The marseilleviruses have an ability to incorporate foreign genes, especially from sympatric organisms inhabiting Acanthamoeba, its main known host. Here, we identified for the first time an eight-nucleotide A/T-rich promoter sequence (AAATATTT) associated with 55% of marseillevirus genes that is conserved in all marseilleviruses lineages, a higher level of conservation than that of any giant virus described to date. We instigated our prediction about the promoter motif by biological assays and by evaluating how single mutations in this octamer can impact gene expression. The investigation of sequences that regulate the expression of genes relative to lateral transfer revealed that the promoter motifs do not appear to be incorporated by marseilleviruses from donor organisms. Indeed, analyses of the intergenic regions that regulate lateral gene transfer-related genes have revealed an independent origin of the marseillevirus intergenic regions that does not match gene-donor organisms. About 50% of AAATATTT motifs spread throughout intergenic regions of the marseilleviruses are present as multiple copies. We believe that such multiple motifs are associated with increased expression of a given gene or are related to incorporation of foreign genes into the mosaic genome of marseilleviruses. IMPORTANCE The marseilleviruses draw attention because of the peculiar features of their genomes; however, little is known about their gene expression patterns or the factors that regulate those expression patterns. The limited published research on the expression patterns of the

  10. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  11. Sequencing genes in silico using single nucleotide polymorphisms

    Directory of Open Access Journals (Sweden)

    Zhang Xinyi

    2012-01-01

    Background: The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive. Results: To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (ranges 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes. Conclusions: Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease associated SNPs and regions, and facilitating the prioritization of candidate

  12. Complete nucleotide sequence and genome organization of a Cactus virus X strain from Hylocereus undatus (Cactaceae).

    Science.gov (United States)

    Liou, M R; Chen, Y R; Liou, R F

    2004-05-01

    The complete nucleotide sequence of a strain of Cactus virus X (CVX-Hu) isolated from Hylocereus undatus (Cactaceae) has been determined. Excluding the poly(A) tail, the sequence is 6614 nucleotides in length and contains seven open reading frames (ORFs). The genome organization of CVX is similar to that of other potexviruses. ORF1 encodes the putative viral replicase with conserved methyltransferase, helicase, and polymerase motifs. Within ORF1, two other ORFs were located separately in the +2 reading frame; we call these ORF6 and ORF7. ORF2, 3, and 4, which form the "triple gene block" characteristic of the potexviruses, encode proteins with molecular masses of 25, 12, and 7 kDa, respectively. ORF5 encodes the coat protein with an estimated molecular mass of 24 kDa. Sequence analysis indicated that the proteins encoded by ORF1-5 display a certain degree of homology to the corresponding proteins of other potexviruses. The putative product of ORF6, however, shows no significant similarity to those of other potexviruses. Phylogenetic analyses based on the replicase (the methyltransferase, helicase, and polymerase domains) and coat protein demonstrated a closer relationship of CVX with Bamboo mosaic virus, Cassava common mosaic virus, Foxtail mosaic virus, Papaya mosaic virus, and Plantago asiatica mosaic virus.

  13. Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon.

    Science.gov (United States)

    Fraenkel, Y M; Mandel, Y; Friedberg, D; Margalit, H

    1995-08-01

    We describe a relatively simple method for the identification of common motifs in DNA sequences that are known to share a common function. The input sequences are unaligned and there is no information regarding the position or orientation of the motif. Often such data exists for protein-binding regions, where genetic or molecular information that defines the binding region is available, but the specific recognition site within it is unknown. The method is based on the principle of 'divide and conquer'; we first search for dominant submotifs and then build full-length motifs around them. This method has several useful features: (i) it screens all submotifs so that the results are independent of the sequence order in the data; (ii) it allows the submotifs to contain spacers; (iii) it identifies an existing motif even if the data contains 'noise'; (iv) its running time depends linearly on the total length of the input. The method is demonstrated on two groups of protein-binding sequences: a well-studied group of known CRP-binding sequences, and a relatively newly identified group of genes known to be regulated by Lrp. The Lrp motif that we identify, based on 23 gene sequences, is similar to a previously identified motif based on a smaller data set, and to a consensus sequence of experimentally defined binding sites. Individual Lrp sites are evaluated and compared in regard to their regulation mode.
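    The first stage described above (screening all submotifs, allowing spacers, with unknown motif orientation) can be illustrated with a minimal sketch. This is not the authors' implementation: the half-word length, the maximum spacer, the ranking by number of supporting sequences, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spaced_submotifs(seq, half=3, max_gap=2):
    """Yield (left, gap, right): two half-words separated by 0..max_gap bases."""
    for gap in range(max_gap + 1):
        span = 2 * half + gap
        for i in range(len(seq) - span + 1):
            yield (seq[i:i + half], gap, seq[i + half + gap:i + span])


def dominant_submotifs(sequences, half=3, max_gap=2, top=3):
    """Rank submotifs by how many input sequences contain them on either strand."""
    support = defaultdict(set)  # submotif -> indices of sequences containing it
    for idx, seq in enumerate(sequences):
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for sub in spaced_submotifs(strand, half, max_gap):
                support[sub].add(idx)
    # Sort by support (descending), then alphabetically so the order is stable.
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(sub, len(ids)) for sub, ids in ranked[:top]]


if __name__ == "__main__":
    toy = ["AATGTGGACATT", "CCTGTCCACAGG", "GTGTAAACACC"]
    for submotif, count in dominant_submotifs(toy):
        print(submotif, count)
```

    Because every window of every sequence is screened, the ranking does not depend on the order of the input sequences. Building full-length motifs around the dominant submotifs, the second stage of the method, is not attempted here.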

  14. Complete nucleotide sequences of avian metapneumovirus subtype B genome.

    Science.gov (United States)

    Sugiyama, Miki; Ito, Hiroshi; Hata, Yusuke; Ono, Eriko; Ito, Toshihiro

    2010-12-01

    Complete nucleotide sequences were determined for subtype B avian metapneumovirus (aMPV), the attenuated vaccine strain VCO3/50 and its parental pathogenic strain VCO3/60616. The genomes of both strains comprised 13,508 nucleotides (nt), with a 42-nt leader at the 3'-end and a 46-nt trailer at the 5'-end. The genome contains eight genes in the order 3'-N-P-M-F-M2-SH-G-L-5', which is the same order shown in the other metapneumoviruses. The genes are flanked on either side by conserved transcriptional start and stop signals and have intergenic sequences varying in length from 1 to 88 nt. Comparison of nt and predicted amino acid (aa) sequences of VCO3/60616 with those of other metapneumoviruses revealed higher homology with aMPV subtype A virus than with other metapneumoviruses. A total of 18 nt and 10 deduced aa differences were seen between the strains, and one or a combination of several differences could be associated with attenuation of VCO3/50.

  15. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    Science.gov (United States)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  16. Analysis of genomic sequence motifs for deciphering transcription factor binding and transcriptional regulation in eukaryotic cells

    Directory of Open Access Journals (Sweden)

    Valentina Boeva

    2016-02-01

    Full Text Available Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
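    To make the position weight matrix model mentioned above concrete, the following minimal sketch builds a log-odds matrix from a few aligned binding sites and scores every window of a sequence with it. The pseudocount, the uniform background, the toy sites, and the function names are illustrative assumptions and are not taken from the review.

```python
import math


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Return (start, score) for every window of the sequence (A/C/G/T only)."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


if __name__ == "__main__":
    toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]  # hypothetical aligned sites
    pwm = build_pwm(toy_sites)
    best_start, best_score = max(scan("GGCGTATAATGCC", pwm), key=lambda hit: hit[1])
    print(best_start, round(best_score, 2))
```

    In practice a score threshold (or a p-value derived from the background model) decides which windows are reported as putative sites; that choice is left out of this sketch.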

  17. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Full Text Available Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six-base sequence GRCGYC. Results: The genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been sequenced (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
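    The recognition sequence GRCGYC is written with IUPAC ambiguity codes (R = A or G, Y = C or T). As a small illustration of how such a degenerate site can be located in a DNA string, the sketch below expands the codes into a regular expression. The function names and the toy sequence are hypothetical and are not part of the study.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif into a lookahead regex so overlapping sites are found."""
    body = "".join(IUPAC[code] for code in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, motif):
    """Return (0-based start, matched site) for every occurrence of the motif."""
    pattern = iupac_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_sites("TTGACGTCAAGGCGCCTT", "GRCGYC"))
    # [(2, 'GACGTC'), (10, 'GGCGCC')]
```

    GRCGYC is its own reverse complement, so scanning one strand is enough for this particular site; a non-palindromic motif would also need a scan of the reverse-complemented sequence.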

  18. Structure and sequence motifs in the HIV-1 RNA genome

    NARCIS (Netherlands)

    van Bel, N.

    2015-01-01

    The untranslated leader of the HIV-1 RNA genome contains some 350 nucleotides and is highly conserved among virus isolates. Several characteristic hairpin structures that regulate important virus replication steps, such as dimerization and packaging in virion particles, are clustered in this leader.

  19. Viroids: from genotype to phenotype just relying on RNA sequence and structural motifs

    Directory of Open Access Journals (Sweden)

    Ricardo Flores

    2012-06-01

    Full Text Available As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves, adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson-Crick pairs are flanked by apparently unstructured loops. However, compelling data show that these loops are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing-loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt, during replication, transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures, either global or local, determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs.

  20. Quantifying single nucleotide variant detection sensitivity in exome sequencing.

    Science.gov (United States)

    Meynert, Alison M; Bicknell, Louise S; Hurles, Matthew E; Jackson, Andrew P; Taylor, Martin S

    2013-06-18

    The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage and this is expected to systematically impact our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study it is often essential both to detect polymorphisms and to understand where and with what probability real polymorphisms may have been missed. Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial. This calibrated model allows us to produce single nucleotide resolution SNV sensitivity estimates which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined between samples to give "power estimates" for an entire study. We estimate a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X, commonly used for rare disease exome sequencing studies, we predict 5-15% of heterozygous and 1-4% of homozygous SNVs in the targeted regions will be missed. Non-reference alleles in the heterozygote state have a high chance of being missed when commonly applied read coverage thresholds are used despite the widely held assumption that there is good polymorphism detection at these coverage levels. Such alleles are likely to be of functional importance in population based studies of
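    The "naive expectation of sampling from a binomial" mentioned above can be written down in a few lines. The sketch below assumes that each read at a heterozygous site carries the non-reference allele with probability 0.5 and that a genotype call needs at least `min_reads` reads of each allele. Both the threshold and the function names are assumptions for illustration; this is not the authors' calibrated empirical model.

```python
from math import comb


def naive_het_sensitivity(depth, min_reads=2, alt_fraction=0.5):
    """P(at least min_reads reads of each allele) under a binomial read-sampling model."""
    if depth < 2 * min_reads:
        return 0.0
    return sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction)**(depth - k)
        for k in range(min_reads, depth - min_reads + 1)
    )


def min_depth_for(target, min_reads=2, max_depth=200):
    """Smallest read depth whose naive sensitivity reaches the target, or None."""
    for depth in range(1, max_depth + 1):
        if naive_het_sensitivity(depth, min_reads) >= target:
            return depth
    return None


if __name__ == "__main__":
    for depth in (4, 8, 13, 20):
        print(depth, round(naive_het_sensitivity(depth), 4))
    print(min_depth_for(0.95))
```

    Under these assumed settings the naive model reaches 95% at a depth of 9X, below the 13X the abstract reports from measured data, which is in line with its statement that real sensitivity is worse than the binomial expectation.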

  1. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available Background: In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results: By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions: Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  2. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    Science.gov (United States)

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  3. Nucleotide sequences specific to Brucella and methods for the detection of Brucella

    Energy Technology Data Exchange (ETDEWEB)

    McCready, Paula M [Tracy, CA; Radnedge, Lyndsay [San Mateo, CA; Andersen, Gary L [Berkeley, CA; Ott, Linda L [Livermore, CA; Slezak, Thomas R [Livermore, CA; Kuczmarski, Thomas A [Livermore, CA

    2009-02-24

    Nucleotide sequences specific to Brucella that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences, which are used in nucleotide detection methods to detect the presence of the bacterium, are disclosed.

  4. A 6-Nucleotide Regulatory Motif within the AbcR Small RNAs of Brucella abortus Mediates Host-Pathogen Interactions

    Science.gov (United States)

    Sheehan, Lauren M.

    2017-01-01

    In Brucella abortus, two small RNAs (sRNAs), AbcR1 and AbcR2, are responsible for regulating transcripts encoding ABC-type transport systems. AbcR1 and AbcR2 are required for Brucella virulence, as a double chromosomal deletion of both sRNAs results in attenuation in mice. Although these sRNAs are responsible for targeting transcripts for degradation, the mechanism utilized by the AbcR sRNAs to regulate mRNA in Brucella has not been described. Here, two motifs (M1 and M2) were identified in AbcR1 and AbcR2, and complementary motif sequences were defined in AbcR-regulated transcripts. Site-directed mutagenesis of M1 or M2 or of both M1 and M2 in the sRNAs revealed transcripts to be targeted by one or both motifs. Electrophoretic mobility shift assays revealed direct, concentration-dependent binding of both AbcR sRNAs to a target mRNA sequence. These experiments genetically and biochemically characterized two indispensable motifs within the AbcR sRNAs that bind to and regulate transcripts. Additionally, cellular and animal models of infection demonstrated that only M2 in the AbcR sRNAs is required for Brucella virulence. Furthermore, one of the M2-regulated targets, BAB2_0612, was found to be critical for the virulence of B. abortus in a mouse model of infection. Although these sRNAs are highly conserved among Alphaproteobacteria, the present report displays how gene regulation mediated by the AbcR sRNAs has diverged to meet the intricate regulatory requirements of each particular organism and its unique biological niche. PMID:28588127

  5. Functional Mutations Form at CTCF-Cohesin Binding Sites in Melanoma Due to Uneven Nucleotide Excision Repair across the Motif.

    Science.gov (United States)

    Poulos, Rebecca C; Thoms, Julie A I; Guan, Yi Fang; Unnikrishnan, Ashwin; Pimanda, John E; Wong, Jason W H

    2016-12-13

    CTCF binding sites are frequently mutated in cancer, but how these mutations accumulate and whether they broadly perturb CTCF binding are not well understood. Here, we report that skin cancers exhibit a highly specific asymmetric mutation pattern within CTCF motifs attributable to ultraviolet irradiation and differential nucleotide excision repair (NER). CTCF binding site mutations form independently of replication timing and are enriched at sites of CTCF/cohesin complex binding, suggesting a role for cohesin in stabilizing CTCF-DNA binding and impairing NER. Performing CTCF ChIP-seq in a melanoma cell line, we show CTCF binding site mutations to be functional by demonstrating allele-specific reduction of CTCF binding to mutant alleles. While topologically associating domains with mutated CTCF anchors in melanoma contain differentially expressed cancer-associated genes, CTCF motif mutations appear generally under neutral selection. However, the frequency and potential functional impact of such mutations in melanoma highlights the need to consider their impact on cellular phenotype in individual genomes. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  6. Novel Nucleotide Variations, Haplotypes Structure and Associations with Growth Related Traits of Goat AT Motif-Binding Factor (ATBF1) Gene

    Directory of Open Access Journals (Sweden)

    Xiaoyan Zhang

    2015-10-01

    Full Text Available The AT motif-binding factor (ATBF1) not only interacts with protein inhibitor of activated signal transducer and activator of transcription 3 (STAT3) (PIAS3) to suppress STAT3 signaling, which regulates early embryo development and cell differentiation, but is also required for early activation of the pituitary-specific transcription factor 1 (Pit1) gene (also known as POU1F1), which critically affects mammalian growth and development. The goal of this study was to detect novel nucleotide variations and the haplotype structure of the ATBF1 gene, and to test their associations with growth-related traits in goats. A total of seven novel single nucleotide polymorphisms (SNPs; SNP1-7) within this gene were found in two well-known Chinese native goat breeds. Haplotype structure analysis showed four haplotypes in Hainan black goats and seventeen in Xinong Saanen dairy goats, with only one haplotype (hap1) shared between the breeds. Association testing revealed that the SNP2, SNP5, SNP6, and SNP7 loci were each significantly associated with growth-related traits in goats. Moreover, one diplotype in Xinong Saanen dairy goats was significantly linked to growth-related traits. These preliminary findings extend the known spectrum of genetic variation in the goat ATBF1 gene and may contribute to marker-assisted selection in goat genetics and breeding.

  7. Deep sequencing of phage-displayed peptide libraries reveals sequence motif that detects norovirus

    Science.gov (United States)

    Hurwitz, Amy M.; Huang, Wanzhi; Estes, Mary K.; Atmar, Robert L.; Palzkill, Timothy

    2017-01-01

    Norovirus infections are the leading cause of non-bacterial gastroenteritis and result in about 21 million new cases and $2 billion in costs per year in the United States. Existing diagnostics have limited feasibility for point-of-care applications, so there is a clear need for more reliable, rapid, and simple-to-use diagnostic tools in order to contain outbreaks and prevent inappropriate treatments. In this study, a combination of phage display technology, deep sequencing and computational analysis was used to identify 12-mer peptides with specific binding to norovirus genotype GI.1 virus-like particles (VLPs). After biopanning, phage populations were sequenced and analyzed to identify a consensus peptide motif—YRSWXP. Two 12-mer peptides containing this sequence, NV-O-R5-3 and NV-O-R5-6, were further characterized to evaluate the motif's functional ability to detect VLPs and virus. Results indicated that these peptides effectively detect GI.1 VLPs in solid-phase peptide arrays, ELISAs and dot blots. Further, their specificity for the S-domain of the major capsid protein enables them to detect a wide range of GI and GII norovirus genotypes. Both peptides were able to detect virus in norovirus-positive clinical stool samples. Overall, the work reported here demonstrates the application of phage display coupled with next generation sequencing and computational analysis to uncover peptides with specific binding ability to a target protein for diagnostic applications. Further, the reagents characterized here can be integrated into existing diagnostic formats to detect clinically relevant genotypes of norovirus in stool. PMID:28035012

  8. Nucleotide sequence of the human N-myc gene

    International Nuclear Information System (INIS)

    Stanton, L.W.; Schwab, M.; Bishop, J.M.

    1986-01-01

    Human neuroblastomas frequently display amplification and augmented expression of a gene known as N-myc because of its similarity to the protooncogene c-myc. It has therefore been proposed that N-myc is itself a protooncogene, and subsequent tests have shown that N-myc and c-myc have similar biological activities in cell culture. The authors have now detailed the kinship between N-myc and c-myc by determining the nucleotide sequence of human N-myc and deducing the amino acid sequence of the protein encoded by the gene. The topography of N-myc is strikingly similar to that of c-myc: both genes contain three exons of similar lengths; the coding elements of both genes are located in the second and third exons; and both genes have unusually long 5' untranslated regions in their mRNAs, with features that raise the possibility that expression of the genes may be subject to similar controls of translation. The resemblance between the proteins encoded by N-myc and c-myc sustains previous suspicions that the genes encode related functions.

  9. ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data.

    Science.gov (United States)

    Heller, David; Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

    2017-11-02

    RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. PDL1 Signals through Conserved Sequence Motifs to Overcome Interferon-Mediated Cytotoxicity

    Directory of Open Access Journals (Sweden)

    Maria Gato-Cañas

    2017-08-01

    Full Text Available PDL1 blockade produces remarkable clinical responses, thought to occur by T cell reactivation through prevention of PDL1-PD1 T cell inhibitory interactions. Here, we find that PDL1 cell-intrinsic signaling protects cancer cells from interferon (IFN) cytotoxicity and accelerates tumor progression. PDL1 inhibited IFN signal transduction through a conserved class of sequence motifs that mediate crosstalk with IFN signaling. Abrogation of PDL1 expression or antibody-mediated PDL1 blockade strongly sensitized cancer cells to IFN cytotoxicity through a STAT3/caspase-7-dependent pathway. Moreover, somatic mutations found in human carcinomas within these PDL1 sequence motifs disrupted motif regulation, resulting in PDL1 molecules with enhanced protective activities from type I and type II IFN cytotoxicity. Overall, our results reveal a mode of action of PDL1 in cancer cells as a first line of defense against IFN cytotoxicity.

  11. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Full Text Available Background: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. To do this, all subsequences of the given DNA sequence are scored with a scoring function, and the prediction is made by selecting the best score. Most of the available tools for known binding site prediction are designed on the assumption that there is no dependency between binding site base positions. Recently, Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions that can be used in known binding site prediction under the assumption of dependency or independency between binding site base positions. Results: We propose a new scoring function based on the dependency between all positions in a binding site. This scoring function uses joint information content and mutual information as measures of dependency between positions in a transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the obtained results with two other well-known scoring functions. Conclusion: The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion for investigating the relationships between positions in the TFBS. Our scoring function is formulated by simple
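    Mutual information, which the abstract above uses as a measure of dependency between binding site positions, can be computed directly from a set of aligned sites. The sketch below shows only that one quantity for a pair of columns. It is not the scoring function proposed in the paper, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites."""
    n = len(sites)
    col_i = Counter(site[i] for site in sites)           # marginal counts, column i
    col_j = Counter(site[j] for site in sites)           # marginal counts, column j
    joint = Counter((site[i], site[j]) for site in sites)  # joint counts
    return sum(
        (count / n) * math.log2((count / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), count in joint.items()
    )


if __name__ == "__main__":
    dependent = ["AT", "AT", "CG", "CG"]    # column 1 is fully determined by column 0
    independent = ["AA", "AC", "CA", "CC"]  # columns vary independently
    print(mutual_information(dependent, 0, 1))    # 1.0
    print(mutual_information(independent, 0, 1))  # 0.0
```

    With the small numbers of known sites typical of binding site collections, raw estimates like this are biased upward, so a real scoring scheme would likely need pseudocounts or a significance test before treating two positions as dependent.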

  12. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Directory of Open Access Journals (Sweden)

    Jieming Shi

    Full Text Available Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.
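    The correspondence between nucleotides and dot-bracket data mentioned above rests on a simple rule: each "(" pairs with its matching ")" and each "." is unpaired. A minimal stack-based sketch of that mapping is given below. It is written in Python for illustration and is not JNSViewer's JavaScript code; the function name is hypothetical, and pseudoknot brackets such as "[" and "]" are deliberately not handled.

```python
def dot_bracket_pairs(structure):
    """Return sorted (i, j) base-pair indices (0-based) from a dot-bracket string."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)  # remember the opening partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    sequence = "GGGAAACCC"
    structure = "(((...)))"
    for i, j in dot_bracket_pairs(structure):
        print(i, sequence[i], j, sequence[j])
```

    The same pair list can drive both a structure graph (pairs become edges between nucleotides) and the lookup of which bases in a sub-sequence remain paired after extraction.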

  13. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Science.gov (United States)

    Shi, Jieming; Li, Xi; Dong, Min; Graham, Mitchell; Yadav, Nehul; Liang, Chun

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.

  14. JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures

    Science.gov (United States)

    Dong, Min; Graham, Mitchell; Yadav, Nehul

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416

  15. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  16. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for the 12 poly(A) motif variants. © The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  17. Complete nucleotide sequence of a monopartite Begomovirus and associated satellites infecting Carica papaya in Nepal.

    Science.gov (United States)

    Shahid, M S; Yoshida, S; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T

    2013-06-01

    Carica papaya (papaya) is a fruit crop that is cultivated mostly in kitchen gardens throughout Nepal. Leaf samples of C. papaya plants with leaf curling, vein darkening, vein thickening, and a reduction in leaf size were collected from a garden in Darai village, Rampur, Nepal in 2010. Full-length clones of a monopartite Begomovirus, a betasatellite and an alphasatellite were isolated. The complete nucleotide sequence of the Begomovirus showed the arrangement of genes typical of Old World begomoviruses with the highest nucleotide sequence identity (>99%) to an isolate of Ageratum yellow vein virus (AYVV), confirming it as an isolate of AYVV. The complete nucleotide sequence of the betasatellite showed greater than 89% nucleotide sequence identity to an isolate of Tomato leaf curl Java betasatellite originating from Indonesia. The sequence of the alphasatellite displayed 92% nucleotide sequence identity to Sida yellow vein China alphasatellite. This is the first identification of these components in Nepal and the first time they have been identified in papaya.

  18. The nucleotide sequence of 5S rRNA from a red alga, Porphyra yezoensis.

    OpenAIRE

    Takaiwa, F; Kusuda, M; Saga, N; Sugiura, M

    1982-01-01

    The nucleotide sequence of 5S rRNA from Porphyra yezoensis has been determined to be: pACGUACGGCCAUAUCCGAGACACGCGUACCGGAACCCAUUCCGAAUUCCGAAGUCAAGCGUCCGCGAGUUGGGUUAGU - AAUCUGGUGAAAGAUCACAGGCGAACCCCCAAUGCUGUACGUC. This 5S rRNA sequence is most similar to that of Euglena gracilis (63% homology).

  19. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
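    The size of the variant space described in this abstract can be illustrated with a short sketch. It enumerates the codons that differ from AUG by a single base and counts the flanking-sequence combinations. The assumption that the -4 to +4 window contains five flanking bases (four upstream, one downstream of the codon) is ours for illustration and is not stated in the abstract; the function names are hypothetical.

```python
from itertools import product

BASES = "ACGU"

def near_cognate_codons(start="AUG"):
    """Return all codons that differ from `start` at exactly one position."""
    codons = []
    for pos, original in enumerate(start):
        for base in BASES:
            if base != original:
                codons.append(start[:pos] + base + start[pos + 1:])
    return codons

def flanking_contexts(n_flank=5):
    """Yield every combination of `n_flank` flanking bases.

    n_flank=5 is an assumption: positions -4..-1 plus position +4.
    """
    for combo in product(BASES, repeat=n_flank):
        yield "".join(combo)

codons = near_cognate_codons()
n_contexts = sum(1 for _ in flanking_contexts())
print(len(codons))               # 9 single-base variants of AUG
print(len(codons) * n_contexts)  # 9 * 4**5 = 9216 codon/context pairs
```

    Under that assumption, each non-AUG start codon has 1,024 possible flanking contexts, which gives a sense of why a pooled, sequencing-based readout is needed to cover all of them.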

  20. Melon Transcriptome Characterization: Simple Sequence Repeats and Single Nucleotide Polymorphisms Discovery for High Throughput Genotyping across the Species

    Directory of Open Access Journals (Sweden)

    José Miguel Blanca

    2011-07-01

    Melon (Cucumis melo L.) ranks among the highest-valued fruit crops worldwide. Some genomic tools are available for this crop, including a Sanger transcriptome. We report the generation of 689,054 high-quality expressed sequence tags (ESTs) from two 454 sequencing runs, using normalized and nonnormalized complementary DNA (cDNA) libraries prepared from four genotypes belonging to the two subspecies and the main commercial types. The 454 ESTs were combined with the available Sanger ESTs and de novo assembled into 53,252 unigenes. Over 63% of the unigenes were functionally annotated with Gene Ontology (GO) terms and 21% had known orthologs of Arabidopsis thaliana (L.) Heynh. Annotation distribution followed tendencies similar to those reported for Arabidopsis, suggesting that the dataset represents a fairly complete melon transcriptome. Furthermore, we identified a set of 3298 unigenes with microsatellite motifs and 14,417 sequences with single nucleotide variants, of which 11,655 single nucleotide polymorphisms met criteria for use with high-throughput genotyping platforms, and 453 could be detected as cleaved amplified polymorphic sequences (CAPS). A set of markers was validated, 90% of them being polymorphic in a number of variable accessions. This transcriptome provides an invaluable new tool for biological research, more so when it includes transcripts not described previously. It is being used for genome annotation and has provided a large collection of markers that will allow speeding up the process of breeding new melon varieties.

  1. Complete nucleotide sequence of Hibiscus infecting Cilevirus Florida isolate and its relationship with closely associated Cileviruses

    Science.gov (United States)

    The complete nucleotide sequence of a recently discovered Florida (FL) isolate of Hibiscus infecting Cilevirus (HiCV) was determined by Sanger sequencing. The movement- and coat- protein gene sequences of the HiCV-FL isolate are more divergent than other genes of the previously sequenced HiCV-HA (Ha...

  2. Rasp21 sequences opposite the nucleotide binding pocket are required for GRF-mediated nucleotide release

    DEFF Research Database (Denmark)

    Leonardsen, L; DeClue, J E; Lybaek, H

    1996-01-01

    The substrate requirements for the catalytic activity of the mouse Cdc25 homolog Guanine nucleotide Release Factor, GRF, were determined using the catalytic domain of GRF expressed in insect cells and E. coli expressed H-Ras mutants. We found a requirement for the loop 7 residues in Ras (amino ac...... and the human Ras like proteins RhoA, Rap1A, Rac1 and G25K revealed a strict Ras specificity; of these only S. pombe Ras was GRF sensitive....

  3. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction.

    Directory of Open Access Journals (Sweden)

    Aalt D J van Dijk

    Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and

  4. Complete nucleotide sequence and genome organization of a novel allexivirus from alfalfa (Medicago sativa)

    Science.gov (United States)

    A new species of the family Alphaflexiviridae provisionally named Alfalfa virus S (AVS) was diagnosed in alfalfa samples originating from Sudan. A complete nucleotide sequence of the viral genome consisting of 8,349 nucleotides excluding the 3’ poly(A) tail was determined by Illumina NGS technology ...

  5. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    Science.gov (United States)

    Frenkel, F. E.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the position weight matrices optimization, as well as on implementation of the two-dimensional dynamic programming. This approach allows the construction of multiple alignments of the indistinctly similar amino acid and nucleotide sequences that accumulated more than 1.5 substitutions per a single amino acid or a nucleotide without performing the sequences paired comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.

  6. The nucleotide sequence of threonine transfer RNA coded by bacteriophage T4

    International Nuclear Information System (INIS)

    Guthrie, C.; Scholla, C.A.; Yesian, H.; Abelson, J.

    1978-01-01

    The nucleotide sequence of a low molecular weight RNA coded by bacteriophage T4 (and previously identified as species α) has been determined. The molecule is of particular biological interest for its associated biosynthetic properties. This RNA is 76 nucleotides in length, contains eight modified bases, and can be arranged in a cloverleaf configuration common to tRNAs. The anticodon sequence is UGU, which corresponds to the threonine-specific codons ACA and ACG. The nucleotide sequence was determined primarily by nearest-neighbour analysis of RNA synthesized in vitro using [α-32P] nucleoside triphosphates. Using the single-strand specific nuclease S1, two in vivo labelled half-molecules were generated and analysed. This information, together with restrictions imposed by nearest-neighbour data, provided a unique linear sequence of nucleotides with the features of secondary structure common to tRNA molecules. (author)

  7. Globalizing Genomics: The Origins of the International Nucleotide Sequence Database Collaboration.

    Science.gov (United States)

    Stevens, Hallam

    2017-10-06

    Genomics is increasingly considered a global enterprise - the fact that biological information can flow rapidly around the planet is taken to be important to what genomics is and what it can achieve. However, the large-scale international circulation of nucleotide sequence information did not begin with the Human Genome Project. Efforts to formalize and institutionalize the circulation of sequence information emerged concurrently with the development of centralized facilities for collecting that information. That is, the very first databases built for collecting and sharing DNA sequence information were, from their outset, international collaborative enterprises. This paper describes the origins of the International Nucleotide Sequence Database Collaboration between GenBank in the United States, the European Molecular Biology Laboratory Databank, and the DNA Database of Japan. The technical and social groundwork for the international exchange of nucleotide sequences created the conditions of possibility for imagining nucleotide sequences (and subsequently genomes) as "global" objects. The "transnationalism" of nucleotide sequences was critical to their ontology - what DNA sequences came to be during the Human Genome Project was deeply influenced by international exchange.

  8. Nucleotide sequence of Zygosaccharomyces bailii virus Z: Evidence for +1 programmed ribosomal frameshifting and for assignment to family Amalgaviridae.

    Science.gov (United States)

    Depierreux, Delphine; Vong, Minh; Nibert, Max L

    2016-06-02

    Zygosaccharomyces bailii virus Z (ZbV-Z) is a monosegmented dsRNA virus that infects the yeast Zygosaccharomyces bailii and remains unclassified to date despite its discovery >20 years ago. The previously reported nucleotide sequence of ZbV-Z (GenBank AF224490) encompasses two nonoverlapping long ORFs: upstream ORF1 encoding the putative coat protein and downstream ORF2 encoding the RNA-dependent RNA polymerase (RdRp). The lack of overlap between these ORFs raises the question of how the downstream ORF is translated. After examining the previous sequence of ZbV-Z, we predicted that it contains at least one sequencing error to explain the nonoverlapping ORFs, and hence we redetermined the nucleotide sequence of ZbV-Z, derived from the same isolate of Z. bailii as previously studied, to address this prediction. The key finding from our new sequence, which includes several insertions, deletions, and substitutions relative to the previous one, is that ORF2 in fact overlaps ORF1 in the +1 frame. Moreover, a proposed sequence motif for +1 programmed ribosomal frameshifting, previously noted in influenza A viruses, plant amalgaviruses, and others, is also present in the newly identified ORF1-ORF2 overlap region of ZbV-Z. Phylogenetic analyses provided evidence that ZbV-Z represents a distinct taxon most closely related to plant amalgaviruses (genus Amalgavirus, family Amalgaviridae). We conclude that ZbV-Z is the prototype of a new species, which we propose to assign as type species of a new genus of monosegmented dsRNA mycoviruses in family Amalgaviridae. Comparisons involving other unclassified mycoviruses with RdRps apparently related to those of plant amalgaviruses, and having either mono- or bisegmented dsRNA genomes, are also discussed. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Peptomics, identification of novel cationic Arabidopsis peptides with conserved sequence motifs

    DEFF Research Database (Denmark)

    Olsen, Addie Nina; Mundy, John; Skriver, Karen

    2002-01-01

    function. Annotation of the Arabidopsis genome sequence has made it possible to identify peptide-encoding genes. However, such annotational identification is impeded because small genes are poorly predicted by gene-prediction algorithms, thus prompting the alternative approaches described here. We...... initially performed a systematic analysis of short polypeptides encoded by annotated genes on two Arabidopsis chromosomes using SignalP to identify potentially secreted peptides. Subsequent homology searches with selected, putatively secreted peptides, led to the identification of a potential, large...... Arabidopsis family of 34 genes. The predicted peptides are characterized by a conserved C-terminal sequence motif and additional primary structure conservation in a core region. The majority of these genes had not previously been annotated. A subset of the predicted peptides show high overall sequence...

  10. Identification of sequence motifs involved in Dengue virus-host interactions.

    Science.gov (United States)

    Asnet Mary, J; Paramasivan, R; Shenbagarathai, R

    2016-01-01

    Dengue fever is a rapidly spreading mosquito-borne virus infection, which remains a serious global public health problem. As there is no specific treatment or commercial vaccine available for effective control of the disease, attempts to develop novel control strategies are underway. Viruses utilize the surface receptor proteins of the host to enter cells. Though various proteins have been reported as receptors of Dengue virus (DENV) using the Virus Overlay Protein Binding Assay, the precise interaction between DENV and host has not been explored. Understanding the structural features of the domain III envelope glycoprotein would help in developing efficient antiviral inhibitors. Therefore, an attempt was made to identify the sequence motifs present in the domain III envelope glycoprotein of Dengue virus. Computational analysis revealed that the NGR motif is present in the domain III envelope glycoprotein of DENV-1 and DENV-3. Similarly, DENV-1, DENV-2 and DENV-4 were found to contain the YxxΦ motif, which is a tyrosine-based sorting signal responsible for the interaction with a mu subunit of the adaptor protein complex. High-throughput virtual screening resulted in five compounds as lead molecules based on glide score, which ranges from -4.664 to -6.52 kcal/mol. This computational prediction provides an additional tool for understanding the virus-host interactions and helps to identify potential targets in the host. Further experimental evidence is warranted to confirm the virus-host interactions and also the inhibitory activity of the reported lead compounds.
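    A motif search of the kind described in this abstract can be sketched with regular expressions. The abstract does not say which tool the authors used, so this is only an illustration. In the YxxΦ pattern, x is any residue and Φ is taken here to be a bulky hydrophobic residue (L, I, M, F or V), which is the usual reading of the notation but is our assumption. The example sequence is a made-up toy string, not a DENV sequence.

```python
import re

# Motif patterns over one-letter amino acid codes.
# The hydrophobic set for the last YxxPhi position is an assumption.
MOTIF_PATTERNS = {
    "NGR": "NGR",
    "YxxPhi": "Y..[LIMFV]",
}

def find_motifs(protein):
    """Return {motif name: [0-based start positions]} for each motif.

    A zero-width lookahead is used so overlapping matches are reported.
    """
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        lookahead = re.compile("(?=" + pattern + ")")
        hits[name] = [m.start() for m in lookahead.finditer(protein)]
    return hits

toy_sequence = "MKNGRAYTELGG"  # hypothetical sequence for illustration only
print(find_motifs(toy_sequence))  # {'NGR': [2], 'YxxPhi': [6]}
```

    A hit found this way only shows that the pattern occurs in the sequence. Whether it is functional still needs the structural and experimental follow-up the abstract calls for.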

  11. Complete nucleotide sequence and gene rearrangement of the ...

    Indian Academy of Sciences (India)

    Species classification and sequence accession numbers of amphibian mt genomes used in phylogenetic analyses. Taxon. Species. Family. GenBank no. Archaeobatrachia. Bombina fortinuptialis. Bombinatoridae. AY458591. B. orientalis. Bombinatoridae. AY585338. B. variegata. Bombinatoridae. AY971143. B. maxima.

  12. Applications of High-Throughput Nucleotide Sequencing (PhD)

    DEFF Research Database (Denmark)

    Waage, Johannes

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......, focusing on oft encountered problems in data processing, such as quality assurance, mapping, normalization, visualization, and interpretation. Presented in the second part are scientific endeavors representing solutions to problems of two sub-genres of next generation sequencing. For the first flavor, RNA-sequencing...

  13. Evaluation of sequence motifs found in scaffold/matrix-attached regions (S/MARs)

    OpenAIRE

    Liebich, I.; Bode, J.; Reuter, I.; Wingender, E.

    2002-01-01

    Based on the contents of the database S/MARt DB, the most comprehensive data collection of scaffold/matrix-attached regions (S/MARs) publicly available thus far, we initiated a systematic evaluation of the stored data. By analyzing the 245 S/MAR sequences presently described in this database, we found that the S/MARs contained in this collection are generally AT-rich, with certain significant exceptions. Comparative analyses showed that most of the AT-rich motifs which were found to be enrich...

  14. Role of two sequence motifs of mesencephalic astrocyte-derived neurotrophic factor in its survival-promoting activity.

    Science.gov (United States)

    Mätlik, K; Yu, Li-ying; Eesmaa, A; Hellman, M; Lindholm, P; Peränen, J; Galli, E; Anttila, J; Saarma, M; Permi, P; Airavaara, M; Arumäe, U

    2015-12-31

    Mesencephalic astrocyte-derived neurotrophic factor (MANF) is a prosurvival protein that protects the cells when applied intracellularly in vitro or extracellularly in vivo. Its protective mechanisms are poorly known. Here we studied the role of two short sequence motifs within the carboxy-(C) terminal domain of MANF in its neuroprotective activity: the CKGC sequence (a CXXC motif) that could be involved in redox reactions, and the C-terminal RTDL sequence, an endoplasmic reticulum (ER) retention signal. We mutated these motifs and analyzed the antiapoptotic effect and intracellular localization of these mutants of MANF when overexpressed in cultured sympathetic or sensory neurons. As an in vivo model for studying the effect of these mutants after their extracellular application, we used the rat model of cerebral ischemia. Even though we found no evidence for oxidoreductase activity of MANF, the mutation of CXXC motif completely abolished its protective effect, showing that this motif is crucial for both MANF's intracellular and extracellular activity. The RTDL motif was not needed for the neuroprotective activity of MANF after its extracellular application in the stroke model in vivo. However, in vitro the deletion of RTDL motif inactivated MANF in the sympathetic neurons where the mutant protein localized to Golgi, but not in the sensory neurons where the mutant localized to the ER, showing that intracellular MANF protects these peripheral neurons in vitro only when localized to the ER.
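    The two motifs studied here differ in how they are located: the CXXC motif can occur anywhere in the domain, while the RTDL retention signal only counts at the very C-terminus. The sketch below makes that distinction explicit and builds two in-silico variants. The toy sequence and the cysteine-to-serine substitution are hypothetical choices for illustration; the abstract does not state which substitutions were made.

```python
import re

def find_cxxc(protein):
    """Return the 0-based start of the first C-x-x-C motif, or -1 if absent."""
    match = re.search(r"C..C", protein)
    return match.start() if match else -1

def has_c_terminal_signal(protein, signal="RTDL"):
    """True only if the protein ends with the retention signal."""
    return protein.endswith(signal)

def delete_c_terminal_signal(protein, signal="RTDL"):
    """Return the protein with the C-terminal signal removed, if present."""
    if has_c_terminal_signal(protein, signal):
        return protein[: -len(signal)]
    return protein

def disrupt_cxxc(protein):
    """Replace the first cysteine of the first CXXC motif with serine.

    The C-to-S choice is a hypothetical example of a disrupting mutation.
    """
    start = find_cxxc(protein)
    if start < 0:
        return protein
    return protein[:start] + "S" + protein[start + 1:]

toy = "AAACKGCAAARTDL"  # made-up sequence containing CKGC and a terminal RTDL
print(find_cxxc(toy))                  # 3
print(delete_c_terminal_signal(toy))   # AAACKGCAAA
print(disrupt_cxxc(toy))               # AAASKGCAAARTDL
```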

  15. DNA sequence analysis of cagA 3' motifs of Helicobacter pylori strains from patients with peptic ulcer diseases.

    Science.gov (United States)

    Salih, Barik A; Bolek, Bora Kazim; Arikan, Soykan

    2010-02-01

    The Helicobacter pylori cagA gene is a major virulence factor that plays an important role in gastric pathologies. DNA sequence data for the cagA 3' region of Western isolates differ markedly in their EPIYA motifs from those of East Asian isolates. An increase in the number of these motifs is known to be associated with gastric cancer. Whether such an association is also the case for peptic ulceration was investigated in this study. Gastric biopsies were collected from 96 patients with duodenal ulcer (DU), gastric ulcer (GU) and gastritis. The types of EPIYA motif detected by PCR among 28 DU strains were 13 ABC, eight ABCC, six ABCCC, and in one patient both ABC and ABCCCCC; among nine GU strains were two ABC, five ABCC and two ABCCC; and among 40 gastritis strains were 35 ABC and five ABCC. DNA sequencing was carried out to confirm the detection of the EPIYA motif types and to analyse their peptide sequences. A significant association was found between the number of the EPIYA-C motifs (≥2) and peptic ulceration (P=0.00001) compared with gastritis. In conclusion, this study shows that our patients harboured cagA-positive H. pylori strains with EPIYA motifs of the Western type and that the increase in the number of EPIYA-C motifs was significantly associated with DU and GU but not with gastritis, indicating predictive association with the severity of the disease.
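    The strain counts in this abstract can be arranged as a 2x2 table (two or more EPIYA-C motifs versus one; peptic ulcer versus gastritis) and tested for association. The sketch below uses a two-sided Fisher's exact test written with the standard library. The abstract does not name the test behind its reported P value, so the choice of test is our assumption, as is counting the one mixed ABC/ABCCCCC patient in the "two or more" group. The result need not match the published figure exactly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )

# Counts taken from the abstract (mixed-type patient counted as >=2, an assumption).
ulcer_multi_c = 8 + 6 + 1 + 5 + 2   # DU: ABCC, ABCCC, mixed; GU: ABCC, ABCCC
ulcer_single_c = 13 + 2             # DU and GU strains typed ABC
gastritis_multi_c = 5               # ABCC
gastritis_single_c = 35             # ABC

p_value = fisher_exact_two_sided(
    ulcer_multi_c, ulcer_single_c, gastritis_multi_c, gastritis_single_c
)
print(f"p = {p_value:.2e}")
```

    With these counts, 22 of 37 ulcer strains versus 5 of 40 gastritis strains carry two or more EPIYA-C motifs, which is the contrast the abstract reports as significant.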

  16. Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview.

    Science.gov (United States)

    Karvelis, Tautvydas; Gasiunas, Giedrius; Siksnys, Virginijus

    2017-05-15

    Recently, Cas9, an RNA-guided DNA endonuclease, emerged as a powerful tool for targeted genome manipulations. Cas9 protein can be reprogrammed to cleave, bind or nick any DNA target by simply changing the crRNA sequence; however, a short nucleotide sequence, termed PAM, is required to initiate crRNA hybridization to the DNA target. The PAM sequence is recognized by the Cas9 protein and must be determined experimentally for each Cas9 variant. Exploration of Cas9 orthologs could offer a diversity of PAM sequences and novel biochemical properties that may be beneficial for genome editing applications. Here we briefly review and compare Cas9 PAM identification assays that can be adopted for other PAM-dependent CRISPR-Cas systems. Copyright © 2017 Elsevier Inc. All rights reserved.
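    Once a PAM has been determined, the PAM requirement described above translates into a simple constraint on candidate target sites. The sketch below scans the forward strand of a DNA string for protospacers immediately followed by a PAM. It is not one of the identification assays the review covers. The defaults (the NGG PAM of Streptococcus pyogenes Cas9 and a 20-nt protospacer) are common values used here as assumptions, the IUPAC table is deliberately partial, and the function name is hypothetical.

```python
# Partial IUPAC nucleotide code table; extend as needed for other PAMs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}

def find_pam_sites(dna, pam="NGG", protospacer_len=20):
    """Return (protospacer_start, protospacer, pam_sequence) tuples.

    Only the given strand is scanned; a full search would also scan the
    reverse complement. The PAM is expected directly 3' of the protospacer.
    """
    dna = dna.upper()
    sites = []
    for pam_start in range(protospacer_len, len(dna) - len(pam) + 1):
        candidate = dna[pam_start:pam_start + len(pam)]
        if all(base in IUPAC[code] for base, code in zip(candidate, pam)):
            start = pam_start - protospacer_len
            sites.append((start, dna[start:pam_start], candidate))
    return sites

toy_target = "A" * 20 + "TGG" + "CC"  # made-up sequence with a single NGG site
print(find_pam_sites(toy_target))
# [(0, 'AAAAAAAAAAAAAAAAAAAA', 'TGG')]
```

    For a Cas9 ortholog with a different PAM, only the `pam` argument changes, which is why the experimentally determined PAM is the key input.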

  18. Nucleotide sequence of a human tRNA gene heterocluster

    International Nuclear Information System (INIS)

    Chang, Y.N.; Pirtle, I.L.; Pirtle, R.M.

    1986-01-01

    Leucine tRNA from bovine liver was used as a hybridization probe to screen a human gene library harbored in Charon-4A of bacteriophage lambda. The human DNA inserts from plaque-pure clones were characterized by restriction endonuclease mapping and Southern hybridization techniques, using both [3'-32P]-labeled bovine liver leucine tRNA and total tRNA as hybridization probes. An 8-kb Hind III fragment of one of these λ-clones was subcloned into the Hind III site of pBR322. Subsequent fine restriction mapping and DNA sequence analysis of this plasmid DNA indicated the presence of four tRNA genes within the 8-kb DNA fragment. A leucine tRNA gene with an anticodon of AAG and a proline tRNA gene with an anticodon of AGG are in a 1.6-kb subfragment. A threonine tRNA gene with an anticodon of UGU and an as yet unidentified tRNA gene are located in a 1.1-kb subfragment. These two different subfragments are separated by 2.8 kb. The coding regions of the three sequenced genes contain characteristic internal split promoter sequences and do not have intervening sequences. The 3'-flanking regions of these three genes have typical RNA polymerase III termination sites of at least four consecutive T residues.

  19. Organization and Nucleotide Sequences of Two Lactococcal Bacteriocin Operons

    NARCIS (Netherlands)

    Belkum, Marco J. van; Hayema, Bert Jan; Jeeninga, Rienk E.; Kok, Jan; Venema, Gerard

    Two distinct regions of the Lactococcus lactis subsp. cremoris 9B4 plasmid p9B4-6, each of which specified bacteriocin production as well as immunity, have been sequenced and analyzed by deletion and frameshift mutation analyses. On a 1.8-kb ScaI-ClaI fragment specifying low antagonistic activity,

  20. Unique structural features and sequence motifs of proline utilization A (PutA).

    Science.gov (United States)

    Singh, Ranjan K; Tanner, John J

    2012-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20-30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100-200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA.

  1. Resampling nucleotide sequences with closest-neighbor trimming and its comparison to other methods.

    Directory of Open Access Journals (Sweden)

    Kouki Yonezawa

    A large number of nucleotide sequences of various pathogens are available in public databases. The growth of the datasets has resulted in an enormous increase in computational costs. Moreover, due to differences in surveillance activities, the number of sequences found in databases varies from one country to another and from year to year. Therefore, it is important to study resampling methods to reduce the sampling bias. A novel algorithm, called the closest-neighbor trimming method, that resamples a given number of sequences from a large nucleotide sequence dataset was proposed. The performance of the proposed algorithm was compared with other algorithms by using the nucleotide sequences of human H3N2 influenza viruses. We compared the closest-neighbor trimming method with the naive hierarchical clustering algorithm and the k-medoids clustering algorithm. Genetic information accumulated in public databases contains sampling bias. The closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials for life sciences, we anticipate that applying our algorithm to various datasets will result in reducing sampling bias.
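    The idea of thinning out densely sampled sequences can be illustrated with a small sketch. The abstract does not spell out the procedure, so everything below is an assumption made for illustration: sequences are taken to be pre-aligned, distance is plain Hamming distance, and at each step the later member of the closest remaining pair is dropped. The names `hamming` and `trim_closest_neighbors` are hypothetical and are not the published implementation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def trim_closest_neighbors(seqs, keep):
    """Greedy thinning (illustrative assumption, not the published method).

    While more than `keep` sequences remain, find the closest pair and
    drop its later member, so dense clusters are thinned first.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    kept = list(seqs)
    while len(kept) > keep:
        # Closest pair among the remaining sequences (first one wins ties).
        _, j = min(
            combinations(range(len(kept)), 2),
            key=lambda p: hamming(kept[p[0]], kept[p[1]]),
        )
        del kept[j]  # arbitrary tie-break: remove the later member
    return kept


if __name__ == "__main__":
    sample = ["AAAAAA", "AAAAAT", "TTTTTT", "TTTTTA", "GGGGGG"]
    print(trim_closest_neighbors(sample, 3))
```

    This brute-force version recomputes all pairwise distances on every pass, which is only practical for small inputs; a real tool would cache the distance matrix.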

  2. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences.

    Directory of Open Access Journals (Sweden)

    Michael J McDonald

    2011-06-01

    The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.

  3. Nucleotide Sequence of the Protective Antigen Gene of Bacillus Anthracis

    Science.gov (United States)

    1988-02-02

    the bands excised, and the DNA extracted with phenol for cloning in M13. Nucleotide sequence analysis. The two fragments were each cloned into phages ... E.c. ToxA = Escherichia coli heat-labile enterotoxin A gene (45); V.c. CTxA = Vibrio cholerae cholera toxin alpha subunit gene (28). (a) Total number of specific amino acid residues deduced from protective antigen gene.

  4. Complete nucleotide sequence of Alfalfa mosaic virus isolated from alfalfa (Medicago sativa L.) in Argentina.

    Science.gov (United States)

    Trucco, Verónica; de Breuil, Soledad; Bejerman, Nicolás; Lenardon, Sergio; Giolitti, Fabián

    2014-06-01

    The complete nucleotide sequence of an Alfalfa mosaic virus (AMV) isolate infecting alfalfa (Medicago sativa L.) in Argentina, AMV-Arg, was determined. The virus genome has the typical organization described for AMV, and comprises 3,643, 2,593, and 2,038 nucleotides for RNA1, 2 and 3, respectively. The whole genome sequence and each encoding region were compared with those of four other isolates that have been completely sequenced, from China, Italy, Spain and USA. The nucleotide identity percentages ranged from 95.9 to 99.1 % for the three RNAs and from 93.7 to 99 % for the protein 1 (P1), protein 2 (P2), movement protein and coat protein (CP) encoding regions, whereas the amino acid identity percentages of these proteins ranged from 93.4 to 99.5 %, the lowest value corresponding to P2. CP sequences of AMV-Arg were compared with those of 25 other available isolates, and a phylogenetic analysis based on the CP gene was carried out. The highest percentage of nucleotide sequence identity of the CP gene was 98.3 % with a Chinese isolate and 98.6 % at the amino acid level with four isolates, two from Italy, one from Brazil and the remaining one from China. The phylogenetic analysis showed that AMV-Arg is closely related to subgroup I of AMV isolates. To our knowledge, this is the first report of a complete nucleotide sequence of AMV from South America and the first worldwide report of a complete nucleotide sequence of AMV isolated from alfalfa as natural host.

  5. Flow cytometry-assisted cloning of specific sequence motifs from complex 16S rRNA gene libraries

    DEFF Research Database (Denmark)

    Nielsen, J. L.; Schramm, A.; Engh, G. van den

    2004-01-01

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant i...... in a clone library of environmental 16S rRNA genes....

  6. Long-range macromolecule interaction and “speed reading” long nucleotide sequences in DNA

    International Nuclear Information System (INIS)

    Namiot, V.A.; Anashkina, A.A.; Filatov, I.V.; Tumanyan, V.G.; Esipova, N.G.

    2013-01-01

    Methods based on the phenomenon of specific long-range interaction between long macromolecules are proposed for “speed reading” nucleotide sequences in single DNA molecules. One way is to measure the electric field potential along a previously stretched double DNA strand. Another way of “reading” the information is to measure the deformation of strand elements caused by an electric field generated by the “straightening” electrode when an alternating voltage is applied to it. On the basis of the information obtained, the sequence of nucleotides in the strand could, in principle, be determined.

  7. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    Science.gov (United States)

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  8. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Science.gov (United States)

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Symbols and format to be used... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  9. Moloney murine sarcoma virus MuSVts110 DNA: cloning, nucleotide sequence, and gene expression.

    Science.gov (United States)

    Huai, L; Chiocca, S M; Gilbreth, M A; Ainsworth, J R; Bishop, L A; Murphy, E C

    1992-09-01

    We have cloned Moloney murine sarcoma virus (MuSV) MuSVts110 DNA by assembly of polymerase chain reaction (PCR)-amplified segments of integrated viral DNA from infected NRK cells (6m2 cells) and determined its complete sequence. Previously, by direct sequencing of MuSVts110 RNA transcribed in 6m2 cells, we established that the thermosensitive RNA splicing phenotype uniquely characteristic of MuSVts110 results from a deletion of 1,487 nucleotides of progenitor MuSV-124 sequences. As anticipated, the sequence obtained in this study contained precisely this same deletion. In addition, several other unexpected sequence differences were found between MuSVts110 and MuSV-124. For example, in the noncoding region upstream of the gag gene, MuSVts110 DNA contained a 52-nucleotide tract typical of murine leukemia virus rather than MuSV-124, suggesting that MuSVts110 originated as a MuSV-helper murine leukemia virus recombinant during reverse transcription rather than from a straightforward deletion within MuSV-124. In addition, both MuSVts110 long terminal repeats contained head-to-tail duplications of eight nucleotides in the U3 region. Finally, seven single-nucleotide substitutions were found scattered throughout MuSVts110 DNA. Three of the nucleotide substitutions were in the gag gene, resulting in one coding change in p15 and one in p30. All of the remaining nucleotide changes were found in the noncoding region between the 5' long terminal repeat and the gag gene. In NIH 3T3 cells transfected with the cloned MuSVts110 DNA, the pattern of viral RNA expression conformed with that observed in cells infected with authentic MuSVts110 virus in that viral RNA splicing was 30 to 40% efficient at growth temperatures between 28 and 33 degrees C but reduced to trace levels above 37 degrees C.

  10. Nucleotide composition of CO1 sequences in Chelicerata (Arthropoda): detecting new mitogenomic rearrangements.

    Science.gov (United States)

    Arabi, Juliette; Judson, Mark L I; Deharveng, Louis; Lourenço, Wilson R; Cruaud, Corinne; Hassanin, Alexandre

    2012-02-01

    Here we study the evolution of nucleotide composition in third codon-positions of CO1 sequences of Chelicerata, using a phylogenetic framework, based on 180 taxa and three markers (CO1, 18S, and 28S rRNA; 5,218 nt). The analyses of nucleotide composition were also extended to all CO1 sequences of Chelicerata found in GenBank (1,701 taxa). The results show that most species of Chelicerata have a positive strand bias in CO1, i.e., in favor of C nucleotides, including all Amblypygi, Palpigradi, Ricinulei, Solifugae, Uropygi, and Xiphosura. However, several taxa show a negative strand bias, i.e., in favor of G nucleotides: all Scorpiones, Opisthothelae spiders and several taxa within Acari, Opiliones, Pseudoscorpiones, and Pycnogonida. Several reversals of strand-specific bias can be attributed to either a rearrangement of the control region or an inversion of a fragment containing the CO1 gene. Key taxa for which sequencing of complete mitochondrial genomes will be necessary to determine the origin and nature of mtDNA rearrangements involved in the reversals are identified. Acari, Opiliones, Pseudoscorpiones, and Pycnogonida were found to show a strong variability in nucleotide composition. In addition, both mitochondrial and nuclear genomes have been affected by higher substitution rates in Acari and Pseudoscorpiones. The results therefore indicate that these two orders are more liable to fix mutations of all types, including base substitutions, indels, and genomic rearrangements.
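    A strand bias of this kind can be quantified with a simple skew statistic. The sketch below is an assumption made for illustration rather than the authors' measure: it takes an in-frame coding sequence, looks only at third codon positions, and returns (C - G) / (C + G), so that positive values indicate a bias toward C and negative values a bias toward G. The function name `third_position_skew` is hypothetical.

```python
def third_position_skew(cds):
    """Return (C - G) / (C + G) over third codon positions.

    Assumes `cds` is an in-frame coding sequence starting at codon
    position 1. Returns None when no C or G occurs at third positions.
    """
    third = cds.upper()[2::3]  # every third base, starting at index 2
    c, g = third.count("C"), third.count("G")
    if c + g == 0:
        return None
    return (c - g) / (c + g)


if __name__ == "__main__":
    print(third_position_skew("AACAACAAC"))  # only C at third positions
    print(third_position_skew("AAGAAGAAC"))  # mostly G at third positions
```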

  11. Single nucleotide polymorphism mining and nucleotide sequence analysis of Mx1 gene in exonic regions of Japanese quail.

    Science.gov (United States)

    Niraj, Diwesh Kumar; Kumar, Pushpendra; Mishra, Chinmoy; Narayan, Raj; Bhattacharya, Tarun Kumar; Shrivastava, Kush; Bhushan, Bharat; Tiwari, Ashok Kumar; Saxena, Vishesh; Sahoo, Nihar Ranjan; Sharma, Deepak

    2015-12-01

    An attempt has been made to study the Myxovirus resistant (Mx1) gene polymorphism in Japanese quail. In the present investigation, four fragments viz. Fragment I of 185 bp (Exon 3 region), Fragment II of 148 bp (Exon 5 region), Fragment III of 161 bp (Exon 7 region), and Fragment IV of 176 bp (Exon 13 region) of Mx1 gene were amplified and screened for polymorphism by polymerase chain reaction-single-strand conformation polymorphism technique in 170 Japanese quail birds. Out of the four fragments, one fragment (Fragment II) was found to be polymorphic. The remaining three fragments (Fragment I, III, and IV) were found to be monomorphic, which was confirmed by custom sequencing. Overall nucleotide sequence analysis of Mx1 gene of Japanese quail showed 100% homology with common quail and more than 80% homology with reported sequence of chicken breeds. The Mx1 gene is mostly conserved in Japanese quail. There is an urgent need of comprehensive analysis of other regions of Mx1 gene along with its possible association with the traits of economic importance in Japanese quail.

  12. Single nucleotide polymorphism mining and nucleotide sequence analysis of Mx1 gene in exonic regions of Japanese quail

    Directory of Open Access Journals (Sweden)

    Diwesh Kumar Niraj

    2015-12-01

    Aim: An attempt has been made to study the Myxovirus resistant (Mx1) gene polymorphism in Japanese quail. Materials and Methods: In the present investigation, four fragments viz. Fragment I of 185 bp (Exon 3 region), Fragment II of 148 bp (Exon 5 region), Fragment III of 161 bp (Exon 7 region), and Fragment IV of 176 bp (Exon 13 region) of Mx1 gene were amplified and screened for polymorphism by polymerase chain reaction-single-strand conformation polymorphism technique in 170 Japanese quail birds. Results: Out of the four fragments, one fragment (Fragment II) was found to be polymorphic. The remaining three fragments (Fragment I, III, and IV) were found to be monomorphic, which was confirmed by custom sequencing. Overall nucleotide sequence analysis of Mx1 gene of Japanese quail showed 100% homology with common quail and more than 80% homology with reported sequence of chicken breeds. Conclusion: The Mx1 gene is mostly conserved in Japanese quail. There is an urgent need of comprehensive analysis of other regions of Mx1 gene along with its possible association with the traits of economic importance in Japanese quail.

  13. Nucleotide sequence of the Agrobacterium tumefaciens octopine Ti plasmid-encoded tmr gene

    NARCIS (Netherlands)

    Heidekamp, F.; Dirkse, W.G.; Hille, J.; Ormondt, H. van

    1983-01-01

    The nucleotide sequence of the tmr gene, encoded by the octopine Ti plasmid from Agrobacterium tumefaciens (pTiAch5), was determined. The T-DNA, which encompasses this gene, is involved in tumor formation and maintenance, and probably mediates the cytokinin-independent growth of transformed plant

  14. Inferring epidemiological dynamics of infectious diseases using Tajima's D statistic on nucleotide sequences of pathogens

    Directory of Open Access Journals (Sweden)

    Kiyeon Kim

    2017-12-01

    The estimation of the basic reproduction number is essential to understand epidemic dynamics, and time series data of infected individuals are usually used for the estimation. However, such data are not always available. Methods to estimate the basic reproduction number using genealogy constructed from nucleotide sequences of pathogens have been proposed so far. Here, we propose a new method to estimate epidemiological parameters of outbreaks using the time series change of Tajima's D statistic on the nucleotide sequences of pathogens. To relate the time evolution of Tajima's D to the number of infected individuals, we constructed a parsimonious mathematical model describing both the transmission process of pathogens among hosts and the evolutionary process of the pathogens. As a case study we applied this method to the field data of nucleotide sequences of pandemic influenza A (H1N1) 2009 viruses collected in Argentina. The Tajima's D-based method estimated the basic reproduction number to be 1.55 with 95% highest posterior density (HPD) between 1.31 and 2.05, and the date of epidemic peak to be 10th July with 95% HPD between 22nd June and 9th August. The estimated basic reproduction number was consistent with estimation by birth–death skyline plot and estimation using the time series of the number of infected individuals. These results suggested that Tajima's D statistic on nucleotide sequences of pathogens could be useful to estimate epidemiological parameters of outbreaks.
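    For readers unfamiliar with the statistic, the sketch below computes Tajima's D for one sample using the standard formula (Tajima, 1989): the difference between mean pairwise differences and the Watterson estimate S/a1, divided by its estimated standard deviation. It covers only the statistic itself, not the epidemiological model described in the abstract. The function name `tajimas_d` and the input assumptions (aligned sequences, no gaps or ambiguity codes, at least four sequences) are illustrative choices.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, gap-free sequences.

    Returns None when there are no segregating sites. Requires n >= 4,
    because the variance term vanishes for smaller samples.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of alignment columns with more than one distinct base.
    s_sites = sum(len(set(col)) > 1 for col in zip(*seqs))
    if s_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))


if __name__ == "__main__":
    # Only singleton variants: an excess of rare alleles gives D < 0.
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
    # Two balanced haplotypes: intermediate-frequency variants give D > 0.
    print(tajimas_d(["AA", "AA", "TT", "TT"]))
```

    Tracking this value over successive sampling windows gives the kind of time series the abstract refers to.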

  15. Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins.

    Directory of Open Access Journals (Sweden)

    David Karlin

    Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

  16. Correlating CpG islands, motifs, and sequence variants in human chromosome 21

    Directory of Open Access Journals (Sweden)

    Cercone Nick

    2011-07-01

    Background: CpG islands are important regions in DNA. They usually appear at the 5’ end of genes containing GC-rich dinucleotides. When DNA methylation occurs, gene regulation is affected and it sometimes leads to carcinogenesis. We propose a new detection program using a hidden Markov model alongside the Viterbi algorithm. Methods: Our solution provides a graphical user interface not seen in many of the other CGI detection programs and we unify the detection and analysis under one program to allow researchers to scan a genetic sequence, detect the significant CGIs, and analyze the sequence once the scan is complete for any noteworthy findings. Results: Using human chromosome 21, we show that our algorithm finds a significant number of CGIs. Running an analysis on a dataset of promoters discovered that the characteristics of methylated and unmethylated CGIs are significantly different. Finally, we detected significantly different motifs between methylated and unmethylated CGI promoters using MEME and MAST. Conclusions: Developing this new tool for the community using powerful algorithms has shown that combining analysis with CGI detection will improve the continued research within the field of epigenetics.
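    To make the combination of a hidden Markov model with the Viterbi algorithm concrete, here is a deliberately simplified sketch. It is not the authors' model: it assumes just two states (island "I" and background "B") with single-nucleotide emissions, and all probabilities in `START`, `TRANS` and `EMIT` are made-up illustrative values. Published CpG island HMMs typically use more states in order to capture dinucleotide context.

```python
from math import log

# Illustrative two-state model; every number here is an assumption.
STATES = ("I", "B")  # I = CpG island, B = background
START = {"I": 0.5, "B": 0.5}
TRANS = {"I": {"I": 0.9, "B": 0.1}, "B": {"I": 0.1, "B": 0.9}}
EMIT = {
    "I": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},  # GC-rich
    "B": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # AT-rich
}


def viterbi(seq):
    """Return the most probable state path for `seq` as a string of I/B."""
    if not seq:
        return ""
    # Log-probabilities avoid numeric underflow on long sequences.
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for ch in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + log(TRANS[prev][s]) + log(EMIT[s][ch])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("AT" * 5 + "CG" * 6 + "AT" * 5))
```

    Runs of "I" in the returned path mark candidate islands; a practical detector would additionally apply length and GC-content criteria.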

  17. Prediction of host - pathogen protein interactions between Mycobacterium tuberculosis and Homo sapiens using sequence motifs.

    Science.gov (United States)

    Huo, Tong; Liu, Wei; Guo, Yu; Yang, Cheng; Lin, Jianping; Rao, Zihe

    2015-03-26

    Emergence of multiple drug resistant strains of M. tuberculosis (MDR-TB) threatens to derail global efforts aimed at reining in the pathogen. Co-infections of M. tuberculosis with HIV are difficult to treat. To counter these new challenges, it is essential to study the interactions between M. tuberculosis and the host to learn how these bacteria cause disease. We report a systematic workflow to predict the host-pathogen interactions (HPIs) between M. tuberculosis and Homo sapiens based on sequence motifs. First, protein sequences were used as initial input for identifying the HPIs by the 'interolog' method. HPIs were further filtered by prediction of domain-domain interactions (DDIs). Functional annotations of protein and publicly available experimental results were applied to filter the remaining HPIs. Using such a strategy, 118 pairs of HPIs were identified, which involve 43 proteins from M. tuberculosis and 48 proteins from Homo sapiens. A biological interaction network between M. tuberculosis and Homo sapiens was then constructed using the predicted inter- and intra-species interactions based on the 118 pairs of HPIs. Finally, a web accessible database named PATH (Protein interactions of M. tuberculosis and Human) was constructed to store these predicted interactions and proteins. This interaction network will facilitate the research on host-pathogen protein-protein interactions, and may throw light on how M. tuberculosis interacts with its host.

  18. Nucleotide sequence database comparison for Internal Transcribed Spacer 2 genetic region DNA barcode dermatophyte routine identification.

    Science.gov (United States)

    Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R

    2018-02-28

    Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA ITS region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton, mainly T. rubrum complex (n=184), T. interdigitale (n=40), T. tonsurans (n=26) and T. benhamiae (n=5). Other genera included Microsporum (e.g., M. canis (n=21), M. audouinii (n=10) and Nannizzia gypseum (n=3), and Epidermophyton (n=3)). Species-level identification of T. rubrum complex isolates was an issue. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species given that a comprehensive and correctly-labeled database is consulted. As many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.

  19. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Directory of Open Access Journals (Sweden)

    Guido W. Grimm

    2006-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

  20. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs

    DEFF Research Database (Denmark)

    van Beest, M; Dooijes, D; van De Wetering, M

    2000-01-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment of p...

  1. A structural study for the optimisation of functional motifs encoded in protein sequences

    Directory of Open Access Journals (Sweden)

    Helmer-Citterich Manuela

    2004-04-01

    Background: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: 1. residues structurally conserved in different proteins, that are true positives for a pattern, are identified by means of a computational technique and by visual inspection. 2. the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns. 3. the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of
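    The notions of pattern sensitivity and specificity used above can be made concrete with a small sketch. It converts a pattern written in PROSITE syntax into a Python regular expression and scores it against labelled sequences. This is an illustrative helper, not the authors' procedure: the function names `prosite_to_regex` and `sensitivity_specificity` are hypothetical, only the common syntax elements are handled (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, `<`, `>`), and the toy sequences are invented.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        core, lo, hi = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{"):  # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def sensitivity_specificity(regex, positives, negatives):
    """Fraction of true members matched, and of non-members rejected."""
    tp = sum(bool(re.search(regex, s)) for s in positives)
    tn = sum(not re.search(regex, s) for s in negatives)
    return tp / len(positives), tn / len(negatives)


if __name__ == "__main__":
    rx = prosite_to_regex("C-x(2)-C")
    print(rx)
    # Toy data: one true member is missed and one non-member is matched.
    print(sensitivity_specificity(rx, ["CAAC", "CAAAC"], ["AAAA", "GCTTC"]))
```

    Extending a pattern with additional conserved positions, as the abstract describes, would correspond to adding elements to the pattern string and re-scoring it in the same way.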

  2. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  3. An algorithm and program for finding sequence specific oligo-nucleotide probes for species identification

    Directory of Open Access Journals (Sweden)

    Tautz Diethard

    2002-03-01

    Full Text Available Abstract Background The identification of species or species groups with specific oligo-nucleotides as molecular signatures is becoming increasingly popular for bacterial samples. However, it also shows great promise for other small organisms that are taxonomically difficult to handle. Results We have devised an algorithm that aims to find the optimal probes for any given set of sequences. The program requires only a crude alignment of these sequences as input and is optimized for performance so that it can also deal with very large datasets. The algorithm is designed such that the position of mismatches in the probes influences the selection, and it makes provision for single-nucleotide outloops. Program implementations are available for Linux and Windows.

  4. Nucleotide sequence of the 18S-26S rRNA intergene region of the sea urchin.

    OpenAIRE

    Hindenach, B R; Stafford, D W

    1984-01-01

    The DNA sequence which spans the internal transcribed spacers of a cloned ribosomal transcription unit from the sea urchin, Lytechinus variegatus, has been determined. The region extends from the conserved Eco RI site near the 3' end of the 18S rDNA to a Bam HI site in the 26S rDNA and includes 232 nucleotides coding for 18S rRNA, 367 nucleotides of internal transcribed spacer, 159 nucleotides coding for 5.8S rRNA, 338 nucleotides of internal transcribed spacer, and 505 nucleotides coding for...

  5. Nucleotide sequence analysis of regions of adenovirus 5 DNA containing the origins of DNA replication

    International Nuclear Information System (INIS)

    Steenbergh, P.H.

    1979-01-01

    The purpose of the investigations described is the determination of nucleotide sequences at the molecular ends of the linear adenovirus type 5 DNA. Knowledge of the primary structure at the termini of this DNA molecule is of particular interest in the study of the mechanism of replication of adenovirus DNA. The initiation- and termination sites of adenovirus DNA replication are located at the ends of the DNA molecule. (Auth.)

  6. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    Science.gov (United States)

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily
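
    The conversion between genomic and feature-centric coordinates mentioned above can be sketched in plain Python. This is an illustrative stand-in only: it does not use or reproduce Plastid's API, and the function name, the half-open interval convention and the example exons are all assumptions.

```python
def genomic_to_transcript(pos, exons, strand="+"):
    """Map a genomic position onto a spliced transcript.

    exons: sorted, non-overlapping, half-open (start, end) genomic intervals.
    Returns the 0-based position along the mature transcript, counted from
    its 5' end (so minus-strand features are counted from the right).
    """
    total = sum(end - start for start, end in exons)
    offset = 0  # transcript length contributed by exons to the left of pos
    for start, end in exons:
        if start <= pos < end:
            plus_coord = offset + (pos - start)
            return plus_coord if strand == "+" else total - 1 - plus_coord
        offset += end - start
    raise ValueError(f"position {pos} falls in an intron or outside the feature")

# Hypothetical two-exon transcript, 15 nt long after splicing.
EXONS = [(100, 110), (200, 205)]
print(genomic_to_transcript(202, EXONS, "+"))  # third base of the second exon
print(genomic_to_transcript(202, EXONS, "-"))  # same base, minus-strand feature
```

    The point of the sketch is the bookkeeping itself: intron lengths are skipped and strand flips the direction of counting, which is the accounting the abstract says the library takes off the user's hands.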

  7. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    Energy Technology Data Exchange (ETDEWEB)

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.

    1987-02-01

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C_eta1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 x 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 +/- 2.6 million years and 17.3 +/- 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
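
    The arithmetic behind a molecular clock estimate can be sketched as follows. The sketch assumes the standard relation K = 2rT (divergence accumulates along both lineages), which the abstract does not state explicitly; the rate is the one quoted above, while the divergence value in the example is hypothetical.

```python
SILENT_RATE = 1.56e-9  # substitutions per site per year, as quoted in the abstract

def divergence_time_years(k_silent, rate=SILENT_RATE):
    """Time since two lineages split, assuming K = 2 * rate * T.

    k_silent: substitutions per silent site between the two sequences.
    """
    return k_silent / (2.0 * rate)

# Hypothetical silent-site divergence of 2%.
print(f"{divergence_time_years(0.02) / 1e6:.2f} million years")
```

    Under this assumption a silent-site divergence of about 0.02 corresponds to roughly 6.4 million years, the order of magnitude reported for the chimpanzee-human split.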

  8. A combined sequence and structure based method for discovering enriched motifs in RNA from in vivo binding data.

    Science.gov (United States)

    Polishchuk, Maya; Paz, Inbal; Kohen, Refael; Mesika, Rona; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2017-04-15

    RNA binding proteins (RBPs) play an important role in regulating many processes in the cell. RBPs often recognize their RNA targets in a specific manner. In addition to the RNA primary sequence, the structure of the RNA has been shown to play a central role in RNA recognition by RBPs. In recent years, many experimental approaches, both in vitro and in vivo, were developed and employed to identify and characterize RBP targets and extract their binding specificities. In vivo binding techniques, such as CrossLinking and ImmunoPrecipitation (CLIP)-based methods, enable the characterization of protein binding sites on RNA targets. However, these methods do not provide information regarding the structural preferences of the protein. While methods to obtain the structure of RNA are available, inferring both the sequence and the structure preferences of RBPs remains a challenge. Here we present SMARTIV, a novel computational tool for discovering combined sequence and structure binding motifs from in vivo RNA binding data relying on the sequences of the target sites, the ranking of their binding scores and their predicted secondary structure. The combined motifs are provided in a unified representation that is informative and easy for visual perception. We tested the method on CLIP-seq data from different platforms for a variety of RBPs. Overall, we show that our results are highly consistent with known binding motifs of RBPs, offering additional information on their structural preferences.

  9. APTANI: a computational tool to select aptamers through sequence-structure motif analysis of HT-SELEX data.

    Science.gov (United States)

    Caroli, J; Taccioli, C; De La Fuente, A; Serafini, P; Bicciato, S

    2016-01-15

    Aptamers are synthetic nucleic acid molecules that can bind biological targets by virtue of both their sequence and three-dimensional structure. Aptamers are selected using SELEX, Systematic Evolution of Ligands by EXponential enrichment, a technique that exploits aptamer-target binding affinity. The SELEX procedure, coupled with high-throughput sequencing (HT-SELEX), creates billions of random sequences capable of binding different epitopes on specific targets. Since this technique produces enormous amounts of data, computational analysis represents a critical step to screen and select the most biologically relevant sequences. Here, we present APTANI, a computational tool to identify target-specific aptamers from HT-SELEX data and secondary structure information. APTANI builds on the AptaMotif algorithm, originally implemented to analyze SELEX data; it extends the applicability of AptaMotif to HT-SELEX data and introduces new functionalities, such as the possibility to identify binding motifs, to cluster aptamer families or to compare output results from different HT-SELEX cycles. Tabular and graphical representations facilitate the downstream biological interpretation of results. APTANI is available at http://aptani.unimore.it. Contact: silvio.bicciato@unimore.it. Supplementary data are available at Bioinformatics online.

  10. Nucleotide sequence determination of the region in adenovirus 5 DNA involved in cell transformation

    International Nuclear Information System (INIS)

    Maat, J.

    1978-01-01

    A description is given of investigations into the primary structure of the transforming region of adenovirus type 5 DNA. The phenomenon of cell transformation is discussed in general terms, and the principles of a number of fairly recent techniques, which have been in use for DNA sequence determination since 1975, are dealt with. A few of the author's own techniques are described which deal both with nucleotide sequence analysis and with the determination of DNA cleavage sites of restriction endonucleases. The results are given of the mapping of cleavage sites of HpaII, HaeIII, AluI, HinfI and TaqI in the HpaI-E fragment of adenovirus DNA, and of the determination of the nucleotide sequence in the transforming region of adenovirus type 5 DNA. The results of the sequence determination of the Ad5 HindIII-G fragment are discussed in relation to the investigation of the transforming proteins isolated from in vitro and in vivo synthesizing systems. Labelling procedures of DNA are described, including the exonuclease III/DNA polymerase I method and T4 polynucleotide kinase labelling of DNA fragments. (Auth.)

  11. A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences

    Directory of Open Access Journals (Sweden)

    Boulund Fredrik

    2012-12-01

    Full Text Available Abstract Background Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered qnr genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, qnr genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of qnr genes in more detail. Results In this paper we describe a new method to identify qnr genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power of correctly classifying sequences from novel classes of qnr genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated qnr genes. In addition, several fragments from novel putative qnr genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants of which 11 have previously not been reported in literature. Conclusions The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of qnr genes in nucleotide sequence data. The predicted novel putative qnr genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at http://bioinformatics.math.chalmers.se/qnr/.

  12. The nucleotide sequence of 4.5S ribosomal RNA from tobacco chloroplasts.

    OpenAIRE

    Takaiwa, F; Sugiura, M

    1980-01-01

    The nucleotide sequence of tobacco chloroplast 4.5S ribosomal RNA has been determined to be: OHG-A-A-G-G-U-C-A-C-G-G-C-G-A-G-A-C-G-A-G-C-C-G-U-U-U-A-U-C-A-U-U-A-C-G-A-U-A-G-G-U-G-U-C-A-A-G-U-G-G-A-A-G-U-G-C-A-G-U-G-A-U-G-U-A-U-G-C-(G-A)-C-U-G-A-G-G-C-A-U-C-C-U-A-A-C-A-G-A-C-C-G-G-U-A-G-A-C-U-U-G-A-A-COH. The 4.5S RNA is 103 nucleotides long and its 5'-terminus is not phosphorylated.
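
    The stated length can be checked directly against the sequence as printed. The sketch below simply strips the hyphens, the terminal OH groups and the parentheses (which mark a pair whose order is written as uncertain) and counts the remaining bases; the variable and function names are illustrative.

```python
import re
from collections import Counter

# Sequence exactly as printed in the abstract above.
REPORTED = (
    "OHG-A-A-G-G-U-C-A-C-G-G-C-G-A-G-A-C-G-A-G-C-C-G-U-U-U-A-U-C-A-"
    "U-U-A-C-G-A-U-A-G-G-U-G-U-C-A-A-G-U-G-G-A-A-G-U-G-C-A-G-U-G-"
    "A-U-G-U-A-U-G-C-(G-A)-C-U-G-A-G-G-C-A-U-C-C-U-A-A-C-A-G-A-C-C-"
    "G-G-U-A-G-A-C-U-U-G-A-A-COH"
)

def parse_bases(text):
    """Keep only the RNA bases; 'O', 'H', hyphens and parentheses are dropped."""
    return "".join(re.findall(r"[ACGU]", text))

rna = parse_bases(REPORTED)
print(len(rna))        # number of nucleotides
print(Counter(rna))    # base composition
```

    Counting this way gives 103 bases, in agreement with the length given in the abstract.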

  13. Analysis of the genome sequence of the pathogenic Muscovy duck parvovirus strain YY reveals a 14-nucleotide-pair deletion in the inverted terminal repeats.

    Science.gov (United States)

    Wang, Jianye; Huang, Yu; Zhou, Mingxu; Zhu, Guoqiang

    2016-09-01

    Genomic information about Muscovy duck parvovirus (MDPV) is still limited. In this study, the genome of the pathogenic MDPV strain YY was sequenced. The full-length genome of YY is 5075 nucleotides (nt) long, 57 nt shorter than that of strain FM. Sequence alignment indicates that the 5' and 3' inverted terminal repeats (ITR) of strain YY contain a 14-nucleotide-pair deletion in the stem of the palindromic hairpin structure in comparison to strains FM and FZ91-30. The deleted region contains one "E-box" site and one repeated motif with the sequence "TTCCGGT" or "ACCGGAA". Phylogenetic trees constructed based on the protein-coding genes concordantly showed that YY, together with nine other MDPV isolates from various places, clustered in a separate branch, distinct from the branch formed by goose parvovirus (GPV) strains. These results demonstrate that, despite the distinctive deletion, the YY strain still belongs to the classical MDPV group. Moreover, the deletion in the ITR may contribute to the genome evolution of MDPV under immunization pressure.
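
    The two spellings of the repeated motif are reverse complements of each other, which is what one expects for the two arms of a palindromic hairpin stem. A minimal sketch of that check, together with a scan for E-box sites, is given below. The E-box is taken here to be the commonly cited consensus CANNTG, which is an assumption about what the authors scanned for, and the test sequence is hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

# Lookahead so that overlapping sites are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(seq):
    """Return (0-based start, site) for every CANNTG occurrence in seq."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]

print(reverse_complement("TTCCGGT"))   # the motif quoted in the abstract
print(find_eboxes("GGCACGTGTT"))       # hypothetical sequence with one site
```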

  14. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms.

    Science.gov (United States)

    Taillon-Miller, P; Gu, Z; Li, Q; Hillier, L; Kwok, P Y

    1998-07-01

    An efficient strategy to develop a dense set of single-nucleotide polymorphism (SNP) markers is to take advantage of the human genome sequencing effort currently under way. Our approach is based on the fact that bacterial artificial chromosomes (BACs) and P1-based artificial chromosomes (PACs) used in long-range sequencing projects come from diploid libraries. If the overlapping clones sequenced are from different lineages, one is comparing the sequences from 2 homologous chromosomes in the overlapping region. We have analyzed in detail every SNP identified while sequencing three sets of overlapping clones found on chromosome 5p15.2, 7q21-7q22, and 13q12-13q13. In the 200.6 kb of DNA sequence analyzed in these overlaps, 153 SNPs were identified. Computer analysis for repetitive elements and suitability for STS development yielded 44 STSs containing 68 SNPs for further study. All 68 SNPs were confirmed to be present in at least one of the three (Caucasian, African-American, Hispanic) populations studied. Furthermore, 42 of the SNPs tested (62%) were informative in at least one population, 32 (47%) were informative in two or more populations, and 23 (34%) were informative in all three populations. These results clearly indicate that developing SNP markers from overlapping genomic sequence is highly efficient and cost effective, requiring only the two simple steps of developing STSs around the known SNPs and characterizing them in the appropriate populations.
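
    The percentages quoted above follow from the raw counts, and the same counts give the SNP density in the overlaps. A short sketch of the arithmetic, using only figures from the abstract (the variable names are illustrative):

```python
# Figures quoted in the abstract above.
ANALYSED_KB = 200.6      # overlapping sequence analysed
SNPS_FOUND = 153         # SNPs identified in the overlaps
SNPS_CONFIRMED = 68      # SNPs carried forward in STSs and confirmed
INFORMATIVE = {
    "at least one population": 42,
    "two or more populations": 32,
    "all three populations": 23,
}

percentages = {
    label: round(100 * count / SNPS_CONFIRMED)
    for label, count in INFORMATIVE.items()
}
kb_per_snp = ANALYSED_KB / SNPS_FOUND

print(percentages)
print(f"about one SNP per {kb_per_snp:.2f} kb of overlap")
```

    The rounded values reproduce the 62%, 47% and 34% in the text, and the density works out to roughly one SNP every 1.3 kb.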

  15. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    Science.gov (United States)

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619
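
    The transition/transversion split reported above rests on a simple classification: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. A minimal sketch follows; the function names and the example SNP list are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def ts_tv_ratio(snps):
    """snps: iterable of (ref, alt) pairs; assumes at least one transversion."""
    kinds = [classify_snp(ref, alt) for ref, alt in snps]
    return kinds.count("transition") / kinds.count("transversion")

# Hypothetical calls: three transitions and two transversions.
CALLS = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "T"), ("G", "C")]
print(ts_tv_ratio(CALLS))
```

    A 60%/40% split, as reported in the abstract, corresponds to a transition/transversion ratio of 1.5.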

  16. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    Science.gov (United States)

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-01-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amino acid polypeptide of 31,372 daltons. PMID:3514578

  17. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    OpenAIRE

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-01-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amin...

  18. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    Science.gov (United States)

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-04-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amino acid polypeptide of 31,372 daltons.

  19. Complete nucleotide sequence of a virus associated with rusty mottle disease of sweet cherry (Prunus avium).

    Science.gov (United States)

    Villamor, D V; Druffel, K L; Eastwell, K C

    2013-08-01

    Cherry rusty mottle is a disease of sweet cherries first described in 1940 in western North America. Because of the graft-transmissible nature of the disease, a viral nature of the disease was assumed. Here, the complete genomic nucleotide sequences of virus isolates from two trees expressing cherry rusty mottle disease symptoms are characterized; the virus is designated cherry rusty mottle associated virus (CRMaV). The biological and molecular characteristics of this virus in comparison to those of cherry necrotic rusty mottle virus (CNRMV) and cherry green ring mottle virus (CGRMV) are described. CRMaV was subsequently detected in additional sweet cherry trees expressing symptoms of cherry rusty mottle disease.

  20. The complete nucleotide sequence of the mitochondrial genome of Bactrocera minax (Diptera: Tephritidae).

    Science.gov (United States)

    Zhang, Bin; Nardi, Francesco; Hull-Sanders, Helen; Wan, Xuanwu; Liu, Yinghong

    2014-01-01

    The complete 16,043 bp mitochondrial genome (mitogenome) of Bactrocera minax (Diptera: Tephritidae) has been sequenced. The genome encodes 37 genes usually found in insect mitogenomes. The mitogenome information for B. minax was compared to the homologous sequences of Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis, Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis, Bactrocera correcta, Bactrocera cucurbitae and Ceratitis capitata. The analysis indicated the structure and organization are typical of, and similar to, the nine closely related species mentioned above, although it contains the lowest genome-wide A+T content (67.3%). Four short intergenic spacers with a high degree of conservation among the nine tephritid species mentioned above and B. minax were observed, which also have clear counterparts in the control regions (CRs). Correlation analysis among these ten tephritid species revealed close positive correlation between the A+T content of zero-fold degenerate sites (P0FD), the ratio of nucleotide substitution frequency at P0FD sites to all degenerate sites (zero-fold degenerate sites, two-fold degenerate sites and four-fold degenerate sites) and amino acid sequence distance (ASD) were found. Further, significant positive correlation was observed between the A+T content of four-fold degenerate sites (P4FD) and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites; however, we found significant negative correlation between ASD and the A+T content of P4FD, and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites. A higher nucleotide substitution frequency at non-synonymous sites compared to synonymous sites was observed in nad4, the first time that has been observed in an insect mitogenome. A poly(T) stretch at the 5' end of the CR followed by a [TA(A)]n-like stretch was also found. In addition, a highly conserved G+A-rich sequence block was observed in front of the

  1. The complete nucleotide sequence of the mitochondrial genome of Bactrocera minax (Diptera: Tephritidae).

    Directory of Open Access Journals (Sweden)

    Bin Zhang

    Full Text Available The complete 16,043 bp mitochondrial genome (mitogenome) of Bactrocera minax (Diptera: Tephritidae) has been sequenced. The genome encodes 37 genes usually found in insect mitogenomes. The mitogenome information for B. minax was compared to the homologous sequences of Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis, Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis, Bactrocera correcta, Bactrocera cucurbitae and Ceratitis capitata. The analysis indicated the structure and organization are typical of, and similar to, the nine closely related species mentioned above, although it contains the lowest genome-wide A+T content (67.3%). Four short intergenic spacers with a high degree of conservation among the nine tephritid species mentioned above and B. minax were observed, which also have clear counterparts in the control regions (CRs). Correlation analysis among these ten tephritid species revealed close positive correlation between the A+T content of zero-fold degenerate sites (P0FD), the ratio of nucleotide substitution frequency at P0FD sites to all degenerate sites (zero-fold degenerate sites, two-fold degenerate sites and four-fold degenerate sites) and amino acid sequence distance (ASD) were found. Further, significant positive correlation was observed between the A+T content of four-fold degenerate sites (P4FD) and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites; however, we found significant negative correlation between ASD and the A+T content of P4FD, and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites. A higher nucleotide substitution frequency at non-synonymous sites compared to synonymous sites was observed in nad4, the first time that has been observed in an insect mitogenome. A poly(T) stretch at the 5' end of the CR followed by a [TA(A)]n-like stretch was also found. In addition, a highly conserved G+A-rich sequence block was observed in front

  2. MeCP2 recognizes cytosine methylated tri-nucleotide and di-nucleotide sequences to tune transcription in the mammalian brain.

    Directory of Open Access Journals (Sweden)

    Sabine Lagger

    2017-05-01

    Full Text Available Mutations in the gene encoding the methyl-CG binding protein MeCP2 cause several neurological disorders including Rett syndrome. The di-nucleotide methyl-CG (mCG) is the classical MeCP2 DNA recognition sequence, but additional methylated sequence targets have been reported. Here we show by in vitro and in vivo analyses that MeCP2 binding to non-CG methylated sites in brain is largely confined to the tri-nucleotide sequence mCAC. MeCP2 binding to chromosomal DNA in mouse brain is proportional to mCAC + mCG density and unexpectedly defines large genomic domains within which transcription is sensitive to MeCP2 occupancy. Our results suggest that MeCP2 integrates patterns of mCAC and mCG in the brain to restrain transcription of genes critical for neuronal function.

  3. Expression and nucleotide sequence of the Lactobacillus bulgaricus beta-galactosidase gene cloned in Escherichia coli.

    Science.gov (United States)

    Schmidt, B F; Adams, R M; Requadt, C; Power, S; Mainzer, S E

    1989-01-01

    The Lactobacillus bulgaricus beta-galactosidase gene was cloned on a ca. 7-kilobase-pair HindIII fragment in the vector pKK223-3 and expressed in Escherichia coli by using its own promoter. The nucleotide sequence of the gene and approximately 400 bases of 3'- and 5'-flanking sequences was determined. The amino acid sequence of the beta-galactosidase, deduced from the nucleotide sequence of the gene, yielded a monomeric molecular mass of ca. 114 kilodaltons, slightly smaller than the E. coli lacZ and Klebsiella pneumoniae lacZ enzymes but larger than the E. coli evolved (ebgA) beta-galactosidase. The cloned beta-galactosidase was found to be indistinguishable from the native enzyme by several criteria. From amino acid sequence alignments, the L. bulgaricus beta-galactosidase has a 30 to 34% similarity to the E. coli lacZ, E. coli ebgA, and K. pneumoniae lacZ enzymes. There are seven regions of high similarity common to all four of these beta-galactosidases. Also, the putative active-site residues (Glu-461 and Tyr-503 in the E. coli lacZ beta-galactosidase) are conserved in the L. bulgaricus enzyme as well as in the other two beta-galactosidases mentioned above. The conservation of active-site amino acids and the large regions of similarity suggest that all four of these beta-galactosidases evolved from a common ancestral gene. However, these enzymes are quite different from the thermophilic beta-galactosidase encoded by the Bacillus stearothermophilus bgaB gene. PMID:2492511

  4. The phosphoenolpyruvate carboxylase gene of Corynebacterium glutamicum: molecular cloning, nucleotide sequence, and expression.

    Science.gov (United States)

    Eikmanns, B J; Follettie, M T; Griot, M U; Sinskey, A J

    1989-08-01

    The ppc gene of Corynebacterium glutamicum encoding phosphoenolpyruvate (PEP) carboxylase was isolated by complementation of a ppc mutant of Escherichia coli using a cosmid gene bank of chromosomal C. glutamicum DNA. By subsequent subcloning into the plasmid pUC8 and deletion analysis, the ppc gene could be located on a 3.3 kb SalI fragment. This fragment was able to complement the E. coli ppc mutant and conferred PEP carboxylase activity to the mutant. The complete nucleotide sequence of the ppc gene including 5' and 3' flanking regions has been determined and the primary structure of PEP carboxylase was deduced. The sequence predicts a 919 residue protein product (molecular weight of 103,154) which shows 34% similarity with the respective E. coli enzyme.

  5. Nucleotide sequence alignment of hdcA from Gram-positive bacteria.

    Science.gov (United States)

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; Del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A

    2016-03-01

    The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4].

  6. Nucleotide sequence alignment of hdcA from Gram-positive bacteria

    Directory of Open Access Journals (Sweden)

    Maria Diaz

    2016-03-01

    Full Text Available The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4].

  7. [Nucleotide sequence of HLA-DQA1 promoter region (QAP) in a lung cancer patient].

    Science.gov (United States)

    Qiu, C; Zhou, W; Song, C

    1996-06-01

    The HLA-DQA1 alleles and the nucleotide sequence of the HLA-DQA1 promoter region (QAP) in a patient with IDDM complicated by lung cancer have been identified by PCR/SSCP, PCR/SSCP and PCR/sequencing. The results showed that: (1) The lung cancer patient and all of his family members carried HLA-DQA1*0301/0501 alleles. (2) A single base substitution G-->A at position -155 and a deletion of CAA at positions -161 to -163 occurred in the patient. These results suggest that mutation of the HLA-DQA1 promoter region may modulate HLA-DQA1 gene expression through trans-acting factors binding to variant cis-acting elements and may be responsible for the pathogenesis of lung cancer.

  8. Unique nucleotide sequence-guided assembly of repetitive DNA parts for synthetic biology applications

    Energy Technology Data Exchange (ETDEWEB)

    Torella, JP; Lienert, F; Boehm, CR; Chen, JH; Way, JC; Silver, PA

    2014-08-07

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.

  9. Requirement for asparagine in the aquaporin NPA sequence signature motifs for cation exclusion

    DEFF Research Database (Denmark)

    Wree, Dorothea; Wu, Binghua; Zeuthen, Thomas

    2011-01-01

    Two highly conserved NPA motifs are a hallmark of the aquaporin (AQP) family. The NPA triplets form N-terminal helix capping structures with the Asn side chains located in the centre of the water or solute-conducting channel, and are considered to play an important role in AQP selectivity. Althou...

  10. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    Science.gov (United States)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single-base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  11. Nucleotide sequence of cloned cDNA for human sphingolipid activator protein 1 precursor

    International Nuclear Information System (INIS)

    Dewji, N.N.; Wenger, D.A.; O'Brien, J.S.

    1987-01-01

    Two cDNA clones encoding prepro-sphingolipid activator protein 1 (SAP-1) were isolated from a λgt11 human hepatoma expression library using polyclonal antibodies. These had inserts of ≅ 2 kilobases (λ-S-1.2 and λ-S-1.3) and both were homologous with a previously isolated clone (λ-S-1.1) for mature SAP-1. The authors report here the nucleotide sequence of the longer two EcoRI fragments of S-1.2 and S-1.3 that were not the same and the derived amino acid sequences of mature SAP-1 and its prepro form. The open reading frame encodes 19 amino acids, which are colinear with the amino-terminal sequence of mature SAP-1, and extends far beyond the predicted carboxyl terminus of mature SAP-1, indicating extensive carboxyl-terminal processing. The nucleotide sequence of cDNA encoding prepro-SAP-1 includes 1449 bases from the assigned initiation codon ATG at base-pair 472 to the stop codon TGA at base-pair 1921. The first 23 amino acids coded after the initiation ATG are characteristic of a signal peptide. The calculated molecular mass for a polypeptide encoded by 1449 bases is ≅ 53 kDa, in keeping with the reported value for pro-SAP-1. The data indicate that after removal of the signal peptide mature SAP-1 is generated by removing an additional 7 amino acids from the amino terminus and ≅ 373 amino acids from the carboxyl terminus. One potential glycosylation site was previously found in mature SAP-1. Three additional potential glycosylation sites are present in the processed carboxyl-terminal polypeptide, which they designate as P-2.
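
    The arithmetic in the abstract above can be checked with a short sketch. The coordinates and residue counts are taken from the abstract; the average residue mass of 110 Da is an assumed rule of thumb, not a value given by the authors, and the variable names are illustrative only.

```python
# Coordinates quoted in the abstract (ATG at 472, TGA at 1921).
START, STOP = 472, 1921

# Assumption: ~110 Da per amino acid residue, a common rough average.
AVG_RESIDUE_DA = 110

coding_bases = STOP - START            # bases from the ATG up to the stop codon
codons = coding_bases // 3             # amino acids encoded
mass_kda = codons * AVG_RESIDUE_DA / 1000

# Residues removed during processing, as stated in the abstract
# (the carboxyl-terminal figure is approximate in the source).
signal_peptide, n_term_trim, c_term_trim = 23, 7, 373
mature_residues = codons - signal_peptide - n_term_trim - c_term_trim

print(coding_bases, codons, round(mass_kda, 1), mature_residues)
```

    Under these assumptions the 1449 bases correspond to 483 codons and roughly 53 kDa, consistent with the figure reported, and leave on the order of 80 residues for mature SAP-1.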

  12. Remarkable similarity in genome nucleotide sequences between the Schwarz FF-8 and AIK-C measles virus vaccine strains and apparent nucleotide differences in the phosphoprotein gene.

    Science.gov (United States)

    Ito, Chie; Ohgimoto, Shinji; Kato, Seiichi; Sharma, Luna Bhatta; Ayata, Minoru; Komase, Katsuhiro; Takeuchi, Kaoru; Ihara, Toshiaki; Ogura, Hisashi

    2011-07-01

    The Schwarz FF-8 (FF-8) and AIK-C measles virus vaccine strains are currently used for vaccination in Japan. Here, the complete genome nucleotide sequence of the FF-8 strain has been determined and its genome sequence found to be remarkably similar to that of the AIK-C strain. These two strains are differentiated only by two nucleotide differences in the phosphoprotein gene. Since the FF-8 strain does not possess the amino acid substitutions in the phospho- and fusion proteins which are responsible for the temperature-sensitivity and small syncytium formation phenotypes of the AIK-C strain, respectively, other unidentified common mechanisms likely attenuate both the FF-8 and AIK-C strains. © 2011 The Societies and Blackwell Publishing Asia Pty Ltd.

  13. Feasibility of mini-sequencing schemes based on nucleotide polymorphisms for microbial identification and population analyses.

    Science.gov (United States)

    Araujo, Ricardo; Eusebio, Nadia; Caramalho, Rita

    2015-03-01

    Practical schemes based on single nucleotide polymorphisms (SNP) have been proposed as alternatives to simplify and replace the molecular methodologies based on the extensive sequencing analysis of genes. SNaPshot mini-sequencing has been increasingly used during the last decade and represents a fast and robust strategy to analyze critical polymorphisms. Such assays have been proposed to characterize some bacteria and microbial eukaryotes, and their feasibility is reviewed in the present manuscript. The mini-sequencing schemes showed high discriminatory power and competence for identification of microorganisms, but some specificity errors were still found, particularly for species of the Burkholderia cepacia complex and mycobacteria. SNP assays designed for other goals, e.g., comparison of strains, detection of serotypes, virulence, epidemic, and phylogenetic-related subgroups of isolates, can be very useful by facilitating the investigation of large collections of isolates. The next generation of SNP assays might consider the inclusion of a large number of markers to fully characterize microbial taxonomy and strains; nevertheless, these new technologies are still prone to errors and can largely benefit from integration with well-established mini-sequencing assays. Newly proposed molecular tools should be systematically tested in collections of isolates with high indexes of diversity and guarantee interlaboratory validation.

  14. The nucleotide sequences of 5S rRNAs from a fern Dryopteris acuminata and a horsetail Equisetum arvense.

    Science.gov (United States)

    Hori, H; Osawa, S; Takaiwa, F; Sugiura, M

    1984-02-10

    The nucleotide sequences of 5S rRNAs from two Pteridophyta species, a fern Dryopteris acuminata and a horsetail Equisetum arvense, have been determined. These two sequences are more closely related to those of the Bryophyta species (88% identity on average) than to those of seed plants (84% identity on average).

  15. Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs

    Science.gov (United States)

    Allevato, Michael; Bolotin, Eugene; Grossman, Mark; Mane-Padros, Daniel; Sladek, Frances M.

    2017-01-01

    The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a “non-specific” fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87%) of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought. PMID:28719624
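
    As an illustration of the motif classes named in the abstract above, the following minimal sketch scans a DNA string for the generic E-box (CANNTG), flags the canonical MYC E-box (CACGTG), and separately reports the low-affinity non-E-box hexamer AACGTT. The function name and the example sequence are hypothetical and are not taken from the study, which used protein-binding microarrays rather than string matching.

```python
import re

# Generic E-box CANNTG, where N is any base; the lookahead allows overlapping hits.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
CME = "CACGTG"        # canonical MYC E-box named in the abstract
NON_E_BOX = "AACGTT"  # low-affinity non-E-box hexamer named in the abstract


def find_sites(seq):
    """Return sorted (position, hexamer, class) tuples for one strand (0-based)."""
    seq = seq.upper()
    sites = []
    for match in E_BOX.finditer(seq):
        hexamer = match.group(1)
        kind = "CME" if hexamer == CME else "E-box variant"
        sites.append((match.start(), hexamer, kind))
    start = seq.find(NON_E_BOX)
    while start != -1:
        sites.append((start, NON_E_BOX, "non-E-box"))
        start = seq.find(NON_E_BOX, start + 1)
    return sorted(sites)


# Made-up sequence for demonstration only.
example = "GGCACGTGTTAACGTTCCCAGCTGAA"
for site in find_sites(example):
    print(site)
```

    Because CANNTG and AACGTT both read the same on the reverse-complement strand, scanning a single strand is enough for these particular patterns; other motifs would need both strands checked.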

  16. Cloning, sequencing and identification of single nucleotide polymorphisms of partial sequence on the porcine CACNA1S gene.

    Science.gov (United States)

    Fang, XiaoMin; Xu, NingYing; Ren, ShouWen

    2008-04-01

    The CACNA1S gene encodes the alpha1 subunit of the calcium channel. Mutation of the CACNA1S gene can cause hypokalemic periodic paralysis (HypoKPP) and malignant hyperthermia syndrome (MHS) in human beings. Research on CACNA1S to date has been mainly in humans and model animals, and rarely in livestock and poultry. In this study, Yorkshire pigs (23), Pietrain pigs (30), Jinhua pigs (115) and the second generation (126) of a Jinhua x Pietrain cross were used. Primers were designed according to the sequence of the human CACNA1S gene and PCR was carried out using pig genomic DNA. PCR products were sequenced and compared with the human sequence, and then single nucleotide polymorphisms (SNPs) were investigated by PCR-SSCP, while PCR-RFLP tests were performed to validate the mutations. Results indicated: (1) 5211 bp of DNA fragments of the porcine CACNA1S gene were acquired (GenBank accession number: DQ767693) and the identity of the exon region was 82.6% between human and pig; (2) fifty-seven mutations were found within the cloned sequences, among which 24 were in exon regions; (3) the results of PCR-RFLP were in accordance with those of PCR-SSCP. According to the ESTs of the porcine CACNA1S gene published in GenBank (Bx914582, Bx666997), 8 of the 11 SNPs identified in the present study were consistent with the base differences between the two EST fragments.

  17. Pervasive within-Mitochondrion Single-Nucleotide Variant Heteroplasmy as Revealed by Single-Mitochondrion Sequencing

    Directory of Open Access Journals (Sweden)

    Jacqueline Morris

    2017-12-01

    Summary: A number of mitochondrial diseases arise from single-nucleotide variant (SNV) accumulation in multiple mitochondria. Here, we present a method for identification of variants present at the single-mitochondrion level in individual mouse and human neuronal cells, allowing for extremely high-resolution study of mitochondrial mutation dynamics. We identified extensive heteroplasmy between individual mitochondria, along with three high-confidence variants in mouse and one in human that were present in multiple mitochondria across cells. The pattern of variation revealed by single-mitochondrion data shows surprisingly pervasive levels of heteroplasmy in inbred mice. Distribution of SNV loci suggests inheritance of variants across generations, resulting in Poisson jackpot lines with large SNV load. Comparison of human and mouse variants suggests that the two species might employ distinct modes of somatic segregation. Single-mitochondrion resolution revealed mitochondrial mutational dynamics that we hypothesize to affect risk probabilities for mutations reaching disease thresholds. Morris et al. use independent sequencing of multiple individual mitochondria from mouse and human brain cells to show high pervasiveness of mutations. The mutations are heteroplasmic within single mitochondria and within and between cells. These findings suggest mechanisms by which mutations accumulate over time, resulting in mitochondrial dysfunction and disease. Keywords: single mitochondrion, single cell, human neuron, mouse neuron, single-nucleotide variation

  18. Functional characterization of variations on regulatory motifs.

    Directory of Open Access Journals (Sweden)

    Michal Lapidot

    2008-03-01

    Transcription factors (TFs) regulate gene expression through specific interactions with short promoter elements. The same regulatory protein may recognize a variety of related sequences. Moreover, once they are detected it is hard to predict whether highly similar sequence motifs will be recognized by the same TF and regulate similar gene expression patterns, or serve as binding sites for distinct regulatory factors. We developed computational measures to assess the functional implications of variations on regulatory motifs and to compare the functions of related sites. We have developed computational means for estimating the functional outcome of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. We predict the effects of nucleotide variations within motifs on gene expression patterns. In cases where such predictions could be compared to suitable published experimental evidence, we found very good agreement. We further accumulated statistics from multiple substitutions across various binding sites in an attempt to deduce general properties that characterize nucleotide substitutions that are more likely to alter expression. We found that substitutions involving Adenine are more likely to retain the expression pattern and that substitutions involving Guanine are more likely to alter expression compared to the rest of the substitutions. Our results should facilitate the prediction of the expression outcomes of binding site variations. One typical important implication is expected to be the ability to predict the phenotypic effect of variation in regulatory motifs in promoters.

  19. Method of Selection of Bacteria Antibiotic Resistance Genes Based on Clustering of Similar Nucleotide Sequences.

    Science.gov (United States)

    Balashov, I S; Naumov, V A; Borovikov, P I; Gordeev, A B; Dubodelov, D V; Lyubasovskaya, L A; Rodchenko, Yu V; Bystritskii, A A; Aleksandrova, N V; Trofimov, D Yu; Priputnevich, T V

    2017-10-01

    A new method for selection of bacterial antibiotic resistance genes is proposed and tested for solving the problems related to selection of primers for PCR assay. The method implies clustering of similar nucleotide sequences and selection of group primers for all genes of each cluster. Clustering of resistance genes for six groups of antibiotics (aminoglycosides, β-lactams, fluoroquinolones, glycopeptides, macrolides and lincosamides, and fusidic acid) was performed. The method was tested on 81 strains of bacteria of different genera isolated from patients (K. pneumoniae, Staphylococcus spp., S. agalactiae, E. faecalis, E. coli, and G. vaginalis). The results we obtained are comparable to those from the selection of individual genes; this allows reducing the number of primers necessary for maximum coverage of the known antibiotic resistance genes during PCR analysis.

  20. The nucleotide sequence of histidine tRNA gamma of Drosophila melanogaster.

    OpenAIRE

    Altwegg, M; Kubli, E

    1980-01-01

    The nucleotide sequence of D. melanogaster histidine tRNA gamma was determined to be: pG-G-C-C-G-U-G-A-U-C-G-U-C-psi-A-G-D-G-G-D-D-A-G-G-A-C-C-C-C-A-C-G-psi-U-G-U-G-m1G-C-C-G-U-G-G-U-A-A-C-C-m5C-A-G-G-U-psi-C-G-m1A-A-U-C-C-U-G-G-U-C-A-C-G-G-m5C-A-C-C-AOH. An additional unpaired G is found at the 5' end, and the T in the TpsiC loop is replaced by a U.

  1. Regulation of nif gene expression in Enterobacter agglomerans: nucleotide sequence of the nifLA operon and influence of temperature and ammonium on its transcription.

    Science.gov (United States)

    Siddavattam, D; Steibl, H D; Kreutzer, R; Klingmüller, W

    1995-12-20

    The nucleotide sequence of a plasmid-borne 3.9 kb XhoI-SmaI fragment comprising the 3'-region of the nifM gene, the nifL and nifA genes and the 5'-region of nifB gene of Enterobacter agglomerans was determined. The genes were identified by their homology to the corresponding nif genes of Klebsiella pneumoniae. A typical sigma 54-dependent promoter and a consensus NtrC-binding motif were identified upstream of nifL. The predicted amino acid sequence of NifL showed close similarities to NifL of K. pneumoniae and Azotobacter vinelandii. However, no histidine residue was found to correspond to histidine-304 of A. vinelandii NifL, which had been proposed to be required for the repressor activity of NifL. The NifA sequence with a putative DNA binding motif (Q(x3) A(x3) G(x5)I) and an ATP binding site in the C-terminal and central domains, respectively, resembles that of other known NifA proteins. The function of the nifL and nifA genes was demonstrated in vivo using a binary plasmid system by their ability to activate a nifH promoter-lacZ fusion at different temperatures and concentrations of NH4+. Maximal promoter activity occurred at 25 degrees C, and it appears that the sensitivity of NifA to elevated temperatures is independent of NifL. The expression of nifL inhibited promoter activity in the presence of NifA when the initial NH4+ concentration in the medium exceeded 4 mM.

  2. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera, which gives a decrease in signal.

  3. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array.

    Science.gov (United States)

    Fuller, Carl W; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J; Kasianowicz, John J; Davis, Randy; Roever, Stefan; Church, George M; Ju, Jingyue

    2016-05-10

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5'-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods.

  4. Comparison of Nucleotide Sequence of P2C Region in Diabetogenic and Non-Diabetogenic Coxsackie Virus B5 Isolates

    Directory of Open Access Journals (Sweden)

    Cheng-Chong Chou

    2004-11-01

    Enteroviruses are environmental triggers in the pathogenesis of type 1 diabetes mellitus (DM). A sequence of six identical amino acids (PEVKEK) is shared by the 2C protein of Coxsackie virus B and the glutamic acid decarboxylase (GAD) molecules. Between 1995 and 2002, we investigated 22 Coxsackie virus B5 (CVB5) isolates from southern Taiwan. Four of these isolates were obtained from four new-onset type 1 DM patients with diabetic ketoacidosis. We compared a 300-nucleotide sequence in the 2C protein gene (p2C) in 24 CVB5 isolates (4 diabetogenic, 18 non-diabetogenic and 2 prototype). We found 0.3-10% nucleotide differences. In the four isolates from type 1 DM patients, there was only 2.4-3.4% nucleotide difference, and there was only 1.7-7.1% nucleotide difference between type 1 DM isolates and non-diabetogenic isolates. Comparison of the nucleotide sequence between prototype virus and 22 CVB5 isolates revealed 18.4-24.1% difference. Twenty-one CVB5 isolates from type 1 DM and non-type 1 DM patients contained the PEVKEK sequence, as shown by the p2C nucleotide sequence. Our data showed that the viral p2C sequence with homology with GAD is highly conserved in CVB5 isolates. There was no difference between diabetogenic and non-diabetogenic CVB5 isolates. All four type 1 DM patients had at least one of the genetic susceptibility alleles HLA-DR, DQA1, DQB1. Other genetic and autoimmune factors such as HLA genetic susceptibility and GAD may also play important roles in the pathogenesis of type 1 DM.
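
    The percentage nucleotide differences quoted above are pairwise comparisons over an aligned region. A minimal sketch of that calculation follows, assuming the two sequences are already aligned to equal length and that gap positions ("-") are skipped; the function name and the toy sequences are hypothetical, and the study's own alignment software and gap handling are not stated in the abstract.

```python
def percent_difference(a, b):
    """Percent of mismatched positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns containing a gap in either sequence are ignored.
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    mismatches = sum(x != y for x, y in compared)
    return 100.0 * mismatches / len(compared)


# Toy example: one mismatch in ten aligned positions.
print(percent_difference("ACGTACGTAC", "ACGTACGTAA"))
```

    For a 300-nucleotide region, a difference of about 0.3% corresponds to a single substitution and 10% to 30 substitutions.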

  5. Analysis of Complete Nucleotide Sequences of 12 Gossypium Chloroplast Genomes: Origin and Evolution of Allotetraploids

    Science.gov (United States)

    Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping

    2012-01-01

    Gossypium is the maternal source of extant allotetraploid species and allotetraploids have a monophyletic origin. G. hirsutum AD1 lineages have experienced more sequence variations than other allotetraploids in intergenic regions. The available complete nucleotide sequences of 12 Gossypium chloroplast genomes should facilitate studies to uncover the molecular mechanisms of compartmental co-evolution and speciation of Gossypium allotetraploids. PMID:22876273

  6. The complete nucleotide sequence of Alternanthera mosaic virus infecting Portulaca grandiflora represents a new strain distinct from phlox isolates.

    Science.gov (United States)

    Ivanov, Peter A; Mukhamedzhanova, Anna A; Smirnov, Alexander A; Rodionova, Nina P; Karpova, Olga V; Atabekov, Joseph G

    2011-04-01

    A southeastern European isolate of Alternanthera mosaic virus (AltMV-MU) of the genus Potexvirus (family Flexiviridae) was purified from the ornamental plant Portulaca grandiflora. The complete nucleotide sequence (6606 nucleotides) of AltMV-MU genomic RNA was defined. The AltMV-MU genome is different from those of all isolates described earlier and is most closely related to genomes of partly sequenced portulaca isolates AltMV-Po (America) and AltMV-It (Italy). Phylogenetic analysis supports the view that AltMV-MU belongs to a new "portulaca" genotype distinguishable from the "phlox" genotype.

  7. Complete Nucleotide Sequence Analysis of the Norovirus GII.4 Sydney Variant in South Korea

    Directory of Open Access Journals (Sweden)

    Ji-Sun Park

    2015-01-01

    Norovirus is the primary cause of acute gastroenteritis in individuals of all ages. In Australia, a new strain of norovirus (GII.4) was identified in March 2012, and this strain has spread rapidly around the world. In August 2012, this new GII.4 strain was identified in patients in South Korea. Therefore, to examine the characteristics of the epidemic norovirus GII.4 2012 variant in South Korea, we conducted full-length genomic analysis (KM272334). The genome of the gg-12-08-04 strain consisted of 7,558 bp and contained three open reading frame (ORF) composites throughout the whole genome: ORF1 (5,100 bp), ORF2 (1,623 bp), and ORF3 (807 bp). Phylogenetic analyses showed that gg-12-08-04 belonged to the GII.4 Sydney 2012 variant, sharing 98.92% nucleotide similarity with this variant strain. According to SimPlot analysis, the gg-12-08-04 strain was a recombinant strain with a breakpoint at the ORF1/2 junction between the Osaka 2007 and Apeldoorn 2008 strains. This study is the first report of the complete sequence of the GII.4 Sydney 2012 strain in South Korea. It may therefore represent the standard sequence of the norovirus GII.4 2012 variant in South Korea and could be useful for the development of norovirus vaccines.

  8. Identification of 3'UTR sequence elements and a teloplasm localization motif sufficient for the localization of Hro-twist mRNA to the zygotic animal and vegetal poles.

    Science.gov (United States)

    Farooq, Mehrin; Choi, Jonathan; Seoane, Agustin I; Lleras, Roberto A; Tran, Hoan V; Mandal, Stephanie A; Nelson, Christine L; Soto, Julio G

    2012-05-01

    The early localization of mRNA transcripts is critical in sorting cell fate determinants in the developing embryo. In the glossiphoniid leech, Helobdella robusta, maternal mRNAs, such as Hro-twist, localize to the zygotic teloplasm. Ten seven-nucleotide repeat elements (AAUAAUA) called ARE2 and a predicted secondary structural motif, called the teloplasm localization motif (TLM), are present in the 3'UTR of Hro-twist mRNA. We used site-directed mutagenesis, deletions, and microinjection of labeled, exogenous transcripts to determine if ARE2 elements, and the TLM, play a role in Hro-twist mRNA localization. Deleting the poly-A tail and the cytoplasmic polyadenylation element (CPE) had no effect on Hro-twist mRNA localization. Site-directed mutagenesis of nucleotides that altered ARE2 element sequences or the TLM suggests that the ARE2 elements and the TLM are important for Hro-twist mRNA localization to the teloplasm of pre-cleavage zygotes. Hro-Twist protein expression data suggest that the localization of Hro-twist transcripts in zygotes and stage two embryos is not involved in ensuring mesoderm specification, as Hro-Twist protein is expressed uniformly in most cells before gastrulation. Our data may support a shared molecular mechanism for leech transcripts that localize to the teloplasm. © 2012 The Authors Development, Growth & Differentiation © 2012 Japanese Society of Developmental Biologists.

  9. Prevalence of single nucleotide polymorphism among 27 diverse alfalfa genotypes as assessed by transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Li Xuehui

    2012-10-01

    Background: Alfalfa, a perennial, outcrossing species, is a widely planted forage legume producing highly nutritious biomass. Currently, improvement of cultivated alfalfa mainly relies on recurrent phenotypic selection. Marker assisted breeding strategies can enhance alfalfa improvement efforts, particularly if many genome-wide markers are available. Transcriptome sequencing enables efficient high-throughput discovery of single nucleotide polymorphism (SNP) markers for a complex polyploid species. Results: The transcriptomes of 27 alfalfa genotypes, including elite breeding genotypes, parents of mapping populations, and unimproved wild genotypes, were sequenced using an Illumina Genome Analyzer IIx. De novo assembly of quality-filtered 72-bp reads generated 25,183 contigs with a total length of 26.8 Mbp and an average length of 1,065 bp, with an average read depth of 55.9-fold for each genotype. Overall, 21,954 (87.2%) of the 25,183 contigs represented 14,878 unique protein accessions. Gene ontology (GO) analysis suggested that a broad diversity of genes was represented in the resulting sequences. The realignment of individual reads to the contigs enabled the detection of 872,384 SNPs and 31,760 InDels. High resolution melting (HRM) analysis was used to validate 91% of 192 putative SNPs identified by sequencing. Both allelic variants at about 95% of SNP sites identified among five wild, unimproved genotypes are still present in cultivated alfalfa, and all four US breeding programs also contain a high proportion of these SNPs. Thus, little evidence exists among this dataset for loss of significant DNA sequence diversity from either domestication or breeding of alfalfa. Structure analysis indicated that individuals from the subspecies falcata, the diploid subspecies caerulea, and the tetraploid subspecies sativa (cultivated tetraploid alfalfa) were clearly separated. Conclusion: We used transcriptome sequencing to discover large numbers of SNPs

  10. [Study of a comparative statistical analytical model for nucleotide sequences in the genome of viruses of the Papovaviridae family].

    Science.gov (United States)

    Leopardi, R

    1989-08-01

    In the present study a model for comparative analysis of nucleotide sequences has been developed, in order to evaluate statistical features of nucleotide distribution in DNA strands without any genetic relationship. Every DNA strand has been considered as a finite Markov chain; a matrix, whose elements represent the number of couplings between a nucleotide and the following one in 5'-3' direction, has been used for every DNA strand, and the statistical relationship has been detected by using Kendall's test. The genomes of Polyomavirus (strain A2) and DPV have been analysed by the proposed model; a substantial likeness between the behaviour of nucleotide distribution on all four DNA strands analysed has been shown; the strongest likeness concerned the complementary strands of Polyomavirus as well as the homologous sense strands of both viruses.
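
    The matrix described above, counting how often each nucleotide is followed by each other nucleotide in the 5'-3' direction, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's code: the function names and the short test sequence are hypothetical, and the comparison step (Kendall's test in the abstract) is not implemented here.

```python
from collections import Counter

BASES = "ACGT"


def transition_counts(seq):
    """4x4 matrix of counts; row = current base, column = next base (5'->3')."""
    seq = seq.upper()
    pairs = Counter(zip(seq, seq[1:]))
    return [[pairs[(a, b)] for b in BASES] for a in BASES]


def reverse_complement(seq):
    """Complementary strand, written 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Made-up sequence for demonstration only.
strand = "ACGTAC"
for base, row in zip(BASES, transition_counts(strand)):
    print(base, row)
print(reverse_complement(strand))
```

    The flattened matrices of two strands could then be compared with a rank statistic; which variant of Kendall's test the study applied is not specified in the abstract, so that choice is left open here.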

  11. Nucleotide sequence of the 18S-26S rRNA intergene region of the sea urchin.

    Science.gov (United States)

    Hindenach, B R; Stafford, D W

    1984-02-10

    The DNA sequence which spans the internal transcribed spacers of a cloned ribosomal transcription unit from the sea urchin, Lytechinus variegatus, has been determined. The region extends from the conserved Eco RI site near the 3' end of the 18S rDNA to a Bam HI site in the 26S rDNA and includes 232 nucleotides coding for 18S rRNA, 367 nucleotides of internal transcribed spacer, 159 nucleotides coding for 5.8S rRNA, 338 nucleotides of internal transcribed spacer, and 505 nucleotides coding for 26S rRNA. The rRNA coding regions were identified by direct analysis of 3'-labeled 18S and 5.8S rRNA and 5'-labeled 5.8S rRNA, and by sequence homology of the 26S rDNA with yeast and vertebrate 26/28S rRNAs. The internal transcribed spacers are GC-rich, similar to those of vertebrates. The 5.8S and 5' 26S rDNA sequences support a proposed model for a structural domain of the yeast large subunit ribosomal RNA (Veldman et al. [1981] Nucleic Acids Res. 9, 6935-6952).

  12. Nucleotide and Predicted Amino Acid Sequence-Based Analysis of the Avian Metapneumovirus Type C Cell Attachment Glycoprotein Gene: Phylogenetic Analysis and Molecular Epidemiology of U.S. Pneumoviruses

    Science.gov (United States)

    Alvarez, Rene; Lwamba, Humphrey M.; Kapczynski, Darrell R.; Njenga, M. Kariuki; Seal, Bruce S.

    2003-01-01

    A serologically distinct avian metapneumovirus (aMPV) was isolated in the United States after an outbreak of turkey rhinotracheitis (TRT) in February 1997. The newly recognized U.S. virus was subsequently demonstrated to be genetically distinct from European subtypes and was designated aMPV serotype C (aMPV/C). We have determined the nucleotide sequence of the gene encoding the cell attachment glycoprotein (G) of aMPV/C (Colorado strain and three Minnesota isolates) and predicted amino acid sequence by sequencing cloned cDNAs synthesized from intracellular RNA of aMPV/C-infected cells. The nucleotide sequence comprised 1,321 nucleotides with only one predicted open reading frame encoding a protein of 435 amino acids, with a predicted Mr of 48,840. The structural characteristics of the predicted G protein of aMPV/C were similar to those of the human respiratory syncytial virus (hRSV) attachment G protein, including two mucin-like regions (heparin-binding domains) flanking both sides of a CX3C chemokine motif present in a conserved hydrophobic pocket. Comparison of the deduced G-protein amino acid sequence of aMPV/C with those of aMPV serotypes A, B, and D, as well as hRSV revealed overall predicted amino acid sequence identities ranging from 4 to 16.5%, suggesting a distant relationship. However, G-protein sequence identities ranged from 72 to 97% when aMPV/C was compared to other members within the aMPV/C subtype or 21% for the recently identified human MPV (hMPV) G protein. Ratios of nonsynonymous to synonymous nucleotide changes were greater than one in the G gene when comparing the more recent Minnesota isolates to the original Colorado isolate. Epidemiologically, this indicates positive selection among U.S. isolates since the first outbreak of TRT in the United States. PMID:12682171

  13. Detection, Validation, and Application of Genotyping-by-Sequencing Based Single Nucleotide Polymorphisms in Upland Cotton

    Directory of Open Access Journals (Sweden)

    M. Sariful Islam

    2015-03-01

    The presence of two closely related subgenomes in the allotetraploid Upland cotton, combined with a narrow genetic base of the cultivated varieties, has hindered the identification of polymorphic genetic markers and their use in improving this important crop. Genotyping-by-sequencing (GBS) is a rapid way to identify single nucleotide polymorphism (SNP) markers; however, these SNPs may be specific to the sequenced cotton lines. Our objective was to obtain a large set of polymorphic SNPs with broad applicability to the cultivated cotton germplasm. We selected 11 diverse cultivars and their random-mated recombinant inbred progeny for SNP marker development via GBS. Two different GBS methodologies were used by Data2Bio (D2B) and the Institute for Genome Diversity (IGD) to identify 4441 and 1176 polymorphic SNPs with minor allele frequency of ≥0.1, respectively. We further filtered the SNPs and aligned their sequences to the diploid reference genome. We were able to use homeologous SNPs to assign 1071 SNP loci to the At subgenome and 1223 to the Dt subgenome. These filtered SNPs were located in genic regions about twice as frequently as expected by chance. We tested 111 of the SNPs in 154 diverse Upland cotton lines, which confirmed the utility of the SNP markers developed with this approach. Not only were the SNPs identified in the 11 cultivars present in the 154 cotton lines, but no two cultivars had identical SNP genotypes. We conclude that GBS can be easily used to discover SNPs in Upland cotton, which can be converted to functional genotypic assays for use in breeding and genetic studies.
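
    The minor allele frequency threshold of ≥0.1 mentioned above can be illustrated with a short sketch. The data layout (one allele call per inbred line, with "N" for a missing call) and the function names are assumptions made for illustration; the abstract does not describe the pipelines used by D2B or IGD.

```python
from collections import Counter


def minor_allele_frequency(calls, missing="N"):
    """Frequency of the second most common allele among non-missing calls."""
    counts = Counter(c for c in calls if c != missing)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # monomorphic or no data
    second_count = counts.most_common(2)[1][1]
    return second_count / total


def filter_snps(snp_calls, min_maf=0.1):
    """Keep the names of SNPs whose minor allele frequency is at least min_maf."""
    return [
        name
        for name, calls in snp_calls.items()
        if minor_allele_frequency(calls) >= min_maf
    ]
```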

  14. Structure and sequence motifs of siRNA linked with in vitro down-regulation of morbillivirus gene expression.

    Science.gov (United States)

    de Almeida, Renata Servan; Keita, Djénéba; Libeau, Geneviève; Albina, Emmanuel

    2008-07-01

    The most challenging task in RNA interference is the design of active small interfering RNA (siRNA) sequences. Numerous strategies have been published to select siRNA. They have proved effective in some applications but have failed in many others. Nonetheless, all existing guidelines have been devised to select effective siRNAs targeting human or murine genes. They may not be appropriate to select functional sequences that target genes from other organisms like viruses. In this study, we have analyzed 62 siRNA duplexes of 19 bases targeting three genes of three morbilliviruses. In those duplexes, we have checked which features are associated with siRNA functionality. Our results suggest that the intramolecular secondary structure of the targeted mRNA contributes to siRNA efficiency. We also confirm that the presence of at least the sequence motifs U13, A or U19, as well as the absence of G13, cooperate to increase siRNA knockdown rates. Additionally, we observe that G11 is linked with siRNA efficacy. We believe that an algorithm based on these findings may help in the selection of functional siRNA sequences directed against viral genes.
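
    The positional features named above (U13, A or U19, absence of G13, and G11) can be checked with a few lines of code. This is only a sketch: it assumes 1-based positions counted along a 19-nucleotide strand supplied by the caller, and the function name is hypothetical. The abstract does not state which strand the positions refer to, nor how the features are weighted.

```python
def sirna_motif_features(strand):
    """Report the positional features from the abstract for a 19-nt strand."""
    s = strand.upper().replace("T", "U")  # accept DNA-style input
    if len(s) != 19:
        raise ValueError("expected a 19-nucleotide strand")
    return {
        "U13": s[12] == "U",        # U at position 13
        "A_or_U19": s[18] in "AU",  # A or U at position 19
        "no_G13": s[12] != "G",     # absence of G at position 13
        "G11": s[10] == "G",        # G at position 11
    }
```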

  15. Complete nucleotide sequence of a begomovirus associated with satellites molecules infecting a new host Tagetes patula in India.

    Science.gov (United States)

    Marwal, Avinash; Sahu, Anurag Kumar; Choudhary, Devendra Kumar; Gaur, R K

    2013-08-01

    In the year 2012 leaf curl disease was observed on Marigold (Tagetes patula) in Lakshmangrh, Sikar province of India. Affected plants were severely stunted with apical leaf curl and crinkled leaves, symptoms typical of begomovirus infection. This is the first report of complete nucleotide sequence of a begomovirus associated with satellites molecules infecting a new host Tagetes patula in India.

  16. Finding the right coverage : The impact of coverage and sequence quality on single nucleotide polymorphism genotyping error rates

    NARCIS (Netherlands)

    Fountain, Emily D.; Pauli, Jonathan N.; Reid, Brendan N.; Palsboll, Per J.; Peery, M. Zachariah

    Restriction-enzyme-based sequencing methods enable the genotyping of thousands of single nucleotide polymorphism (SNP) loci in nonmodel organisms. However, in contrast to traditional genetic markers, genotyping error rates in SNPs derived from restriction-enzyme-based methods remain largely unknown.

  17. The nucleotide sequence of two restriction fragments located in the gene AB region of bacteriophage S13.

    NARCIS (Netherlands)

    F.G. Grosveld (Frank); J.H. Spencer

    1977-01-01

    The nucleotide sequence of a double stranded DNA fragment from the gene AB region of bacteriophage S13 DNA has been determined. The fragment was isolated as two adjacent shorter fragments by cleavage of S13 replicative form (RF) DNA with restriction endonuclease III from Hemophilus

  18. Nucleotide Sequences of 5'-Terminal Parts of Coat Protein Genes of Various Isolates of NTN Strain of Potato Virus Y

    Czech Academy of Sciences Publication Activity Database

    Čeřovská, Noemi; Moravec, Tomáš; Filigarová, Marie; Petrzik, Karel

    2001-01-01

    Vol. 45 (2001), pp. 55-59 ISSN 0001-723X R&D Projects: GA ČR GA522/01/1121 Institutional research plan: CEZ:AV0Z5038910; CEZ:AV0Z5051902 Keywords: Nucleotide Sequences * NTN Strain of Potato Virus Y Subject RIV: EB - Genetics; Molecular Biology Impact factor: 0.644, year: 2001

  19. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery, that is, searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. An implementation of

  20. Predicting Kinase Activity in Angiotensin Receptor Phosphoproteomes Based on Sequence-Motifs and Interactions

    DEFF Research Database (Denmark)

    Bøgebo, Rikke; Horn, Heiko; Olsen, Jesper V

    2014-01-01

    Recent progress in the understanding of seven-transmembrane receptor (7TMR) signalling has promoted the development of a new generation of pathway selective ligands. The angiotensin II type I receptor (AT1aR) is one of the most studied 7TMRs with respect to selective activation of the β-arrestin dependent signalling. Two complementary global phosphoproteomics studies have analyzed the complex signalling induced by the AT1aR. Here we integrate the data sets from these studies and perform a joint analysis using a novel method for prediction of differential kinase activity from phosphoproteomics data. The method builds upon NetworKIN, which applies sophisticated linear motif analysis in combination with contextual network modelling to predict kinase-substrate associations with high accuracy and sensitivity. These predictions form the basis for subsequent nonparametric statistical analysis to identify...

  1. Nucleotide sequence and developmental expression of Acanthamoeba S-adenosylmethionine synthetase gene.

    Science.gov (United States)

    Ahn, K S; Henney, H R

    1997-03-20

    We have isolated and characterized a cDNA (cDNA1) from an Acanthamoeba cDNA library encoding the enzyme S-adenosylmethionine (SAM) synthetase (ATP: L-methionine S-adenosyltransferase; EC 2.5.1.6). The nucleotide sequence exhibits about 61-73% overall similarity to the corresponding gene of other organisms. The cDNA displays extreme codon bias with a preference for C or G in the third position. A putative initiation site and an ATP-binding site are identified. A length of 388 amino acids and a molecular mass of about 44,000 Daltons are deduced for the enzyme. Putative phosphorylation sites which might be involved in regulation of the enzyme are revealed. The cDNA was expressed in Escherichia coli BL21(DE3), and the identity of the protein product was confirmed by Western blotting analysis. Northern analyses of the expression of the Acanthamoeba SAM synthetase gene during development revealed a pronounced reduction in the level of transcripts as amoebae converted to cysts.
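
    The preference for C or G in the third codon position can be quantified as a GC3 fraction. The sketch below is illustrative only; the function name is hypothetical, and it assumes an in-frame coding sequence starting at the first base, which is not something the abstract specifies.

```python
def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C (in-frame input assumed)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

    A value close to 1 would correspond to the strong third-position bias reported for this cDNA.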

  2. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer

    Science.gov (United States)

    Morrison, Carl D.; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M.; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R.; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H.; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C.; Johnson, Candace S.; Trump, Donald L.

    2014-01-01

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as “stitchers,” to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication–licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer. PMID:24469795

  3. Uncommon nucleotide excision repair phenotypes revealed by targeted high-throughput sequencing.

    Science.gov (United States)

    Calmels, Nadège; Greff, Géraldine; Obringer, Cathy; Kempf, Nadine; Gasnier, Claire; Tarabeux, Julien; Miguet, Marguerite; Baujat, Geneviève; Bessis, Didier; Bretones, Patricia; Cavau, Anne; Digeon, Béatrice; Doco-Fenzy, Martine; Doray, Bérénice; Feillet, François; Gardeazabal, Jesus; Gener, Blanca; Julia, Sophie; Llano-Rivas, Isabel; Mazur, Artur; Michot, Caroline; Renaldo-Robin, Florence; Rossi, Massimiliano; Sabouraud, Pascal; Keren, Boris; Depienne, Christel; Muller, Jean; Mandel, Jean-Louis; Laugel, Vincent

    2016-03-22

    Deficient nucleotide excision repair (NER) activity causes a variety of autosomal recessive diseases including xeroderma pigmentosum (XP), a disorder which predisposes to skin cancer, and the severe multisystem condition known as Cockayne syndrome (CS). In view of the clinical overlap between NER-related disorders, as well as the existence of multiple phenotypes and the numerous genes involved, we developed a new diagnostic approach based on the enrichment of 16 NER-related genes by multiplex amplification coupled with next-generation sequencing (NGS). Our test cohort consisted of 11 DNA samples, all with known mutations and/or non-pathogenic SNPs in two of the tested genes. We then used the same technique to analyse samples from a prospective cohort of 40 patients. Multiplex amplification and sequencing were performed using the AmpliSeq protocol on the Ion Torrent PGM (Life Technologies). We identified causative mutations in 17 out of the 40 patients (43%). Four patients showed biallelic mutations in the ERCC6(CSB) gene and five in the ERCC8(CSA) gene: most of them had classical CS features but some had very mild and incomplete phenotypes. A small cohort of 4 unrelated classic XP patients from the Basque country (Northern Spain) revealed a common splicing mutation in POLH (XP-variant), demonstrating a new founder effect in this population. Interestingly, our results also found ERCC2(XPD), ERCC3(XPB) or ERCC5(XPG) mutations in two cases of UV-sensitive syndrome and in two cases with mixed XP/CS phenotypes. Our study confirms that NGS is an efficient technique for the analysis of NER-related disorders on a molecular level. It is particularly useful for phenotypes with combined features or unusually mild symptoms. Targeted NGS used in conjunction with DNA repair functional tests and precise clinical evaluation permits rapid and cost-effective diagnosis in patients with NER-defects.

  4. Saddlebags: A software interface for submitting full-length HLA allele sequences to the EMBL-ENA nucleotide database.

    Science.gov (United States)

    Matern, B M; Groeneweg, M; Voorter, C E M; Tilanus, M G J

    2018-01-01

    Submission of full-length HLA allele sequences presents a unique challenge, both for high-throughput sequencing laboratories and smaller diagnostic laboratories. HLA's extensive polymorphism means that accurate representation and annotation of allele sequence is of critical importance, and curators of nucleotide databases must establish submission formats to ensure high-quality data and prevent ambiguities. The IPD-IMGT/HLA database is established as the standard repository for HLA sequences, and it is a major goal of the 17th International HLA and Immunogenetics Workshop to fill the IPD-IMGT/HLA database with full-length HLA sequences. The process of preparing sequence annotation and metadata is cumbersome and error prone, and it is desirable to create a straightforward and concise method of preparing sequence submissions. We introduce Saddlebags, a software tool for rapid generation of HLA (novel) full-length allele sequence submissions. HLA allele sequences are submitted first to EMBL European Nucleotide Archive (EMBL-ENA), and metadata is gathered for subsequent preparation of an IPD-IMGT/HLA formatted submission. Combining these steps into a pipeline reduces effort and minimizes errors for submitting laboratories. This software has been used by Maastricht University Medical Center Transplantation Immunology Laboratory to submit 79 novel alleles to EMBL-ENA, and the tool is freely available for the HLA community. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  5. Quality control metrics improve repeatability and reproducibility of single-nucleotide variants derived from whole-genome sequencing.

    Science.gov (United States)

    Zhang, W; Soika, V; Meehan, J; Su, Z; Ge, W; Ng, H W; Perkins, R; Simonyan, V; Tong, W; Hong, H

    2015-08-01

    Although many quality control (QC) methods have been developed to improve the quality of single-nucleotide variants (SNVs) in SNV-calling, QC methods for use subsequent to single-nucleotide polymorphism-calling have not been reported. We developed five QC metrics to improve the quality of SNVs using the whole-genome-sequencing data of a monozygotic twin pair from the Korean Personal Genome Project. The QC metrics improved both repeatability between the monozygotic twin pair and reproducibility between SNV-calling pipelines. We demonstrated the QC metrics improve reproducibility of SNVs derived from not only whole-genome-sequencing data but also whole-exome-sequencing data. The QC metrics are calculated based on the reference genome used in the alignment without accessing the raw and intermediate data or knowing the SNV-calling details. Therefore, the QC metrics can be easily adopted in downstream association analysis.

  6. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2007-02-01

    Background: This work addresses the problem of detecting conserved transcription factor binding sites and, in general, regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results: We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Unlike the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor does it resort to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion: Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performance than other available methods for the same task. We also show examples of how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.

  7. Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)

    OpenAIRE

    Singh, Ranjan K.; Tanner, John J.

    2012-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related ...

  8. Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna

    Directory of Open Access Journals (Sweden)

    Souche Erika L

    2011-06-01

    Background: Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, climate change, and antagonistic interactions with predators and parasites, which have all been demonstrated to induce micro-evolutionary and adaptive responses. Although its ecology and evolutionary biology are intensively studied, little is known about the functional genomic underpinnings of phenotypic responses to environmental stressors. The aim of the present study was to find genes expressed in the presence of environmental stressors, and target such genes for single nucleotide polymorphic (SNP) marker development. Results: We developed three expressed sequence tag (EST) libraries using clonal lineages of D. magna exposed to ecological stressors, namely fish predation, parasite infection and pesticide exposure. We used these newly developed ESTs and other Daphnia ESTs retrieved from NCBI GenBank to mine for SNP markers targeting synonymous as well as non-synonymous genetic variation. We validated the developed SNPs in six natural populations of D. magna distributed at a regional scale. Conclusions: A large proportion (47%) of the produced ESTs are Daphnia lineage specific genes, which are potentially involved in responses to environmental stress rather than to general cellular functions and metabolic activities, or reflect the arthropod's aquatic lifestyle. The characterization of genes expressed under stress and the validation of their SNPs for population genetic study is important for identifying ecologically responsive genes in D. magna.

  9. An Exploration of the Triplet Periodicity in Nucleotide Sequences with a Mature Self-Adaptive Spectral Rotation Approach

    Directory of Open Access Journals (Sweden)

    Bo Chen

    2014-01-01

    A self-adaptive spectral rotation (SASR) method was previously developed for predicting coding regions in nucleotide sequences, based on a universal statistical feature of coding regions named triplet periodicity (TP). It outputs a random walk, the TP walk, in the complex plane for the query sequence. Each step in the walk corresponds to a position in the sequence and is generated from a long-term statistic of the TP in the sequence. The coding regions (TP intensive) are then visually discriminated from the noncoding ones (without TP) in the TP walk. In this paper, the behaviors of the walks for random nucleotide sequences are further investigated qualitatively. A slightly leftward trend (a negative noise) in such walks is observed, which was not reported in the previous SASR literature. An improved SASR, named the mature SASR, is proposed in order to eliminate the noise and correct the TP walks. Furthermore, a potential sequence pattern opposite to the TP persistent pattern, that is, the TP antipersistent pattern, is explored. The applications of the algorithms on simulated datasets show their capabilities in detecting such a potential sequence pattern.
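
    Triplet periodicity itself can be illustrated with a classic Fourier measure: the spectral power of the four nucleotide indicator sequences at frequency 1/3. The sketch below is not the SASR or mature SASR algorithm, whose details the abstract does not give; it is a swapped-in, simpler technique shown only to make the underlying signal concrete, and the function name is hypothetical.

```python
import cmath


def triplet_periodicity_power(seq):
    """Length-normalised spectral power at frequency 1/3, summed over A, C, G, T."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier component of the 0/1 indicator sequence of this base.
        component = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, ch in enumerate(seq)
            if ch == base
        )
        total += abs(component) ** 2
    return total / n
```

    A sequence with a strict period of three, such as a repeated codon, gives a high value, whereas a homopolymer whose length is a multiple of three gives a value near zero.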

  10. Design of sequence-specific DNA binding ligands that use a two-stranded peptide motif for DNA sequence recognition.

    Science.gov (United States)

    Nikolaev, V A; Grokhovsky, S L; Surovaya, A N; Leinsoo, T A; Sidorova NYu; Zasedatelev, A S; Zhuze, A L; Strahan, G A; Shafer, R H; Gursky, G V

    1996-08-01

    The design and DNA binding activity of beta-structure-forming peptides and netropsin-peptide conjugates are reported. It is found that a pair of peptides-S,S'-bis(Lys-Gly-Val-Cys-Val-NH-NH-Dns)-bridged by an S-S bond binds at least 10 times more strongly to poly(dG).poly(dC) than to poly(dA).poly(dT). This peptide can also discriminate between 5'-GpG-3' and 5'-GpC-3' steps in the DNA minor groove. Based on these observations, new synthetic ligands, bis-netropsins, were constructed in which two netropsin-like fragments were attached by means of short linkers to a pair of peptides-Gly-Cys-Gly- or Val-Cys-Val-bridged by S-S bonds. These compounds possess a composite binding specificity: the peptide chains recognize 5'-GpG-3' steps on DNA, whereas the netropsin-like fragments bind preferentially to runs of 4 AT base pairs. Our data indicate that combining the AT-base-pair specific properties of the netropsin-type structure with the 5'-GpG-3'-specific properties of certain oligopeptides offers a new approach to the synthesis of ligands capable of recognizing mixed sequences of AT- and GC-base pairs in the DNA minor groove. These compounds are potential models for DNA-binding domains in proteins which specifically recognize base pair sequences in the minor groove of DNA.

  11. Complete nucleotide sequence of a highly divergent cherry-associated luteovirus (ChALV) isolate from peach in South Korea.

    Science.gov (United States)

    Igori, Davaajargal; Lim, Seungmo; Baek, Dasom; Cho, In Sook; Moon, Jae Sun

    2017-09-01

    We determined the complete genome sequence of a highly divergent South Korean (SK) isolate of a cherry-associated luteovirus (ChALV) from peach. The ChALV-SK genome consists of 5,815 nucleotides, and contains five open reading frames (ORFs). A comparative analysis of the full genome showed only 73.1% nucleotide sequence identity with a recently described ChALV from the Czech Republic (CZ). Amino acid similarities of the individual ORFs between ChALV-SK and other luteoviruses range from 17.3 to 92%, which places the new isolate close to the species demarcation value for luteoviruses. Results show our ChALV-SK isolate to be highly diverged from the ChALV-CZ isolate.
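
    A percent identity figure such as the 73.1% quoted above is typically computed from a pairwise alignment. The following sketch assumes two already-aligned sequences of equal length with "-" marking gaps, and counts a gap against a base as a mismatch; both choices are assumptions, since the abstract does not state how identity was calculated, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared alignment columns with identical symbols."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```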

  12. Comparison of the nucleotide sequence of wild-type hepatitis - A virus and its attenuated candidate vaccine derivative

    International Nuclear Information System (INIS)

    Cohen, J.I.; Rosenblum, B.; Ticehurst, J.R.; Daemer, R.; Feinstone, S.; Purcell, R.H.

    1987-01-01

    Development of attenuated mutants for use as vaccines is in progress for other viruses, including influenza, rotavirus, varicella-zoster, cytomegalovirus, and hepatitis-A virus (HAV). Attenuated viruses may be derived from naturally occurring mutants that infect human or nonhuman hosts. Alternatively, attenuated mutants may be generated by passage of wild-type virus in cell culture. Production of attenuated viruses in cell culture is a laborious and empiric process. Despite previous empiric successes, understanding the molecular basis for attenuation of vaccine viruses could facilitate future development and use of live-virus vaccines. Comparison of the complete nucleotide sequences of wild-type (virulent) and vaccine (attenuated) viruses has been reported for polioviruses and yellow fever virus. Here, the authors compare the nucleotide sequence of wild-type HAV HM-175 with that of a candidate vaccine derivative

  13. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  14. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.
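
    The abstract above attributes the better read alignment to the larger N50 of the horse assembly. As a minimal sketch of how that statistic is usually computed (the function name and the example lengths are illustrative assumptions, not values from the study), N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths, for illustration only.
example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example))
```

    A larger N50 means that more of the assembly sits in long contiguous pieces, which tends to give reads more unbroken sequence to align against.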

  15. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Science.gov (United States)

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  16. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Directory of Open Access Journals (Sweden)

    Zing Tsung-Yeh Tsai

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  17. Nucleotide sequence of the cDNA encoding the precursor of the beta subunit of rat lutropin.

    OpenAIRE

    Chin, W W; Godine, J E; Klein, D R; Chang, A S; Tan, L K; Habener, J F

    1983-01-01

    We have determined the nucleotide sequences of cDNAs encoding the precursor of the beta subunit of rat lutropin, a polypeptide hormone that regulates gonadal function, including the development of gametes and the production of steroid sex hormones. The cDNAs were prepared from poly(A)+ RNA derived from the pituitary glands of rats 4 weeks after ovariectomy and were cloned in bacterial plasmids. Bacterial colonies containing transfected plasmids were screened by hybridization with a 32P-labele...

  18. Spatial clustering of binding motifs and charges reveals conserved functional features in disordered nucleoporin sequences

    Science.gov (United States)

    Ando, David; Colvin, Michael; Rexach, Michael; Gopinathan, Ajay

    2013-03-01

    The Nuclear Pore Complex (NPC) gates the only channel through which cells exchange material between the nucleus and cytoplasm. Traffic is regulated by transport receptors bound to cargo, which interact with numerous disordered phenylalanine glycine (FG) repeat containing proteins (FG nups) that line this channel. The precise physical mechanism of transport regulation has remained elusive, primarily due to the difficulty in understanding the structure and dynamics of such a large assembly of interacting disordered proteins. Here we have performed a comprehensive bioinformatic analysis, specifically tailored towards disordered proteins, on thousands of nuclear pore proteins from a variety of species, revealing a set of highly conserved features in the sequence structure among FG nups. Contrary to the general perception that these proteins are functionally equivalent to homogeneous polymers, we show that biophysically important features within individual nups, like the separation, spatial localization and ordering along the chain of FG and charge domains, are highly conserved. Our current understanding of NPC structure and function should therefore be revised to account for these common features that are functionally relevant for the underlying physical mechanism of NPC gating.

  19. From next-generation sequencing alignments to accurate comparison and validation of single-nucleotide variants: the pibase software

    Science.gov (United States)

    Forster, Michael; Forster, Peter; Elsharawy, Abdou; Hemmrich, Georg; Kreck, Benjamin; Wittig, Michael; Thomsen, Ingo; Stade, Björn; Barann, Matthias; Ellinghaus, David; Petersen, Britt-Sabina; May, Sandra; Melum, Espen; Schilhabel, Markus B.; Keller, Andreas; Schreiber, Stefan; Rosenstiel, Philip; Franke, Andre

    2013-01-01

    Scientists working with single-nucleotide variants (SNVs), inferred by next-generation sequencing software, often need further information regarding true variants, artifacts and sequence coverage gaps. In clinical diagnostics, for example, SNVs must usually be validated by visual inspection or by several independent SNV callers. Here we demonstrate that 0.5–60% of relevant SNVs might not be detected due to coverage gaps, or might be misidentified. Even low error rates can overwhelm the true biological signal, especially in clinical diagnostics, in research comparing healthy with affected cells, in archaeogenetic dating or in forensics. For these reasons, we have developed a package called pibase, which is applicable to diploid and haploid genome, exome or targeted enrichment data. pibase extracts details on nucleotides from alignment files at user-specified coordinates and identifies reproducible genotypes, if present. In test cases pibase identifies genotypes at 99.98% specificity, 10-fold better than other tools. pibase also provides pair-wise comparisons between healthy and affected cells using nucleotide signals (10-fold more accurately than a genotype-based approach, as we show in our case study of monozygotic twins). This comparison tool also solves the problem of detecting allelic imbalance within heterozygous SNVs in copy number variation loci, or in heterogeneous tumor sequences. PMID:22965131

  20. Molecular cloning, nucleotide sequence, and characterization of a 40,000-molecular-weight lipoprotein of Haemophilus somnus.

    OpenAIRE

    Theisen, M; Rioux, C R; Potter, A A

    1992-01-01

    A gene of Haemophilus somnus encoding the major 40,000-molecular-weight antigen (LppA) was cloned on a 2-kb Sau3AI fragment. The nucleotide sequence of the entire DNA insert was determined. One open reading frame, encoding a 247-residue polypeptide with a calculated molecular weight of 27,072, was identified. This reading frame was confirmed by sequencing the fusion joint of two independent IppA::TnphoA gene fusions. The 21 amino-terminal amino acids of the deduced polypeptide showed strong s...

  1. UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs

    Science.gov (United States)

    Mignone, Flavio; Grillo, Giorgio; Licciulli, Flavio; Iacono, Michele; Liuni, Sabino; Kersey, Paul J.; Duarte, Jorge; Saccone, Cecilia; Pesole, Graziano

    2005-01-01

    The 5′ and 3′ untranslated regions of eukaryotic mRNAs play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5′ and 3′ untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated (and also collated as the UTRsite database) and cross-links to genomic and protein data are provided. The integration of UTRdb with genomic and protein data has allowed the implementation of a powerful retrieval resource for the selection and extraction of UTR subsets based on their genomic coordinates and/or features of the protein encoded by the relevant mRNA (e.g. GO term, PFAM domain, etc.). All internet resources implemented for retrieval and functional analysis of 5′ and 3′ untranslated regions of eukaryotic mRNAs are accessible at http://www.ba.itb.cnr.it/UTR/. PMID:15608165

  2. Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.

    Science.gov (United States)

    Mohamed Hashim, Ezzeddin Kamil; Abdullah, Rosni

    2015-12-21

    Empirical analysis of k-mer DNA has proven to be an effective tool for finding unique patterns in DNA sequences, which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA in hundreds of organisms, the researchers found that unique multi-modal k-mer spectra occur only in the genomes of organisms from the tetrapod clade, which includes all mammals. The multi-modality is caused by the formation of the two lowest modes, and the k-mers under them are referred to as rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotides they contain. In addition, the rare k-mers are selectively distributed in certain genomic features: CpG islands (CGI), promoters, 5' UTRs, and exons. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performances of our RWC (rare-word clustering) method in predicting the CGI and promoter features are ranked among the top three, in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
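
    The k-mer spectrum mentioned above can be illustrated with a minimal sketch. It counts every k-mer in a sequence and then tallies how many distinct k-mers occur once, twice, and so on. The function names and the fixed count cut-off used to call a k-mer "rare" are assumptions made for illustration; the study defines rare k-mers from the modes of the spectrum, which this sketch does not reproduce.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with ambiguous bases."""
    seq = sequence.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= VALID_BASES)


def kmer_spectrum(counts):
    """Map each occurrence count to the number of distinct k-mers with it."""
    return Counter(counts.values())


def rare_kmers(counts, max_count):
    """Return k-mers seen at most max_count times (illustrative cut-off)."""
    return sorted(kmer for kmer, n in counts.items() if n <= max_count)


counts = kmer_counts("ACGTACGT", 2)
print(counts)                 # per-k-mer occurrence counts
print(kmer_spectrum(counts))  # how many k-mers share each count
print(rare_kmers(counts, 1))  # k-mers at or below the cut-off
```

    On a real genome the spectrum would be plotted as a histogram, and the modes the abstract refers to would appear as separate peaks.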

  3. Single nucleotide polymorphism barcoding of cytochrome c oxidase I sequences for discriminating 17 species of Columbidae by decision tree algorithm.

    Science.gov (United States)

    Yang, Cheng-Hong; Wu, Kuo-Chuan; Dahms, Hans-Uwe; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-07-01

    DNA barcodes are widely used in taxonomy, systematics, species identification, food safety, and forensic science. Most of the conventional DNA barcode sequences contain the whole information of a given barcoding gene. Most of the sequence information does not vary and is uninformative for a given group of taxa within a monophylum. We suggest here a method that reduces the amount of noninformative nucleotides in a given barcoding sequence of a major taxon, like the prokaryotes, or eukaryotic animals, plants, or fungi. The actual differences in genetic sequences, called single nucleotide polymorphism (SNP) genotyping, provide a tool for developing a rapid, reliable, and high-throughput assay for the discrimination between known species. Here, we investigated SNPs as robust markers of genetic variation for identifying different pigeon species based on available cytochrome c oxidase I (COI) data. We propose here a decision tree-based SNP barcoding (DTSB) algorithm where SNP patterns are selected from the DNA barcoding sequence of several evolutionarily related species in order to identify a single species with pigeons as an example. This approach can make use of any established barcoding system. We here firstly used as an example the mitochondrial gene COI information of 17 pigeon species (Columbidae, Aves) using DTSB after sequence trimming and alignment. SNPs were chosen which followed the rule of decision tree and species-specific SNP barcodes. The shortest barcode of about 11 bp was then generated for discriminating 17 pigeon species using the DTSB method. This method provides a sequence alignment and tree decision approach to parsimoniously assign a unique and shortest SNP barcode for any known species of a chosen monophyletic taxon where a barcoding sequence is available.

  4. Ab initio electron propagator calculations of transverse conduction through DNA nucleotide bases in 1-nm nanopore corroborate third generation sequencing.

    Science.gov (United States)

    Kletsov, Aleksey A; Glukhovskoy, Evgeny G; Chumakov, Aleksey S; Ortiz, Joseph V

    2016-01-01

    The conduction properties of the DNA molecule, particularly its transverse conductance (electron transfer through nucleotide bridges), are a point of interest for the DNA chemistry community, especially for DNA sequencing. However, there is no fully developed first-principles theory for molecular conductance and current that allows one to analyze the transverse flow of electrical charge through a nucleotide base. We theoretically investigate the transverse electron transport through all four DNA nucleotide bases by implementing an unbiased ab initio theoretical approach, namely, the electron propagator theory. The electrical conductance and current through DNA nucleobases (guanine [G], cytosine [C], adenine [A] and thymine [T]) inserted into a model 1-nm Ag-Ag nanogap are calculated. The magnitudes of the calculated conductance and current are ordered in the following hierarchies: gA>gG>gC>gT and IG>IA>IT>IC, respectively. A new distinguishing parameter for nucleobase identification is proposed, namely, the onset bias magnitude. Nucleobases exhibit the following hierarchy with respect to this parameter: Vonset(A) [...] DNA translocation through an electrode-equipped nanopore. The results are of interest to theorists and practitioners in the field of third generation sequencing techniques as well as in the field of DNA chemistry. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Purification, enzymatic characterization, and nucleotide sequence of a high-isoelectric-point alpha-glucosidase from barley malt

    DEFF Research Database (Denmark)

    Frandsen, T P; Lok, F; Mirgorodskaya, E

    2000-01-01

    in the transition state complex. Mass spectrometry of tryptic fragments assigned the 92-kD protein to a barley cDNA (GenBank accession no. U22450) that appears to encode an alpha-glucosidase. A corresponding sequence (HvAgl97; GenBank accession no. AF118226) was isolated from a genomic phage library using a c......DNA fragment from a barley cDNA library. HvAgl97 encodes a putative 96.6-kD protein of 879 amino acids with 93.8% identity to the protein deduced from U22450. The sequence contains two active site motifs of glycoside hydrolase family 31. Three introns of 86 to 4,286 bp interrupt the coding region. The four...

  6. Analysis of nucleotide sequence variations in herpes simplex virus types 1 and 2, and varicella-zoster virus

    International Nuclear Information System (INIS)

    Chiba, A.; Suzutani, T.; Koyano, S.; Azuma, M.; Saijo, M.

    1998-01-01

    To analyze the difference in the degree of divergence between genes from identical herpes virus species, we examined the nucleotide sequence of genes from the herpes simplex virus type 1 (HSV-1) strains VR-3 and 17 encoding thymidine kinase (TK), deoxyribonuclease (DNase), protein kinase (PK; UL13) and virion-associated host shut off (vhs) protein (UL41). The frequency of nucleotide substitutions per 1 kb in the TK gene was 2.5 to 4.3 times higher than those in the other three genes. To prove that the polymorphism of the HSV-1 TK gene is a common characteristic of herpes virus TK genes, we compared the diversity of TK genes among eight HSV-1, six herpes simplex virus type 2 (HSV-2) and seven varicella-zoster virus (VZV) strains. The average frequency of nucleotide substitutions per 1 kb in the TK gene of HSV-1 strains was 4-fold higher than that in the TK gene of HSV-2 strains. The VZV TK gene was highly conserved and only two nucleotide changes were evident in VZV strains. However, the rate of non-synonymous substitutions in total nucleotide substitutions was similar among the TK genes of the three viruses. This result indicated that the mutational rates differed, but there were no significant differences in selective pressure. We conclude that the HSV-1 TK gene is highly diverged and that analysis of variations in the gene is a useful approach for understanding the molecular evolution of HSV-1 in a short period. (authors)

  7. Fusion protein gene nucleotide sequence similarities, shared antigenic sites and phylogenetic analysis suggest that phocid distemper virus 2 and canine distemper virus belong to the same virus entity.

    NARCIS (Netherlands)

    I.K.G. Visser (Ilona); R.W.J. van der Heijden (Roger); M.W.G. van de Bildt (Marco); M.J.H. Kenter (Marcel); C. Örvell; A.D.M.E. Osterhaus (Albert)

    1993-01-01

    Nucleotide sequencing of the fusion protein (F) gene of phocid distemper virus-2 (PDV-2), recently isolated from Baikal seals (Phoca sibirica), revealed an open reading frame (nucleotides 84 to 2075) with two potential in-frame ATG translation initiation codons. We suggest that the

  8. The nucleotide sequence of the right-hand terminus of adenovirus type 5 DNA: Implications for the mechanism of DNA replication

    NARCIS (Netherlands)

    Steenbergh, P.H.; Sussenbach, J.S.

    The nucleotide sequence of the right-hand terminal 3% of adenovirus type 5 (Ad5) DNA has been determined, using the chemical degradation technique developed by Maxam and Gilbert (1977). This region of the genome comprises the 1003 basepair long HindIII-I fragment and the first 75 nucleotides of the

  9. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions.
Sequences matching PERCAL were detected in all kingdoms of
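
    The consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D quoted above is written in PROSITE-style pattern notation, which maps directly onto a regular expression. The sketch below handles only the small subset of that notation used by this consensus (single residues, `x` wildcards and bracketed alternatives). The helper names and the test peptide are hypothetical and are not taken from the paper.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a regex string.

    Supports only single residues, 'x' wildcards and [..] alternatives.
    """
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(".")          # any residue
        elif element.startswith("[") and element.endswith("]"):
            parts.append(element)      # bracket syntax is shared with regex
        elif len(element) == 1 and element.isalpha():
            parts.append(element)      # one fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return "".join(parts)


PERCAL = re.compile(prosite_to_regex("G-x-D-G-x-x-[GN]-[TN]-x-D-D"))


def find_motif(protein):
    """Return (0-based start, matched text) for non-overlapping matches."""
    return [(m.start(), m.group()) for m in PERCAL.finditer(protein.upper())]


# Made-up peptide containing one motif instance, for illustration only.
print(find_motif("MKGADGSTGTLDDAA"))
```

    `finditer` reports non-overlapping matches only, which is adequate for an 11-residue motif but would undercount tandem overlapping instances.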

  10. A revised ITS nucleotide sequence gives a specifity for Smallanthus sonchifolius (Poepp. and Endl.) and its products identification

    Directory of Open Access Journals (Sweden)

    Žiarovská Jana

    2013-01-01

    Yacon (Smallanthus sonchifolius) is an Andean crop that is highly regarded for its benefits for people suffering from diabetes or various digestive or renal disorders. Because no DNA markers specific for the identification of Smallanthus sonchifolius are known yet, the paper demonstrates that ITS regions can detect and differentiate among yacon species, and their potential for food authentication purposes is also reported. Analysis of the newly sequenced ITS of yacon accessions originating from Peru, Ecuador and Bolivia provided a unique sequence site that differs from all of the other yacon species and is recognized by the DraIII restriction endonuclease. Restriction cleavage of the PCR-amplified ITSs of twenty-eight yacon accessions was performed, and in all cases the recognition site was confirmed as typical for Smallanthus sonchifolius. Based on the nucleotide specificity of the Smallanthus sonchifolius ITS sequence, a PCR method combined with a restriction cleavage protocol was developed for yacon identification.
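
    The PCR plus restriction assay described above can be mimicked in silico. The sketch below scans an amplicon for the DraIII recognition sequence CACNNNGTG (the enzyme cuts as CACNNN^GTG) and reports the fragment lengths expected after digestion. It models only the top-strand cut position and ignores the 3' overhang. The function names and the example amplicon are hypothetical, and the abstract does not give the actual yacon ITS site.

```python
import re

# DraIII recognition sequence CACNNNGTG. The lookahead lets overlapping
# sites be reported. The site is its own reverse complement, so scanning
# one strand is sufficient.
DRAIII_SITE = re.compile(r"(?=(CAC[ACGT]{3}GTG))")
CUT_OFFSET = 6  # top strand is cut after the sixth base: CACNNN^GTG


def find_draiii_sites(sequence):
    """Return 0-based start positions of DraIII sites in the sequence."""
    return [m.start() for m in DRAIII_SITE.finditer(sequence.upper())]


def digest_fragment_lengths(sequence):
    """Return top-strand fragment lengths after a complete digest."""
    cuts = [start + CUT_OFFSET for start in find_draiii_sites(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Made-up 17-bp amplicon with a single site, for illustration only.
amplicon = "TTTTCACAAAGTGGGGG"
print(find_draiii_sites(amplicon))
print(digest_fragment_lengths(amplicon))
```

    An amplicon without the site returns a single fragment equal to its full length, which corresponds to the uncut band expected for non-target species.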

  11. Nucleotide sequences from the genomes of diverse cowpea accessions for discovery of genetic variation as part of the Feed the Future Innovation Lab for Climate Resilient Cowpea

    Data.gov (United States)

    US Agency for International Development — Nucleotide sequences were generated from 37 cowpea (Vigna unguiculata L. Walp.) accessions relevant to Africa, China and the USA to discover at type of genetic...

  12. CircularLogo: A lightweight web application to visualize intra-motif dependencies.

    Science.gov (United States)

    Ye, Zhenqing; Ma, Tao; Kalmbach, Michael T; Dasari, Surendra; Kocher, Jean-Pierre A; Wang, Liguo

    2017-05-22

    The sequence logo has been widely used to represent DNA or RNA motifs for more than three decades. Despite its intelligibility and intuitiveness, the traditional sequence logo is unable to display the intra-motif dependencies and therefore is insufficient to fully characterize nucleotide motifs. Many methods have been developed to quantify the intra-motif dependencies, but fewer tools are available for visualization. We developed CircularLogo, a web-based interactive application, which is able to not only visualize the position-specific nucleotide consensus and diversity but also display the intra-motif dependencies. Applying CircularLogo to HNF6 binding sites and tRNA sequences demonstrated its ability to show intra-motif dependencies and intuitively reveal biomolecular structure. CircularLogo is implemented in JavaScript and Python based on the Django web framework. The program's source code and user's manual are freely available at http://circularlogo.sourceforge.net . CircularLogo web server can be accessed from http://bioinformaticstools.mayo.edu/circularlogo/index.html . CircularLogo is an innovative web application that is specifically designed to visualize and interactively explore intra-motif dependencies.

  13. Nucleotide sequences of ribosomal internal transcribed spacers and their utility in distinguishing closely related Perinereis polychaets (Annelida; Polychaeta; Nereididae).

    Science.gov (United States)

    Chen, Chaolun Allen; Chen, Chang-Po; Fan, Tung-Yung; Yu, Jr-Kai; Hsieh, Hwey-Lian

    2002-01-01

    Nucleotide sequences of a segment of the rRNA transcription unit spanning from the 3' end of the 18S rDNA to the 5' end of 28S rDNA were determined for four species of Perinereis polychaetes: P. aibuhitensis, P. floridana, and two undescribed species, Perinereis sp1 and sp2. The 5.8S rDNA sequences are identical among the four species. Intraspecific variability was low with the Kimura 2-parameter (K2P) distance, ranging from 0 to 0.0138 for ITS1 and 0 to 0.0247 for ITS2. The interspecific nucleotide difference was significantly higher than those within species, with a mean K2P of 0.172 for ITS1 and 0.204 for ITS2, suggesting that comparisons of ITS regions can be used to evaluate the phylogenetic relationships among Perinereis species. Both neighbor-joining and parsimony analyses of ITS variability indicate a close relationship between the two undescribed species of Perinereis. These findings highlight the utility of the ITS sequence in conjunction with other morphological and ecological characters to delineate species boundaries among closely related polychaetes.
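
    The Kimura 2-parameter (K2P) distance used above corrects the observed difference between two aligned sequences for multiple substitutions, treating transitions and transversions separately: d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. A minimal sketch follows; the function name is an assumption, and skipping sites with gaps or ambiguous bases is one common convention, not necessarily the one used in the study.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites, for illustration only.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

    For very divergent sequences the argument of the logarithm can become non-positive, in which case `math.log` raises `ValueError` and the distance is undefined under this model.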

  14. [Molecular phylogeny of Turbellaria, based on data from comparing the nucleotide sequences of 18S ribosomal RNA genes].

    Science.gov (United States)

    Kuznedelov, K D; Timoshkin, O A

    1995-01-01

    Polymerase chain reaction and direct sequencing of the 5'-end region of the 18S ribosomal RNA gene were used to infer phylogenetic relationships among turbellarian flatworms from Lake Baikal. Representatives of 5 orders (Tricladida, 10 spp.; Lecithoepitheliata, 5 spp.; Prolecithophora, 3 spp.; Proseriata and Kalyptorhynchia, one species each) were studied; a nucleotide sequence of more than 340 nucleotides was determined for each species. A consensus sequence was determined for each order having more than one representative species. Distance matrix and maximum parsimony approaches were applied to infer phylogenies. A bootstrap procedure was used to estimate confidence limits. At the 100% bootstrap level, the group of three orders (Kalyptorhynchia, Proseriata and Lecithoepitheliata) was found to be monophyletic. However, subsets inside the group had no significant support to be preferred or rejected. Our data do not support the traditional systematics, which joins the two suborders Tricladida and Proseriata into the single order Seriata, nor do they support comparative anatomical data that show a close relationship between Lecithoepitheliata and lower Prolecithophora.

  15. Nucleotide sequence and infectious cDNA clone of the L1 isolate of Pea seed-borne mosaic potyvirus.

    Science.gov (United States)

    Olsen, B S; Johansen, I E

    2001-01-01

    The complete nucleotide sequence of Pea seed-borne mosaic potyvirus isolate L1 has been determined from cloned virus cDNA. The PSbMV L1 genome is 9895 nucleotides in length excluding the poly(A) tail. Computer analysis of the sequence revealed a single long open reading frame (ORF) of 9594 nucleotides. The ORF potentially encodes a polyprotein of 3198 amino acids with a deduced Mr of 363537. Nine putative proteolytic cleavage sites were identified by analogy to consensus sequences and genome arrangement in other potyviruses. Two full-length cDNA clones, p35S-L1-4 and p35S-L1-5, were assembled under control of an enhanced 35S promoter and nopaline synthase terminator. Clone p35S-L1-4 was constructed with four introns and p35S-L1-5 with five introns inserted in the cDNA. Clone p35S-L1-4 was unstable in Escherichia coli often resulting in amplification of plasmids with deletions. Clone p35S-L1-5 was stable and apparently less toxic to Escherichia coli resulting in larger bacterial colonies and higher plasmid yield. Both clones were infectious upon mechanical inoculation of plasmid DNA on susceptible pea cultivars Fjord, Scout, and Brutus. Eight pea genotypes resistant to L1 virus were also resistant to the cDNA derived L1 virus. Both native PSbMV L1 and the cDNA derived virus infected Chenopodium quinoa systemically giving rise to characteristic necrotic lesions on uninoculated leaves.

  16. A cyclic nucleotide-gated channel mutation associated with canine daylight blindness provides insight into a role for the S2 segment tri-Asp motif in channel biogenesis.

    Directory of Open Access Journals (Sweden)

    Naoto Tanaka

    Full Text Available Cone cyclic nucleotide-gated channels are tetramers formed by CNGA3 and CNGB3 subunits; CNGA3 subunits function as homotetrameric channels but CNGB3 exhibits channel function only when co-expressed with CNGA3. An aspartic acid (Asp) to asparagine (Asn) missense mutation at position 262 in the canine CNGB3 (D262N) subunit results in loss of cone function (daylight blindness), suggesting an important role for this aspartic acid residue in channel biogenesis and/or function. Asp 262 is located in a conserved region of the second transmembrane segment containing three Asp residues designated the Tri-Asp motif. This motif is conserved in all CNG channels. Here we examine mutations in canine CNGA3 homomeric channels using a combination of experimental and computational approaches. Mutations of these conserved Asp residues result in the absence of nucleotide-activated currents in heterologous expression. A fluorescent tag on CNGA3 shows mislocalization of mutant channels. Co-expressing CNGB3 Tri-Asp mutants with wild type CNGA3 results in some functional channels; however, their electrophysiological characterization matches the properties of homomeric CNGA3 channels. This failure to record heteromeric currents suggests that Asp/Asn mutations affect heteromeric subunit assembly. A homology model of S1-S6 of the CNGA3 channel was generated and relaxed in a membrane using molecular dynamics simulations. The model predicts that the Tri-Asp motif is involved in non-specific salt bridge pairings with positive residues of S3/S4. We propose that the D262N mutation in dogs with CNGB3-day blindness results in the loss of these inter-helical interactions, altering the electrostatic equilibrium within the S1-S4 bundle. Because residues analogous to Tri-Asp in the voltage-gated Shaker potassium channel family were implicated in monomer folding, we hypothesize that destabilizing these electrostatic interactions impairs the monomer folding state in D262N mutant CNG

  17. Molecular cloning and nucleotide sequence of full-length cDNA for sweet potato catalase mRNA.

    Science.gov (United States)

    Sakajo, S; Nakamura, K; Asahi, T

    1987-06-01

    A nearly full-length cDNA clone for catalase (pCAS01) was obtained through immunological screening of cDNA expression library constructed from size-fractionated poly(A)-rich RNA of wounded sweet potato tuberous roots by Escherichia coli expression vector-primed cDNA synthesis. Two additional catalase cDNA clones (pCAS10 and pCAS13), which contained cDNA inserts slightly longer than that of pCAS01 at their 5'-termini, were identified by colony hybridization of another cDNA library. Those three catalase cDNAs contained primary structures not identical, but closely related, to one another based on their restriction enzyme and RNase cleavage mapping analyses, suggesting that microheterogeneity exists in catalase mRNAs. The cDNA insert of pCAS13 carried the entire catalase coding capacity, since the RNA transcribed in vitro from the cDNA under the SP6 phage promoter directed the synthesis of a catalase polypeptide in the wheat germ in vitro translation assay. The nucleotide sequencing of these catalase cDNAs indicated that 1900-base catalase mRNA contained a coding region of 1476 bases. The amino acid sequence of sweet potato catalase deduced from the nucleotide sequence was 35 amino acids shorter than rat liver catalase [Furuta, S., Hayashi, H., Hijikata, M., Miyazawa, S., Osumi, T. & Hashimoto, T. (1986) Proc. Natl Acad. Sci. USA 83, 313-317]. Although these two sequences showed only 38% homology, the sequences around the amino acid residues implicated in catalytic function, heme ligand or heme contact had been well conserved during evolution.

  18. A novel Bayesian DNA motif comparison method for clustering and retrieval.

    Directory of Open Access Journals (Sweden)

    Naomi Habib

    2008-02-01

    Full Text Available Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors.
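
    The abstract above names two ingredients of the comparison score: similarity between the positional nucleotide distributions of two motifs, and their dissimilarity to the background. The sketch below is not the authors' Bayesian score. It is a simpler stand-in built on the Jensen-Shannon divergence, assuming two already-aligned motifs of equal length given as per-position (A, C, G, T) probability tuples and a uniform background. The names `js_divergence` and `motif_similarity` are hypothetical.

```python
import math

# Illustrative stand-in, NOT the Bayesian score from the paper: a
# Jensen-Shannon based score that rewards columns that resemble each
# other and differ from the background distribution.


def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def motif_similarity(m1, m2, background=(0.25, 0.25, 0.25, 0.25)):
    """Score two aligned motifs of equal length, column by column.

    Each column adds its mean divergence from the background and
    subtracts the divergence between the two columns themselves.
    """
    if len(m1) != len(m2):
        raise ValueError("motifs must be aligned to the same length")
    score = 0.0
    for c1, c2 in zip(m1, m2):
        informative = 0.5 * (js_divergence(c1, background)
                             + js_divergence(c2, background))
        score += informative - js_divergence(c1, c2)
    return score


# Hypothetical two-column motifs, columns ordered (A, C, G, T).
motif_a = [(0.97, 0.01, 0.01, 0.01), (0.01, 0.97, 0.01, 0.01)]
motif_b = [(0.01, 0.01, 0.01, 0.97), (0.01, 0.01, 0.97, 0.01)]
uniform = [(0.25, 0.25, 0.25, 0.25)] * 2
```

    With this score, two identical informative motifs score positively, two conflicting motifs score negatively, and two uninformative (background-like) motifs score zero, so matching noise is not rewarded as if it were matching signal.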

  19. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Science.gov (United States)

    2010-07-01

    ... Colombettes; 1211 Geneva 20 Switzerland. Copies may also be inspected at the National Archives and Records... number of SEQ ID NOs, whether followed by a sequence or by the code “000.” (d) Where the description or... identifier, preceded by “SEQ ID NO:” in the text of the description or claims, even if the sequence is also...

  20. Immune selection in vitro reveals human immunodeficiency virus type 1 Nef sequence motifs important for its immune evasion function in vivo.

    Science.gov (United States)

    Lewis, Martha J; Lee, Patricia; Ng, Hwee L; Yang, Otto O

    2012-07-01

    Human immunodeficiency virus type 1 (HIV-1) Nef downregulates major histocompatibility complex class I (MHC-I), impairing the clearance of infected cells by CD8(+) cytotoxic T lymphocytes (CTLs). While sequence motifs mediating this function have been determined by in vitro mutagenesis studies of laboratory-adapted HIV-1 molecular clones, it is unclear whether the highly variable Nef sequences of primary isolates in vivo rely on the same sequence motifs. To address this issue, nef quasispecies from nine chronically HIV-1-infected persons were examined for sequence evolution and altered MHC-I downregulatory function under Gag-specific CTL immune pressure in vitro. This selection resulted in decreased nef diversity and strong purifying selection. Site-by-site analysis identified 13 codons undergoing purifying selection and 1 undergoing positive selection. Of the former, only 6 have been reported to have roles in Nef function, including 4 associated with MHC-I downregulation. Functional testing of naturally occurring in vivo polymorphisms at the 7 sites with no previously known functional role revealed 3 mutations (A84D, Y135F, and G140R) that ablated MHC-I downregulation and 3 (N52A, S169I, and V180E) that partially impaired MHC-I downregulation. Globally, the CTL pressure in vitro selected functional Nef from the in vivo quasispecies mixtures that predominately lacked MHC-I downregulatory function at the baseline. Overall, these data demonstrate that CTL pressure exerts a strong purifying selective pressure for MHC-I downregulation and identifies novel functional motifs present in Nef sequences in vivo.

  1. Targeted capture enrichment and sequencing identifies extensive nucleotide variation in the turkey MHC-B.

    Science.gov (United States)

    Reed, Kent M; Mendoza, Kristelle M; Settlage, Robert E

    2016-03-01

    Variation in the major histocompatibility complex (MHC) is increasingly associated with disease susceptibility and resistance in avian species of agricultural importance. This variation includes sequence polymorphisms but also structural differences (gene rearrangement) and copy number variation (CNV). The MHC has now been described for multiple galliform species including the best defined assemblies of the chicken (Gallus gallus) and domestic turkey (Meleagris gallopavo). Using this sequence resource, this study applied high-throughput sequencing to investigate MHC variation in turkeys of North America (NA turkeys). An MHC-specific SureSelect (Agilent) capture array was developed, and libraries were created for 14 turkeys representing domestic (commercial bred), heritage breed, and wild turkeys. In addition, a representative of the Ocellated turkey (M. ocellata) and chicken (G. gallus) was included to test cross-species applicability of the capture array allowing for identification of new species-specific polymorphisms. Libraries were hybridized to ∼12 K cRNA baits and the resulting pools were sequenced. On average, 98% of processed reads mapped to the turkey whole genome sequence and 53% to the MHC target. In addition to the MHC, capture hybridization recovered sequences corresponding to other MHC regions. Sequence alignment and de novo assembly indicated the presence of several additional BG genes in the turkey with evidence for CNV. Variant detection identified an average of 2245 polymorphisms per individual for the NA turkeys, 3012 for the Ocellated turkey, and 462 variants in the chicken (RJF-256). This study provides an extensive sequence resource for examining MHC variation and its relation to health of this agriculturally important group of birds.

  2. Integrating multiple genomic data to predict disease-causing nonsynonymous single nucleotide variants in exome sequencing studies.

    Directory of Open Access Journals (Sweden)

    Jiaxin Wu

    2014-03-01

    Full Text Available Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, owing to the large number of sequenced variants, the presence of a non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, prevalent applications of exome sequencing call for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.
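
    The abstract above describes integrating several per-variant scores into a single statistical significance but does not state the combination rule. As an assumption for illustration only, the sketch below uses Fisher's method, a textbook way of combining independent p-values, and then ranks hypothetical candidate SNVs by the combined value. The variant names and p-values are invented, and real scores would rarely be independent.

```python
import math

# Illustrative assumption: Fisher's method for combining independent
# p-values. The abstract does not say SPRING uses this rule.


def fisher_combined_p(p_values):
    """Combine k p-values (each in (0, 1]) with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k
    degrees of freedom under the null; for even degrees of freedom the
    survival function has the closed form used below.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)      # X / 2
    tail = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail


# Hypothetical candidates: variant id -> p-values from separate scores.
candidates = {
    "snv_1": [0.01, 0.02, 0.05],
    "snv_2": [0.40, 0.55, 0.30],
    "snv_3": [0.03, 0.60, 0.10],
}

# Smaller combined p-value means higher priority.
ranking = sorted(candidates, key=lambda v: fisher_combined_p(candidates[v]))
```

    In this toy ranking the variant with consistently small p-values comes first. Because correlated scores violate the independence assumption behind Fisher's method, a practical pipeline would need a correction or a different combination rule.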

  3. Novel technologies applied to the nucleotide sequencing and comparative sequence analysis of the genomes of infectious agents in veterinary medicine.

    Science.gov (United States)

    Granberg, F; Bálint, Á; Belák, S

    2016-04-01

    Next-generation sequencing (NGS), also referred to as deep, high-throughput or massively parallel sequencing, is a powerful new tool that can be used for the complex diagnosis and intensive monitoring of infectious disease in veterinary medicine. NGS technologies are also being increasingly used to study the aetiology, genomics, evolution and epidemiology of infectious disease, as well as host-pathogen interactions and other aspects of infection biology. This review briefly summarises recent progress and achievements in this field by first introducing a range of novel techniques and then presenting examples of NGS applications in veterinary infection biology. Various work steps and processes for sampling and sample preparation, sequence analysis and comparative genomics, and improving the accuracy of genomic prediction are discussed, as are bioinformatics requirements. Examples of sequencing-based applications and comparative genomics in veterinary medicine are then provided. This review is based on novel references selected from the literature and on experiences of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Uppsala, Sweden.

  4. Functional analysis reveals the possible role of the C-terminal sequences and PI motif in the function of lily (Lilium longiflorum) PISTILLATA (PI) orthologues.

    Science.gov (United States)

    Chen, Ming-Kun; Hsieh, Wen-Ping; Yang, Chang-Hsien

    2012-01-01

    Two lily (Lilium longiflorum) PISTILLATA (PI) genes, Lily MADS Box Gene 8 and 9 (LMADS8/9), were characterized. LMADS9 lacked 29 C-terminal amino acids including the PI motif that was present in LMADS8. Both LMADS8/9 mRNAs were prevalent in the first and second whorl tepals during all stages of development and were expressed in the stamen only in young flower buds. LMADS8/9 could both form homodimers, but the ability of LMADS8 homodimers to bind to CArG1 was relatively stronger than that of LMADS9 homodimers. 35S:LMADS8 completely, and 35S:LMADS9 only partially, rescued the second whorl petal formation and partially converted the first whorl sepal into a petal-like structure in Arabidopsis pi-1 mutants. Ectopic expression of LMADS8-C (with deletion of the 29 amino acids of the C-terminal sequence) or LMADS8-PI (with only the PI motif deleted) only partially rescued petal formation in pi mutants, which was similar to what was observed in 35S:LMADS9/pi plants. In contrast, 35S:LMADS9+L8C (with the addition of the 29 amino acids of the LMADS8 C-terminal sequence) or 35S:LMADS9+L8PI (with the addition of the LMADS8 PI motif) demonstrated an increased ability to rescue petal formation in pi mutants, which was similar to what was observed in 35S:LMADS8/pi plants. Furthermore, ectopic expression of LMADS8-M (with the MADS domain truncated) generated more severe dominant negative phenotypes than those seen in 35S:LMADS9-M flowers. These results revealed that the 29 amino acids including the PI motif in the C-terminal region of the lily PI orthologue are valuable for its function in regulating perianth organ formation.

  5. Conservation of sequence motifs suggests that the nonclassical MHC class I lineages CD1/PROCR and UT were established before the emergence of tetrapod species.

    Science.gov (United States)

    Dijkstra, Johannes M; Yamaguchi, Takuya; Grimholt, Unni

    2017-12-21

    Humans have a number of nonclassical major histocompatibility complex (MHC) class I molecules that are quite divergent from the classical ones, and that may have separated from the classical lineage in pre-mammalian times. To estimate when in evolution the respective nonclassical lineages separated from the classical lineage, we first identified "phylogenetic marker motifs" within the evolution of classical MHC class I; the selected motifs are rather specific for and rather stably inherited within clades of species. Distribution of these motifs in nonclassical MHC class I molecules indicates that the lineage including the nonclassical MHC class I molecules CD1 and PROCR separated from the classical lineage before the emergence of tetrapod species, and that the human nonclassical MHC class I molecules FCGRT, MIC/ULBP/RAET, HFE, MR1, and ZAG show similarity with classical MHC class I at the avian/reptilian level. An MR1-like α1 exon sequence was identified in turtle. Our system furthermore indicates that the lineage UT, hitherto only found in non-eutherian mammals, predates tetrapod existence, and we identified UT genes in reptiles. If only accepting wide distribution of a lineage among extant species as true evidence for ancientness, the oldest identified nonclassical MHC class I lineage remains the fish-specific lineage Z, which was corroborated in the present study by finding both Z and classical-type MHC class I sequences in a primitive fish, the bichir. In short, we gained important new insights into the evolution of classical MHC class I motifs and the probable time of origin of nonclassical MHC class I lineages.

  6. Functional analysis reveals the possible role of the C-terminal sequences and PI motif in the function of lily (Lilium longiflorum) PISTILLATA (PI) orthologues

    Science.gov (United States)

    Chen, Ming-Kun; Hsieh, Wen-Ping; Yang, Chang-Hsien

    2012-01-01

    Two lily (Lilium longiflorum) PISTILLATA (PI) genes, Lily MADS Box Gene 8 and 9 (LMADS8/9), were characterized. LMADS9 lacked 29 C-terminal amino acids including the PI motif that was present in LMADS8. Both LMADS8/9 mRNAs were prevalent in the first and second whorl tepals during all stages of development and were expressed in the stamen only in young flower buds. LMADS8/9 could both form homodimers, but the ability of LMADS8 homodimers to bind to CArG1 was relatively stronger than that of LMADS9 homodimers. 35S:LMADS8 completely, and 35S:LMADS9 only partially, rescued the second whorl petal formation and partially converted the first whorl sepal into a petal-like structure in Arabidopsis pi-1 mutants. Ectopic expression of LMADS8-C (with deletion of the 29 amino acids of the C-terminal sequence) or LMADS8-PI (with only the PI motif deleted) only partially rescued petal formation in pi mutants, which was similar to what was observed in 35S:LMADS9/pi plants. In contrast, 35S:LMADS9+L8C (with the addition of the 29 amino acids of the LMADS8 C-terminal sequence) or 35S:LMADS9+L8PI (with the addition of the LMADS8 PI motif) demonstrated an increased ability to rescue petal formation in pi mutants, which was similar to what was observed in 35S:LMADS8/pi plants. Furthermore, ectopic expression of LMADS8-M (with the MADS domain truncated) generated more severe dominant negative phenotypes than those seen in 35S:LMADS9-M flowers. These results revealed that the 29 amino acids including the PI motif in the C-terminal region of the lily PI orthologue are valuable for its function in regulating perianth organ formation. PMID:22068145

  7. Mason: a JavaScript web site widget for visualizing and comparing annotated features in nucleotide or protein sequences.

    Science.gov (United States)

    Jaschob, Daniel; Davis, Trisha N; Riffle, Michael

    2015-03-07

    Sequence feature annotations (e.g., protein domain boundaries, binding sites, and secondary structure predictions) are an essential part of biological research. Annotations are widely used by scientists during research and experimental design, and are frequently the result of biological studies. A generalized and simple means of disseminating and visualizing these data via the web would be of value to the research community. Mason is a web site widget designed to visualize and compare annotated features of one or more nucleotide or protein sequences. Annotated features may be of virtually any type, ranging from transcription factor binding sites or exons and introns in DNA to secondary structure or domain boundaries in proteins. Mason is simple to use and easy to integrate into web sites. Mason has a highly dynamic and configurable interface supporting multiple sets of annotations per sequence, overlapping regions, customization of the interface and user-driven events (e.g., clicks and text to appear for tooltips). It is written purely in JavaScript and SVG, requiring no third-party plugins or browser customization. Mason is a solution for dissemination of sequence annotation data on the web. It is highly flexible, customizable, simple to use, and is designed to be easily integrated into web sites. Mason is open source and freely available at https://github.com/yeastrc/mason.

  8. Nucleotide sequence of a cDNA for branched chain acyltransferase with analysis of the deduced protein structure

    International Nuclear Information System (INIS)

    Hummel, K.B.; Litwer, S.; Bradford, A.P.; Aitken, A.; Danner, D.J.; Yeaman, S.J.

    1988-01-01

    Nucleotide sequence was determined for a 1.6-kilobase human cDNA putative for the branched chain acyltransferase protein of the branched chain α-ketoacid dehydrogenase complex. Translation of the sequence reveals an open reading frame encoding a 315-amino acid protein of molecular weight 35,759 followed by 560 bases of 3'-untranslated sequence. Three repeats of the polyadenylation signal hexamer ATTAAA are present prior to the polyadenylate tail. Within the open reading frame is a 10-amino acid fragment which matches exactly the amino acid sequence around the lipoate-lysine residue in bovine kidney branched chain acyltransferase, thus confirming the identity of the cDNA. Analysis of the deduced protein structure for the human branched chain acyltransferase revealed an organization into domains similar to that reported for the acyltransferase proteins of the pyruvate and α-ketoglutarate dehydrogenase complexes. This similarity in organization suggests that a more detailed analysis of the proteins will be required to explain the individual substrate and multienzyme complex specificity shown by these acyltransferases.

  9. Determination of single nucleotide variants in Escherichia coli DH5α by using short-read sequencing.

    Science.gov (United States)

    Song, Yoseb; Lee, Bo-Rahm; Cho, Suhyung; Cho, Yoo-Bok; Kim, Seon-Won; Kang, Taek Jin; Kim, Sun Chang; Cho, Byung-Kwan

    2015-06-01

    Escherichia coli DH5α is a common laboratory strain that provides an important platform for routine use in cloning and synthetic biology applications. Many synthetic circuits have been constructed and successfully expressed in E. coli DH5α; however, its genome sequence has not been determined yet. Here, we determined E. coli DH5α genome sequence and identified genetic mutations that affect its phenotypic functions by using short-read sequencing. The sequencing results clearly described the genotypes of E. coli DH5α, which aid in further studies using the strain. Additionally, we observed 105 single nucleotide variants (SNVs), 83% of which were detected in protein-coding regions compared to the parental strain E. coli DH1. Interestingly, 23% of the protein-coding regions have mutations in their amino acid residues, whose biological functions were categorized into two-component systems, peptidoglycan biosynthesis and lipopolysaccharide biosynthesis. These results underscore the advantages of E. coli DH5α, which tolerates the components of transformation buffer and expresses foreign plasmids efficiently. Moreover, these SNVs were also observed in the commercially available strain. These data provide the genetic information of E. coli DH5α for its future application in metabolic engineering and synthetic biology. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. Biological characterization and complete nucleotide sequence of a Tunisian isolate of Moroccan watermelon mosaic virus.

    Science.gov (United States)

    Yakoubi, S; Desbiez, C; Fakhfakh, H; Wipf-Scheibel, C; Marrakchi, M; Lecoq, H

    2008-01-01

    During a survey conducted in October 2005, cucurbit leaf samples showing virus-like symptoms were collected from the major cucurbit-growing areas in Tunisia. DAS-ELISA showed the presence of Moroccan watermelon mosaic virus (MWMV, Potyvirus), detected for the first time in Tunisia, in samples from the region of Cap Bon (Northern Tunisia). MWMV isolate TN05-76 (MWMV-Tn) was characterized biologically and its full-length genome sequence was established. MWMV-Tn was found to have biological properties similar to those reported for the MWMV type strain from Morocco. Phylogenetic analysis including the comparison of complete amino-acid sequences of 42 potyviruses confirmed that MWMV-Tn is related (65% amino-acid sequence identity) to Papaya ringspot virus (PRSV) isolates but is a member of a distinct virus species. Sequence analysis on parts of the CP gene of MWMV isolates from different geographical origins revealed some geographic structure of MWMV variability, with three different clusters: one cluster including isolates from the Mediterranean region, a second including isolates from western and central Africa, and a third one including isolates from the southern part of Africa. A significant correlation was observed between geographic and genetic distances between isolates. Isolates from countries in the Mediterranean region where MWMV has recently emerged (France, Spain, Portugal) have highly conserved sequences, suggesting that they may have a common and recent origin. MWMV from Sudan, a highly divergent variant, may be considered an evolutionary intermediate between MWMV and PRSV.

  11. Selection, Recombination and History in a Parasitic Flatworm (Echinococcus) Inferred from Nucleotide Sequences

    Directory of Open Access Journals (Sweden)

    KL Haag

    1998-09-01

    Full Text Available Three species of flatworms from the genus Echinococcus (E. granulosus, E. multilocularis and E. vogeli) and four strains of E. granulosus (cattle, horse, pig and sheep strains) were analysed by the PCR-SSCP method followed by sequencing, using as targets two non-coding and two coding (one nuclear and one mitochondrial) genomic regions. The sequencing data were used to evaluate hypotheses about the parasite breeding system and the causes of genetic diversification. The calculated recombination parameters suggested that cross-fertilisation was rare in the history of the group. However, the relative rates of substitution in the coding sequences showed that positive selection (instead of purifying selection) drove the evolution of an elastase and neutrophil chemotaxis inhibitor gene (AgB/1). The phylogenetic analyses revealed several ambiguities, indicating that the taxonomic status of the E. granulosus horse strain should be revised.

  12. Symbolic complexity for nucleotide sequences: a sign of the genome structure

    International Nuclear Information System (INIS)

    Salgado-García, R; Ugalde, E

    2016-01-01

    We introduce a method for estimating the complexity function (which counts the number of observable words of a given length) of a finite symbolic sequence, which we use to estimate the complexity function of coding DNA sequences for several species of the Hominidae family. In all cases, the obtained symbolic complexities show the same characteristic behavior: exponential growth for small word lengths, followed by linear growth for larger word lengths. The symbolic complexities of the species we consider exhibit a systematic trend in correspondence with the phylogenetic tree. Using our method, we estimate the complexity function of sequences obtained by some known evolution models, and in some cases we observe the characteristic exponential-linear growth of the Hominidae coding DNA complexity. Analysis of the symbolic complexity of sequences obtained from a specific evolution model points to the following conclusion: linear growth arises from the random duplication of large segments during the evolution of the genome, while the decrease in the overall complexity from one species to another is due to a difference in the speed of accumulation of point mutations. (paper)
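
    The complexity function defined in the abstract above (the number of observable words of a given length) can be computed directly for a finite sequence. The sketch below is only the plain empirical count, not the estimator the authors introduce to handle finite-length effects. The function name `complexity` and the toy sequences are assumptions for illustration.

```python
# Plain empirical word count, NOT the estimator introduced in the paper:
# for each word length n, count the distinct length-n substrings seen.


def complexity(seq, max_len):
    """Return {n: number of distinct words of length n observed in seq}."""
    return {
        n: len({seq[i:i + n] for i in range(len(seq) - n + 1)})
        for n in range(1, max_len + 1)
    }


periodic = complexity("ACGTACGT", 3)   # a repeating toy sequence
constant = complexity("AAAA", 2)       # a single-letter toy sequence
```

    For a four-letter alphabet the count is bounded above by both 4**n and len(seq) - n + 1, so on a finite sequence the empirical count must flatten once words are long enough. This is one reason a dedicated estimator is needed before interpreting the growth regimes the abstract describes.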

  13. Cloning, nucleotide sequence and transcriptional analysis of the uvrA gene from Neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Black, C.G.; Fyfe, J.A.M.; Davies, J.K.

    1997-01-01

    A recombinant plasmid capable of restoring UV resistance to an Escherichia coli uvrA mutant was isolated from a genomic library of Neisseria gonorrhoeae. Sequence analysis revealed an open reading frame whose deduced amino acid sequence displayed significant similarity to those of the UvrA proteins of other bacterial species. A second open reading frame (ORF259) was identified upstream from, and in the opposite orientation to, the gonococcal uvrA gene. Transcriptional fusions between portions of the gonococcal uvrA upstream region and a reporter gene were used to localise promoter activity in both E. coli and N. gonorrhoeae. The transcriptional starting points of uvrA and ORF259 were mapped in E. coli by primer extension analysis, and corresponding σ70 promoters were identified. The arrangement of the uvrA-ORF259 intergenic region is similar to that of the gonococcal recA-aroD intergenic region. Both contain inverted copies of the 10 bp neisserial DNA uptake sequence situated between divergently transcribed genes. However, there is no evidence that either the uptake sequence or the proximity of the promoters influences expression of these genes. (author)

  14. Patterns of nucleotide sequence variation in ICAM1 and TNF genes ...

    Indian Academy of Sciences (India)

    We have studied DNA sequence variation in and around the genes ICAM1 and TNF, which play functional and correlated roles in inflammatory processes and immune cell responses, in 12 diverse ethnic groups of India, with a view to investigating the relative roles of demographic history and natural selection in shaping the ...

  15. Phylogenetic reconstruction using secondary structures and sequence motifs of ITS2 rDNA of Paragonimus westermani (Kerbert, 1878) Braun, 1899 (Digenea: Paragonimidae) and related species.

    Science.gov (United States)

    Prasad, Pramod Kumar; Tandon, Veena; Biswal, Devendra Kumar; Goswami, Lalit Mohan; Chatterjee, Anupam

    2009-12-03

    Most phylogenetic studies using current methods have focused on primary DNA sequence information. However, RNA secondary structures are particularly useful in systematics because they include characteristics that give "morphological" information, not found in the primary sequence. In several mountainous regions of Northeastern India, foci of Paragonimus (lung fluke) infection reportedly involve species that are known to prevail in neighbouring countries. The present study was undertaken to demonstrate the sequence analysis of the ribosomal DNA (ITS2) of the infective (metacercarial) stage of the lung fluke collected from the edible crab hosts that are abundant in a mountain stream of the area (Miao, Changlang District in Arunachal Pradesh) and to construct its phylogeny. Using the approach of molecular morphometrics that is based on ITS2 secondary structure homologies, phylogenetic relationships of the various isolates of Paragonimus species that are prevalent in the neighbouring Near-eastern countries have been discussed. Initially, ten predicted RNA secondary structures were reconstructed and the topology based only on the predicted RNA secondary structure of the ITS2 region resolved most relationships among the species studied. We obtained three similar topologies for seven species of the genus Paragonimus on the basis of traditional primary sequence analysis using MEGA and a Bayesian analysis of the combined data. The latter approach allowed us to include both primary sequence and RNA molecular morphometrics; each data partition was allowed to have a different evolution rate. Paragonimus westermani was found to group with P. siamensis of Thailand; this was best supported by both the molecular morphometrics and combined analyses. P. heterotremus, P. proliferus, P. skrjabini, P. bangkokensis and P. harinasutai formed a separate clade in the molecular phylogenies, and were reciprocally monophyletic with respect to other species. ITS2 sequence motifs allowed an

  16. Molecular phylogeny of Vincetoxicum (Apocynaceae-Asclepiadoideae) based on the nucleotide sequences of cpDNA and nrDNA.

    Science.gov (United States)

    Yamashiro, Tadashi; Fukuda, Tatsuya; Yokoyama, Jun; Maki, Masayuki

    2004-05-01

    Molecular phylogenetic analyses of Vincetoxicum and Tylophora (Apocynaceae-Asclepiadoideae) were conducted based on the nucleotide sequences of cpDNA (two intergenic spacers of trnL (UAA)-trnF (GAA) and psbA-trnH and three introns, i.e., atpF, trnG (UCC) and trnL (UAA)), and nrDNA (ITS and ETS regions). Our phylogenetic analysis revealed two monophyletic groups; one consisted of seven taxa of Tylophora and Vincetoxicum inamoenum, Vincetoxicum magnificum and Vincetoxicum macrophyllum (Clade I) and the other consisted of 17 accessions of Vincetoxicum (Clade II). The monophyly of the genus Vincetoxicum was not supported. Although many nucleotide substitutions were observed in Clade I, the genetic differentiation within Clade II was small. Low genetic diversification but considerable morphological divergence suggests that the species in Clade II had undergone rapid diversification. Although most species in Clade I have tiny flowers, those in Clade II have larger and more nectariferous ones. Thus, we hypothesized that the rapid morphological radiation in Clade II may have been due to the gaining of floral characters such as large flowers and large amounts of nectar corresponding to diverse pollinators.

  17. Sequence analysis of selected nucleotide sequences of abortogenic isolate of Equine Herpesvirus 1 and changes caused by serial passage in vitro

    Directory of Open Access Journals (Sweden)

    Dobromila Molinková

    2012-01-01

    The aim of this work was to isolate the abortogenic virus strain of Equine Herpesvirus 1 representing the current infection situation in the Czech Republic, describe it at the molecular level with emphasis on genes coding viral glycoproteins, and observe the changes caused by passaging the virus in cell culture. In 2009, an isolate of equine herpesvirus 1 was obtained from an abortion case in a mare from a herd affected by an abortion storm. The virus was identified by PCR. The virus was isolated on the RK13 cell line, and after 6 passages in vitro the stability of sequences of selected sections of the virus genome was assessed and compared with the original field isolate. The virus sequences were also compared with known sequences of the abortogenic reference virus strain (V592) and with other known viral strains. One point mutation in a nucleotide sequence coding glycoprotein G was found, distinguishing the field isolate from V592. One point mutation in the gene for glycoprotein C was passage-induced. The virus was stable in the monitored sections during passage on the RK13 cell line and is a suitable starting material for further experiments.

  18. Comparison of Two Massively Parallel Sequencing Platforms using 83 Single Nucleotide Polymorphisms for Human Identification

    OpenAIRE

    Apaga, Dame Loveliness T.; Dennis, Sheila E.; Salvador, Jazelyn M.; Calacal, Gayvelline C.; De Ungria, Maria Corazon A.

    2017-01-01

    The potential of Massively Parallel Sequencing (MPS) technology to vastly expand the capabilities of human identification led to the emergence of different MPS platforms that use forensically relevant genetic markers. Two of the MPS platforms that are currently available are the MiSeq® FGx™ Forensic Genomics System (Illumina) and the HID-Ion Personal Genome Machine (PGM)™ (Thermo Fisher Scientific). These are coupled with the ForenSeq™ DNA Signature Prep kit (Illumina) and the HID-Ion AmpliSe...

  19. Complete nucleotide sequences and virion particle association of two satellite RNAs of panicum mosaic virus.

    Science.gov (United States)

    Pyle, Jesse D; Monis, Judit; Scholthof, Karen-Beth

    2017-08-15

    Over six decades ago, panicum mosaic virus (PMV) was identified as the first viral pathogen of cultivated switchgrass (Panicum virgatum). Subsequently, PMV was demonstrated to support the replication of both a satellite RNA virus (SPMV) and satellite RNA (satRNA) agents during natural infections of host grasses. In this study, we report the isolation and full-length sequences of two PMV satRNAs identified in 1988 from St. Augustinegrass (Stenotaphrum secundatum) and centipedegrass (Eremochloa ophiuroides) hosts. Each of these satellites has sequence relatedness at its 5'- and 3'-ends. In addition, satC has a region of ∼100 nt complementary to the 3'-end of the PMV genome. These agents are associated with purified virions of SPMV infections. Additionally, satS and satC RNAs contain conserved in-frame open reading frames in the complementary-sense sequences that could potentially generate 6.6- and 7.9-kDa proteins, respectively. In protoplasts and plants, satS is infectious when co-inoculated with the PMV RNA alone or PMV+SPMV RNAs, and negatively affects their accumulation.

  20. [Nucleotide sequences of 5S rRNA genes of polyploid species of wheat and Aegilops species].

    Science.gov (United States)

    Vakhitov, V A; Gimalov, F R; Shumiatskiĭ, G P

    1989-01-01

    Primary structures of 5S rRNA genes and of the non-transcribed spacers between them were determined in families of 5S DNA repeats 420 and 500 bp long in 8 wheat and Aegilops species. The high conservation of the sequences coding for 5S rRNA and of the 3'- and 5'-ends of the non-transcribed spacers was shown not to depend on the evolutionary position, ploidy level, or genomic composition of the species. The transcriptional activity of cloned 5S rRNA genes was determined in vitro. Functional heterogeneity was revealed in each family of repeats, due to substitutions of individual nucleotides within the internal transcription control region. A greater deficiency of the CpG dinucleotide was revealed in 5S rRNA genes than in non-transcribed spacers.

  1. Genetic diversity of Argentina tomato varieties revealed by morphological traits, simple sequence repeat, and single nucleotide polymorphism markers

    International Nuclear Information System (INIS)

    Xiaorong, H.U.; Yang, W.

    2012-01-01

    Twenty-six morphological traits as well as 47 single nucleotide polymorphism and simple sequence repeat markers were used to investigate genetic variation in 67 tomato (Solanum lycopersicum L.) varieties collected from Argentina between 1932 and 1974. Approximately 65.0% of the morphological traits and 55.3% of the molecular markers showed polymorphisms in the 67 varieties. Average taxonomic distance between any two varieties ranged from 0.6643 to 1.1776, while Nei's genetic distance varied from 0 to 0.2022. Cluster analysis indicated that 67 varieties could be grouped into three clusters at both morphological and molecular levels. The varieties collected before 1960 had larger genetic variation than those collected after 1960.

  2. Telomeres of the linear chromosomes of Lyme disease spirochaetes: nucleotide sequence and possible exchange with linear plasmid telomeres.

    Science.gov (United States)

    Casjens, S; Murphy, M; DeLange, M; Sampson, L; van Vugt, R; Huang, W M

    1997-11-01

    Bacteria of the spirochaete genus Borrelia have linear chromosomes about 950 kbp in size. We report here that these linear chromosomes have covalently closed hairpin structures at their termini that are similar but not identical to those reported for linear plasmids carried by these organisms. Nucleotide sequence analysis of the chromosomal telomeric regions indicates that unique, apparently functional genes lie within a few hundred bp of each of the telomeres, and that there is an imperfect 26 bp inverted repeat at the two telomeres. In addition, we characterize a major chromosomal length polymorphism within the right telomeric regions of various Borrelia isolates, and show that sequences similar to those near the right telomere are often found on linear plasmids in B. burgdorferi (sensu stricto) isolates from nature. Sequences similar to a number of other regions of the chromosome, including those near the left telomere, were not found on B. burgdorferi plasmids. These observations suggest that there has been historical exchange of genetic information between the linear plasmids and the right end of the linear chromosome.

  3. Complete nucleotide sequence and gene rearrangement of the mitochondrial genome of the Japanese pond frog Rana nigromaculata.

    Science.gov (United States)

    Sumida, M; Kanamori, Y; Kaneda, H; Kato, Y; Nishioka, M; Hasegawa, M; Yonekawa, H

    2001-10-01

    In this study, we determined the complete nucleotide sequence of the mitochondrial genome of the Japanese pond frog Rana nigromaculata. The length of the sequence of the frog was 17,804 bp, though this was not absolute due to length variation caused by differing numbers of repetitive units in the control regions of individual frogs. The gene content, base composition, and codon usage of the Japanese pond frog conformed to those of typical vertebrate patterns. However, the comparison of gene organization between three amphibian species (Rana, Xenopus and caecilian) provided evidence that the gene arrangement of Rana differs by four tRNA gene positions from that of Xenopus or caecilian, a common gene arrangement in vertebrates. These gene rearrangements are presumed to have occurred by the tandem duplication of a gene region followed by multiple deletions of redundant genes. It is probable that the rearrangements start and end at tRNA genes involved in the initial production of a tandemly duplicated gene region. Putative secondary structures for the 22 tRNAs and the origin of the L-strand replication (OL) are described. Evolutionary relationships were estimated from the concatenated sequences of the 12 proteins encoded in the H-strand of mtDNA among 37 vertebrate species. A quartet-puzzling tree showed that three amphibian species form a monophyletic clade and that the caecilian is a sister group of the monophyletic Anura.

  4. PscanChIP: finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments

    Science.gov (United States)

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2013-01-01

    Chromatin immunoprecipitation followed by sequencing with next-generation technologies (ChIP-Seq) has become the de facto standard for building genome-wide maps of regions bound by a given transcription factor (TF). The regions identified, however, have to be further analyzed to determine the actual DNA-binding sites for the TF, as well as sites for other TFs belonging to the same TF complex or in general co-operating or interacting with it in transcription regulation. PscanChIP is a web server that, starting from a collection of genomic regions derived from a ChIP-Seq experiment, scans them using motif descriptors like JASPAR or TRANSFAC position-specific frequency matrices, or descriptors uploaded by users, and it evaluates both motif enrichment and positional bias within the regions according to different measures and criteria. PscanChIP can successfully identify not only the actual binding sites for the TF investigated by a ChIP-Seq experiment but also secondary motifs corresponding to other TFs that tend to bind the same regions, and, if present, precise positional correlations among their respective sites. The web interface is free for use, and there is no login requirement. It is available at http://www.beaconlab.it/pscan_chip_dev. PMID:23748563
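    The matrix-based scanning step described in this record can be illustrated with a minimal sketch. It is not PscanChIP's implementation: the count matrix `PFM`, the pseudocount, the uniform background, and the function names are all assumptions chosen for illustration. The sketch converts a position-specific frequency matrix into log-odds scores and reports the best-scoring window in a sequence.

```python
import math

# Hypothetical 4-position count matrix (not taken from JASPAR or TRANSFAC).
PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def log_odds_matrix(pfm, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    width = len(next(iter(pfm.values())))
    matrix = []
    for i in range(width):
        total = sum(pfm[base][i] for base in "ACGT") + 4 * pseudocount
        matrix.append({
            base: math.log2(((pfm[base][i] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def best_hit(sequence, matrix):
    """Return (start, score, window) for the highest-scoring window, or None."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score, window)
    return best


hit = best_hit("TTACGTCC", log_odds_matrix(PFM))
print(hit)
```

    A real enrichment analysis would additionally compare the best-hit scores in ChIP-Seq regions against those in background regions; that comparison is omitted here.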

  5. Sequence motif upstream of the Hendra virus fusion protein cleavage site is not sufficient to promote efficient proteolytic processing

    International Nuclear Information System (INIS)

    Craft, Willie Warren; Dutch, Rebecca Ellis

    2005-01-01

    The Hendra virus fusion (HeV F) protein is synthesized as a precursor, F0, and proteolytically cleaved into the mature F1 and F2 heterodimer, following an HDLVDGVK109 motif. This cleavage event is required for fusogenic activity. To determine the amino acid requirements for processing of the HeV F protein, we constructed multiple mutants. Individual and simultaneous alanine substitutions of the eight residues immediately upstream of the cleavage site did not eliminate processing. A chimeric SV5 F protein in which the furin site was substituted for the VDGVK109 motif of the HeV F protein was not processed but was expressed on the cell surface. Another chimeric SV5 F protein containing the HDLVDGVK109 motif of the HeV F protein underwent partial cleavage. These data indicate that the upstream region can play a role in protease recognition, but is neither absolutely required nor sufficient for efficient processing of the HeV F protein.

  6. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  7. The complete nucleotide sequence, genome organization, and origin of human adenovirus type 11

    International Nuclear Information System (INIS)

    Stone, Daniel; Furthmann, Anne; Sandig, Volker; Lieber, Andre

    2003-01-01

    The complete DNA sequence and transcription map of human adenovirus type 11 are reported here. This is the first published sequence for a subgenus B human adenovirus and demonstrates a genome organization highly similar to those of other human adenoviruses. All of the genes from the early, intermediate, and late regions are present in the expected locations of the genome for a human adenovirus. The genome is 34,794 bp in length and has a GC content of 48.9%. Sequence alignment with genomes of groups A (Ad12), C (Ad5), D (Ad17), E (Simian adenovirus 25), and F (Ad40) revealed homologies of 64, 54, 68, 75, and 52%, respectively. Detailed genomic analysis demonstrated that Ads 11 and 35 are highly conserved in all areas except the hexon hypervariable regions and fiber. Similarly, comparison of Ad11 with subgroup E SAV25 revealed poor homology between fibers but high homology in proteins encoded by all other areas of the genome. We propose an evolutionary model in which functional viruses can be reconstituted following fiber substitution from one serotype to another. According to this model either the Ad11 genome is a derivative of Ad35, from which the fiber was substituted with Ad7, or the Ad35 genome is the product of a fiber substitution from Ad21 into the Ad11 genome. This model also provides a possible explanation for the origin of group E Ads, which are evolutionarily derived from a group C fiber substitution into a group B genome.

  8. Update on Pneumocystis carinii f. sp. hominis typing based on nucleotide sequence variations in internal transcribed spacer regions of rRNA genes

    DEFF Research Database (Denmark)

    Lee, C H; Helweg-Larsen, J; Tang, X

    1998-01-01

    Pneumocystis carinii f. sp. hominis isolates from 207 clinical specimens from nine countries were typed based on nucleotide sequence variations in the internal transcribed spacer regions I and II (ITS1 and ITS2, respectively) of rRNA genes. The number of ITS1 nucleotides has been revised from the previously reported 157 bp to 161 bp. Likewise, the number of ITS2 nucleotides has been changed from 177 to 192 bp. The number of ITS1 sequence types has increased from 2 to 15, and that of ITS2 has increased from 3 to 14. The 15 ITS1 sequence types are designated types A through O, and the 14 ITS2 types are named types a through n. A total of 59 types of P. carinii f. sp. hominis were found in this study.

  9. Identification and nucleotide sequence of the thymidine kinase gene of Shope fibroma virus

    International Nuclear Information System (INIS)

    Upton, C.; McFadden, G.

    1986-01-01

    The thymidine kinase (TK) gene of Shope fibroma virus (SFV), a tumorigenic leporipoxvirus, was localized within the viral genome with degenerate oligonucleotide probes. These probes were constructed to two regions of high sequence conservation between the vaccinia virus TK gene and those of several known eucaryotic cellular TK genes, including human, mouse, hamster, and chicken TK genes. The oligonucleotide probes initially localized the SFV TK gene 50 kilobases (kb) from the right terminus of the 160-kb SFV genome within the 9.5-kb BamHI-HindIII fragment E. Fine-mapping analysis indicated that the TK gene was within a 1.2-kb AvaI-HaeIII fragment, and DNA sequencing of this region revealed an open reading frame capable of encoding a polypeptide of 187 amino acids possessing considerable homology to the TK genes of the vaccinia, variola, and monkeypox orthopoxviruses and also to a variety of cellular TK genes. Homology matrix analysis and homology scores suggest that the SFV TK gene has diverged significantly from its counterpart members in the orthopoxvirus genus. Nevertheless, the presence of conserved upstream open reading frames on the 5' side of all of the poxvirus TK genes indicates a similarity of functional organization between the orthopoxviruses and leporipoxviruses. These data suggest a common ancestral origin for at least some of the unique internal regions of the leporipoxviruses and orthopoxviruses as exemplified by SFV and vaccinia virus, respectively.

  10. Assessment of the labelling accuracy of spanish semipreserved anchovies products by FINS (forensically informative nucleotide sequencing)

    Directory of Open Access Journals (Sweden)

    Amaya Velasco

    2016-06-01

    Anchovies have been traditionally captured and processed for human consumption for millennia. In the case of Spain, ripened and salted anchovies are a delicacy, which, in some cases, can reach high commercial values. Although there have been a number of studies presenting DNA methodologies for the identification of anchovies, this is one of the first studies investigating the level of mislabelling in this kind of product in Europe. Sixty-three commercial semipreserved anchovy products were collected in different types of food markets in four Spanish cities to check labelling accuracy. Species determination in these commercial products was performed by sequencing two different cyt-b mitochondrial DNA fragments. Results revealed mislabelling levels higher than 15%, which the authors consider relatively high given the importance of the product. The most frequent substitute species was the Argentine anchovy, Engraulis anchoita, which can be interpreted as an economic fraud.

  11. Assessment of the labelling accuracy of spanish semipreserved anchovies products by FINS (forensically informative nucleotide sequencing).

    Science.gov (United States)

    Velasco, Amaya; Aldrey, Anxela; Pérez-Martín, Ricardo I; Sotelo, Carmen G

    2016-06-01

    Anchovies have been traditionally captured and processed for human consumption for millennia. In the case of Spain, ripened and salted anchovies are a delicacy, which, in some cases, can reach high commercial values. Although there have been a number of studies presenting DNA methodologies for the identification of anchovies, this is one of the first studies investigating the level of mislabelling in this kind of product in Europe. Sixty-three commercial semipreserved anchovy products were collected in different types of food markets in four Spanish cities to check labelling accuracy. Species determination in these commercial products was performed by sequencing two different cyt-b mitochondrial DNA fragments. Results revealed mislabelling levels higher than 15%, which the authors consider relatively high given the importance of the product. The most frequent substitute species was the Argentine anchovy, Engraulis anchoita, which can be interpreted as an economic fraud.

  12. The Bryopsis hypnoides plastid genome: multimeric forms and complete nucleotide sequence.

    Directory of Open Access Journals (Sweden)

    Fang Lü

    BACKGROUND: Bryopsis hypnoides Lamouroux is a siphonous green alga, and its extruded protoplasm can aggregate spontaneously in seawater and develop into mature individuals. The chloroplast of B. hypnoides is the biggest organelle in the cell and shows strong autonomy. To better understand this organelle, we sequenced and analyzed the chloroplast genome of this green alga. PRINCIPAL FINDINGS: A total of 111 functional genes, including 69 potential protein-coding genes, 5 ribosomal RNA genes, and 37 tRNA genes, were identified. The genome size (153,429 bp), arrangement, and inverted-repeat (IR)-lacking structure of the B. hypnoides chloroplast DNA (cpDNA) closely resemble those of Chlorella vulgaris. Furthermore, our cytogenomic investigations using pulsed-field gel electrophoresis (PFGE) and Southern blotting methods showed that the B. hypnoides cpDNA had multimeric forms, including monomer, dimer, trimer, tetramer, and even higher multimers, which is similar to the higher order organization observed previously for higher plant cpDNA. The relative amounts of the four multimeric cpDNA forms were estimated to be about 1, 1/2, 1/4, and 1/8 based on molecular hybridization analysis. Phylogenetic analyses based on a concatenated alignment of chloroplast protein sequences suggested that B. hypnoides is sister to all Chlorophyceae, and this placement received moderate support. CONCLUSION: All of the results suggest that the autonomy of the chloroplasts of B. hypnoides has little to do with the size and gene content of the cpDNA, and the IR-lacking structure of the chloroplasts indirectly demonstrated that the multimeric molecules might result from the random cleavage and fusion of replication intermediates instead of recombinational events.

  13. A resource of genome-wide single-nucleotide polymorphisms generated by RAD tag sequencing in the critically endangered European eel

    DEFF Research Database (Denmark)

    Pujolar, J.M.; Jacobsen, M.W.; Frydenberg, J.

    2013-01-01

    Reduced representation genome sequencing such as restriction-site-associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach that was simultaneously identified and scored in a genome-wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome-wide set of SNP markers...

  14. Purification, enzymatic characterization, and nucleotide sequence of a high-isoelectric-point alpha-glucosidase from barley malt

    DEFF Research Database (Denmark)

    Frandsen, T P; Lok, F; Mirgorodskaya, E

    2000-01-01

    High-isoelectric-point (pI) alpha-glucosidase was purified 7,300-fold from an extract of barley (Hordeum vulgare) malt by ammonium sulfate fractionation, ion-exchange, and butyl-Sepharose chromatography. The enzyme had high activity toward maltose (k(cat) = 25 s(-1)), with an optimum at pH 4.5, and catalyzed the hydrolysis by a retaining mechanism, as shown by nuclear magnetic resonance. Acarbose was a strong inhibitor (K(i) = 1.5 microM). Molecular recognition revealed that all OH-groups in the non-reducing ring and OH-3 in the reducing ring of maltose formed important hydrogen bonds to the enzyme... DNA fragment from a barley cDNA library. HvAgl97 encodes a putative 96.6-kD protein of 879 amino acids with 93.8% identity to the protein deduced from U22450. The sequence contains two active site motifs of glycoside hydrolase family 31. Three introns of 86 to 4,286 bp interrupt the coding region. The four...

  15. Nucleotide and protein sequences for dog masticatory tropomyosin identify a novel Tpm4 gene product.

    Science.gov (United States)

    Brundage, Elizabeth A; Biesiadecki, Brandon J; Reiser, Peter J

    2015-10-01

    Jaw-closing muscles of several vertebrate species, including members of Carnivora, express a unique, "masticatory", isoform of myosin heavy chain, along with isoforms of other myofibrillar proteins that are not expressed in most other muscles. It is generally believed that the complement of myofibrillar isoforms in these muscles serves high force generation for capturing live prey, breaking down tough plant material and defensive biting. A unique isoform of tropomyosin (Tpm) was reported to be expressed in cat jaw-closing muscle, based upon two-dimensional gel mobility, peptide mapping, and immunohistochemistry. The objective of this study was to obtain protein and gene sequence information for this unique Tpm isoform. Samples of masseter (a jaw-closing muscle), tibialis (predominantly fast-twitch fibers), and the deep lateral gastrocnemius (predominantly slow-twitch fibers) were obtained from adult dogs. Expressed Tpm isoforms were cloned and sequencing yielded cDNAs that were identical to genomic predicted striated muscle Tpm1.1St(a,b,b,a) (historically referred to as αTpm), Tpm2.2St(a,b,b,a) (βTpm) and Tpm3.12St(a,b,b,a) (γTpm) isoforms (nomenclature reflects predominant tissue expression ("St"-striated muscle) and exon splicing pattern), as well as a novel 284 amino acid isoform observed in jaw-closing muscle that is identical to a genomic predicted product of the Tpm4 gene (δTpm) family. The novel isoform is designated as Tpm4.3St(a,b,b,a). The myofibrillar Tpm isoform expressed in dog masseter exhibits a unique electrophoretic mobility on gels containing 6 M urea, compared to other skeletal Tpm isoforms. To validate that the cloned Tpm4.3 isoform is the Tpm expressed in dog masseter, E. coli-expressed Tpm4.3 was electrophoresed in the presence of urea. Results demonstrate that Tpm4.3 has identical electrophoretic mobility to the unique dog masseter Tpm isoform and is of different mobility from that of muscle Tpm1.1, Tpm2.2 and Tpm3.12 isoforms. We

  16. Nucleotide sequence analysis of NIPBL gene in Indian Cornelia de Lange syndrome cases.

    Science.gov (United States)

    Bajaj, Shailesh; Ranade, Suvidya; Gambhir, Prakash

    2013-01-01

    Cornelia de Lange syndrome (CdLS) is a multisystem developmental disorder in children. The disorder is caused mainly by mutations in the Nipped-B-like protein gene. Molecular data for CdLS are available from developed countries but not from developing countries such as India. In the present study, the hotspot region of the NIPBL gene, which includes exons 2, 22, and 42 and the largest exon, exon 10, was screened in six CdLS patients and ten controls. The method adopted was amplification of the target exon by polymerase chain reaction, qualitative confirmation of the amplicons by agarose gel electrophoresis, and use of the amplicons in conformation-sensitive gel electrophoresis to detect heteroduplex formation, followed by sequencing. We report two polymorphisms in the studied region of the NIPBL gene: a C/A polymorphism in intron 1 and a T/G polymorphism in exon 10. The intronic polymorphism may have a role in intron splicing, whereas the polymorphism in exon 10 results in an amino acid change (Val to Gly). These polymorphisms are disease associated, as they were found in CdLS patients only and not in controls.

  17. Nodavirus Coat Protein Imposes Dodecahedral RNA Structure Independent of Nucleotide Sequence and Length

    Science.gov (United States)

    Tihova, Mariana; Dryden, Kelly A.; Le, Thuc-vy L.; Harvey, Stephen C.; Johnson, John E.; Yeager, Mark; Schneemann, Anette

    2004-01-01

    The nodavirus Flock house virus (FHV) has a bipartite, positive-sense RNA genome that is packaged into an icosahedral particle displaying T=3 symmetry. The high-resolution X-ray structure of FHV has shown that 10 bp of well-ordered, double-stranded RNA are located at each of the 30 twofold axes of the virion, but it is not known which portions of the genome form these duplex regions. The regular distribution of double-stranded RNA in the interior of the virus particle indicates that large regions of the encapsidated genome are engaged in secondary structure interactions. Moreover, the RNA is restricted to a topology that is unlikely to exist during translation or replication. We used electron cryomicroscopy and image reconstruction to determine the structure of four types of FHV particles that differed in RNA and protein content. RNA-capsid interactions were primarily mediated via the N and C termini, which are essential for RNA recognition and particle assembly. A substantial fraction of the packaged nucleic acid, either viral or heterologous, was organized as a dodecahedral cage of duplex RNA. The similarity in tertiary structure suggests that RNA folding is independent of sequence and length. Computational modeling indicated that RNA duplex formation involves both short-range and long-range interactions. We propose that the capsid protein is able to exploit the plasticity of the RNA secondary structures, capturing those that are compatible with the geometry of the dodecahedral cage. PMID:14990708

  18. Comparison of Two Massively Parallel Sequencing Platforms using 83 Single Nucleotide Polymorphisms for Human Identification.

    Science.gov (United States)

    Apaga, Dame Loveliness T; Dennis, Sheila E; Salvador, Jazelyn M; Calacal, Gayvelline C; De Ungria, Maria Corazon A

    2017-03-24

    The potential of Massively Parallel Sequencing (MPS) technology to vastly expand the capabilities of human identification led to the emergence of different MPS platforms that use forensically relevant genetic markers. Two of the MPS platforms that are currently available are the MiSeq® FGx™ Forensic Genomics System (Illumina) and the HID-Ion Personal Genome Machine (PGM)™ (Thermo Fisher Scientific). These are coupled with the ForenSeq™ DNA Signature Prep kit (Illumina) and the HID-Ion AmpliSeq™ Identity Panel (Thermo Fisher Scientific), respectively. In this study, we compared the genotyping performance of the two MPS systems based on 83 SNP markers that are present in both MPS marker panels. Results show that MiSeq® FGx™ has greater sample-to-sample variation than the HID-Ion PGM™ in terms of read counts for all the 83 SNP markers. Allele coverage ratio (ACR) values show generally balanced heterozygous reads for both platforms. Two and four SNP markers from the MiSeq® FGx™ and HID-Ion PGM™, respectively, have average ACR values lower than the recommended value of 0.67. Comparison of genotype calls showed 99.7% concordance between the two platforms.
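    The allele coverage ratio check mentioned in this record can be sketched as follows. The record does not define ACR, so the sketch assumes a common convention: for a heterozygous genotype, the read count of the less-covered allele divided by that of the more-covered allele. The marker names and read counts are made up for illustration; only the 0.67 threshold comes from the abstract.

```python
def allele_coverage_ratio(reads_a, reads_b):
    """ACR for a heterozygous call: lower read count over higher read count (assumed definition)."""
    high = max(reads_a, reads_b)
    if high == 0:
        raise ValueError("no reads for either allele")
    return min(reads_a, reads_b) / high


def flag_imbalanced(markers, threshold=0.67):
    """Return names of markers whose ACR falls below the threshold."""
    return [
        name
        for name, (reads_a, reads_b) in markers.items()
        if allele_coverage_ratio(reads_a, reads_b) < threshold
    ]


# Hypothetical read counts for two heterozygous SNP markers.
example = {"snp_1": (120, 100), "snp_2": (300, 150)}
print(flag_imbalanced(example))
```

    In the study, the comparison was made on average ACR values per marker; averaging across samples would be a small extension of this sketch.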

  19. Nucleotide sequence analyses of the MRP1 gene in four populations suggest negative selection on its coding region

    Directory of Open Access Journals (Sweden)

    Ryan Stephen

    2006-05-01

    Full Text Available Abstract Background The MRP1 gene encodes the 190 kDa multidrug resistance-associated protein 1 (MRP1/ABCC1), which effluxes diverse drugs and xenobiotics. Sequence variations within this gene might account for differences in drug response in different individuals. To facilitate association studies of this gene with diseases and/or drug response, exons and flanking introns of MRP1 were screened for polymorphisms in 142 DNA samples from four different populations. Results Seventy-one polymorphisms, including 60 biallelic single nucleotide polymorphisms (SNPs), ten insertions/deletions (indel) and one short tandem repeat (STR), were identified. Thirty-four of these polymorphisms have not been previously reported. Interestingly, the STR polymorphism at the 5' untranslated region (5'UTR) occurs at high but different frequencies in the different populations. Frequencies of common polymorphisms in our populations were comparable to those of similar populations in HAPMAP or Perlegen. Nucleotide diversity indices indicated that the coding region of MRP1 may have undergone negative selection or recent population expansion. SNPs E10/1299 G>T (R433S) and E16/2012 G>T (G671V), which occur at low frequency in only one or two of the four populations examined, were predicted to be functionally deleterious and hence are likely to be under negative selection. Conclusion Through in silico approaches, we identified two rare SNPs that are potentially negatively selected. These SNPs may be useful for studies associating this gene with rare events including adverse drug reactions.

  20. Forensically informative nucleotide sequencing (FINS) for the first time authentication of Indian Varanus species: implication in wildlife forensics and conservation.

    Science.gov (United States)

    Rajpoot, Ankita; Kumar, Ved Prakash; Bahuguna, Archana; Kumar, Dhyanendra

    2017-11-01

    Monitor lizards (Varanus species) are widely distributed reptiles listed as endangered in the IUCN red data list. In India, on the basis of morphological and ecological characteristics, the genus is divided into four species, viz. the Bengal monitor lizard, Yellow monitor lizard, Desert monitor lizard and Water monitor lizard. These four species are listed as Schedule I species in the Indian Wildlife (Protection) Act 1972. This paper is the first attempt to present Forensically Informative Nucleotide Sequencing (FINS) for the Indian Varanus based on three mitochondrial genes. The molecular framework will be useful for the identification of Indian Varanus species and trade products derived from monitors and, as such, has important applications for wildlife management and conservation. Here, we used 14 known individual skin pieces from four species of monitor lizards; partial fragments of three mitochondrial genes (Cyt b, 12S rRNA, and 16S rRNA) were amplified for genetic study. In Cyt b, 12S rRNA and 16S rRNA, we observed 5, 5 and 4 haplotypes; 71, 69, and 43 variable sites; and 90, 89, and 50 parsimony-informative sites, respectively, within the four species of Indian monitor lizards. The nucleotide composition was T 26.4, C 32.8, A 29.2 and G 11.6; T 18.8, C 29.7, A 34.0 and G 17.5; and T 21.7, C 27.3, A 32.5 and G 18.5 in Cyt b, 12S rRNA and 16S rRNA, respectively. The neighbor-joining phylogenetic tree and the maximum parsimony tree of the three mitochondrial genes showed similar results and reveal two major clades in Indian monitor lizards.

  1. Inhibition of Cell Growth and Shoot Development by a Specific Nucleotide Sequence in a Noncoding Viroid RNA

    Science.gov (United States)

    Qi, Yijun; Ding, Biao

    2003-01-01

    Viroids are small noncoding and infectious RNAs that replicate autonomously and move systemically throughout an infected plant. The RNAs of the family Pospiviroidae contain a central conserved region (CCR) that has long been thought to be involved in replication. Here, we report that the CCR of Potato spindle tuber viroid (PSTVd) also plays a role in pathogenicity. A U257A change in the CCR converted the intermediate strain PSTVdInt to a lethal strain that caused severe growth stunting and premature death of infected plants. PSTVd with nucleotide U257 changed to C or G did not cause such symptoms. The pathogenic effect of the U257A substitution was abolished by a C259U substitution in the same RNA. Analyses of the pathogenic effects of the U257A substitution in three other PSTVd variants established A257 as a new pathogenicity determinant that functions independently and synergistically with the classic pathogenicity domain. The U257A substitution did not alter PSTVd secondary structure, replication levels, or tissue tropism. The stunted growth of PSTVdIntU257A-infected tomato plants resulted from restricted cell expansion but not cell division or differentiation. This was correlated positively with the downregulated expression of an expansin gene, LeExp2. Our results demonstrate that specific nucleotides in a noncoding, pathogenic RNA have a profound effect in altering distinct cellular responses, which then lead to well-defined alterations in plant growth and developmental patterns. The feasibility of correlating viroid RNA sequence/structure with the altered expression of specific host genes, cellular processes, and developmental patterns makes viroid infection a valuable system in which to investigate host factors for symptom expression and perhaps also to characterize the mechanisms of RNA regulation of gene expression in plants. PMID:12782729

  2. Neuropeptidergic Signaling in the American Lobster Homarus americanus: New Insights from High-Throughput Nucleotide Sequencing.

    Directory of Open Access Journals (Sweden)

    Andrew E Christie

    Full Text Available Peptides are the largest and most diverse class of molecules used for neurochemical communication, playing key roles in the control of essentially all aspects of physiology and behavior. The American lobster, Homarus americanus, is a crustacean of commercial and biomedical importance; lobster growth and reproduction are under neuropeptidergic control, and portions of the lobster nervous system serve as models for understanding the general principles underlying rhythmic motor behavior (including peptidergic neuromodulation). While a number of neuropeptides have been identified from H. americanus, and the effects of some have been investigated at the cellular/systems levels, little is currently known about the molecular components of neuropeptidergic signaling in the lobster. Here, a H. americanus neural transcriptome was generated and mined for sequences encoding putative peptide precursors and receptors; 35 precursor- and 41 receptor-encoding transcripts were identified. We predicted 194 distinct neuropeptides from the deduced precursor proteins, including members of the adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin C, bursicon, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone (CHH), CHH precursor-related peptide, diuretic hormone 31, diuretic hormone 44, eclosion hormone, FLRFamide, GSEFLamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, proctolin, pyrokinin, SIFamide, sulfakinin and tachykinin-related peptide families. While some of the predicted peptides are known H. americanus isoforms, most are novel identifications, more than doubling the extant lobster neuropeptidome. The deduced receptor proteins are the first descriptions of H. americanus neuropeptide receptors, and include ones for most of the peptide groups mentioned earlier, as well as those for ecdysis-triggering hormone, red pigment

  3. The κB transcriptional enhancer motif and signal sequences of V(D)J recombination are targets for the zinc finger protein HIVEP3/KRC: a site selection amplification binding study

    Directory of Open Access Journals (Sweden)

    Wu Lai-Chu

    2002-08-01

    Full Text Available Abstract Background The ZAS family is composed of proteins that regulate transcription via specific gene regulatory elements. The amino-DNA binding domain (ZAS-N) and the carboxyl-DNA binding domain (ZAS-C) of a representative family member, named κB DNA binding and recognition component (KRC), were expressed as fusion proteins and their target DNA sequences were elucidated by site selection amplification binding assays, followed by cloning and DNA sequencing. The fusion protein-selected DNA sequences were analyzed by the MEME and MAST computer programs to obtain consensus motifs and DNA elements bound by the ZAS domains. Results Both fusion proteins selected sequences that were similar to the κB motif or the canonical elements of the V(D)J recombination signal sequences (RSS) from a pool of degenerate oligonucleotides. Specifically, the ZAS-N domain selected sequences similar to the canonical RSS nonamer, while the ZAS-C domain selected sequences similar to the canonical RSS heptamer. In addition, both KRC fusion proteins selected oligonucleotides with sequences identical to heptamer and nonamer sequences within endogenous RSS. Conclusions The RSS are cis-acting DNA motifs which are essential for V(D)J recombination of antigen receptor genes. Due to its specific binding affinity for RSS and κB-like transcription enhancer motifs, we hypothesize that KRC may be involved in the regulation of V(D)J recombination.

  4. Complete nucleotide sequence of a South African isolate of Grapevine fanleaf virus and its associated satellite RNA.

    Science.gov (United States)

    Lamprecht, Renate L; Spaltman, Monique; Stephan, Dirk; Wetzel, Thierry; Burger, Johan T

    2013-07-17

    The complete sequences of RNA1, RNA2 and satellite RNA have been determined for a South African isolate of Grapevine fanleaf virus (GFLV-SACH44). The two RNAs of GFLV-SACH44 are 7,341 nucleotides (nt) and 3,816 nt in length, respectively, and its satellite RNA (satRNA) is 1,104 nt in length, all excluding the poly(A) tail. Multiple sequence alignment of these sequences showed that GFLV-SACH44 RNA1 and RNA2 were the closest to the South African isolate, GFLV-SAPCS3 (98.2% and 98.6% nt identity, respectively), followed by the French isolate, GFLV-F13 (87.3% and 90.1% nt identity, respectively). Interestingly, the GFLV-SACH44 satRNA is more similar to three Arabis mosaic virus satRNAs (85%-87.4% nt identity) than to the satRNA of GFLV-F13 (81.8% nt identity) and was most distantly related to the satRNA of GFLV-R2 (71.0% nt identity). Full-length infectious clones of GFLV-SACH44 satRNA were constructed. The infectivity of the clones was tested with three nepovirus isolates, GFLV-NW, Arabis mosaic virus (ArMV)-NW and GFLV-SAPCS3. The clones were mechanically inoculated in Chenopodium quinoa and were infectious when co-inoculated with the two GFLV helper viruses, but not when co-inoculated with ArMV-NW.
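
    The percent nucleotide identity figures quoted in this record are derived from pairwise comparisons within an alignment. As a rough sketch of that arithmetic, the snippet below counts matching positions between two pre-aligned sequences. The convention of skipping gapped columns, the function name, and the toy sequences are assumptions for illustration; the abstract does not state which identity definition or software the authors used.

```python
# Sketch only: identity = matches / columns compared, where columns containing
# a gap in either sequence are skipped (an ASSUMED convention; others exist).

def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not taken from any GFLV isolate.
    print(percent_identity("ACGU-A", "ACGAGA"))
```

    Because the choice of denominator (ungapped columns, full alignment length, or shorter sequence length) changes the result, identity values are only comparable when computed under the same convention.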

  5. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

    Science.gov (United States)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-07-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims at resolving these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).
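
    The idea of per-position information content with pseudo counts can be sketched in a few lines. This is a textbook Shannon-style calculation with a flat pseudo count added to every symbol, offered as an assumption-laden illustration only: it is not Seq2Logo's algorithm, whose weighting and pseudo count schemes are not detailed in this abstract, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column: str, alphabet: str = AMINO_ACIDS,
                       pseudocount: float = 0.0) -> float:
    """Information content (bits) of one alignment column.

    Computed as log2(|alphabet|) minus the Shannon entropy of the observed
    frequencies, with a flat pseudocount added to every symbol (a simple
    ASSUMED scheme, not the one implemented by Seq2Logo).
    """
    counts = Counter(column.upper())
    total = sum(counts[s] for s in alphabet) + pseudocount * len(alphabet)
    if total == 0:
        raise ValueError("column contains no symbols from the alphabet")
    entropy = 0.0
    for symbol in alphabet:
        p = (counts[symbol] + pseudocount) / total
        if p > 0:
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy


def alignment_information(alignment: list[str], alphabet: str = AMINO_ACIDS,
                          pseudocount: float = 0.0) -> list[float]:
    """Information content for every column of an equal-length alignment."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    columns = ("".join(col) for col in zip(*alignment))
    return [column_information(c, alphabet, pseudocount) for c in columns]


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ACCA", "ACTA"]  # toy DNA alignment
    print(alignment_information(toy, alphabet="ACGT", pseudocount=0.5))
```

    With few sequences, a fully conserved column reaches the maximum information only when no pseudo count is applied; adding one pulls the estimate down, which is the correction for a low number of observations that the abstract alludes to.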

  6. Identities among actin-encoding cDNAs of the Nile tilapia (Oreochromis niloticus) and other eukaryote species revealed by nucleotide and amino acid sequence analyses

    Directory of Open Access Journals (Sweden)

    Andréia B. Poletto

    2008-01-01

    Full Text Available Actin-encoding cDNAs of Nile tilapia (Oreochromis niloticus) were isolated by RT-PCR using total RNA samples of different tissues and further characterized by nucleotide sequencing and in silico amino acid (aa) sequence analysis. Comparisons among the actin gene sequences of O. niloticus and those of other species showed that the isolated genes are highly similar to the actin genes of other fish and other vertebrates. The highest nucleotide resemblance was observed between O. niloticus and O. mossambicus α-actin and β-actin genes. Analysis of the predicted aa sequences revealed two distinct types of cytoplasmic actins, one cardiac muscle actin type and one skeletal muscle actin type that were expressed in different tissues of Nile tilapia. The evolutionary relationships between the Nile tilapia actin genes and those of diverse other organisms are discussed.

  7. The role of nucleotide sequence in the immune-active structure photochemically induced in double-stranded DNA by ultraviolet irradiation

    International Nuclear Information System (INIS)

    Wakizaka, Akira; Okuhara, Eiji

    1982-01-01

    Pyrimidine, purine, and mixed sequence oligonucleotides from ultraviolet-irradiated DNA were tested for their inhibitory activities on the interaction of [3H]ultraviolet-irradiated DNA with its antibody raised in rabbit. Thymine dimer containing pyrimidine oligonucleotides from irradiated DNA failed to inhibit the interaction, while mixed sequence oligonucleotides, especially those with 8 or more nucleotides, exhibited potent inhibition. Purine clusters from irradiated DNA and mixed sequence oligomers from unirradiated DNA showed no inhibition. Dimerized thymine, which appears to be a critical part of the antigenic determinant, did not inhibit the interaction by itself. The same observations were made for ultraviolet-irradiated thymidine and thymidylic acid. The results suggest that a structure composed of a mixed pyrimidine and purine sequence with a certain chain length seems to be essential for the antigenicity induced in the irradiated DNA. On this nucleotide chain backbone, photochemically modified bases (mostly thymine dimer) can form an immune-active structure. (author)

  8. A resource of single-nucleotide polymorphisms for rainbow trout generated by restriction-site associated DNA sequencing of doubled haploids

    Science.gov (United States)

    Salmonid genomes are considered to be in a pseudo-tetraploid state as a result of an evolutionarily recent genome duplication event. This situation complicates single nucleotide polymorphism (SNP) discovery in rainbow trout as many putative SNPs are actually paralogous sequence variants (PSVs) and ...

  9. Cloning of the gene encoding Streptococcin A-FF22, a novel lantibiotic produced by Streptococcus pyogenes, and determination of its nucleotide sequence.

    OpenAIRE

    Hynes, W L; Ferretti, J J; Tagg, J R

    1993-01-01

    Streptococcin A-FF22 (SA-FF22) is a lantibiotic produced by Streptococcus pyogenes FF22. The nucleotide sequence of the SA-FF22 structural gene (scnA) was determined and shown to encode a 51-amino-acid prepeptide. The proteolytic processing site of the SA-FF22 prepeptide differs from that which characterizes other type A lantibiotics.

  10. Coding assignment and nucleotide sequence of simian rotavirus SA11 gene segment 10: location of glycosylation sites suggests that the signal peptide is not cleaved.

    OpenAIRE

    Both, G W; Siegman, L J; Bellamy, A R; Atkinson, P H

    1983-01-01

    A cloned DNA copy of simian rotavirus SA11 genomic segment 10 was used to confirm the assignment of the nonstructural glycoprotein NCVP5 to this gene. Determination of the nucleotide sequence for gene 10 indicated that NCVP5 is 175 amino acids in length and has an N-terminal hydrophobic region with the characteristics of a signal sequence for membrane translocation. Unexpectedly, this region was also the location for the only two potential glycosylation sites within the molecule, asparagine r...

  11. Complete Nucleotide Sequence and Organization of the Atrazine Catabolic Plasmid pADP-1 from Pseudomonas sp. Strain ADP

    Science.gov (United States)

    Martinez, Betsy; Tomkins, Jeffrey; Wackett, Lawrence P.; Wing, Rod; Sadowsky, Michael J.

    2001-01-01

    The complete 108,845-nucleotide sequence of catabolic plasmid pADP-1 from Pseudomonas sp. strain ADP was determined. Plasmid pADP-1 was previously shown to encode AtzA, AtzB, and AtzC, which catalyze the sequential hydrolytic removal of s-triazine ring substituents from the herbicide atrazine to yield cyanuric acid. Computational analyses indicated that pADP-1 encodes 104 putative open reading frames (ORFs), which are predicted to function in catabolism, transposition, and plasmid maintenance, transfer, and replication. Regions encoding transfer and replication functions of pADP-1 had 80 to 100% amino acid sequence identity to pR751, an IncPβ plasmid previously isolated from Enterobacter aerogenes. pADP-1 was shown to contain a functional mercury resistance operon with 99% identity to Tn5053. Complete copies of transposases with 99% amino acid sequence identity to TnpA from IS1071 and TnpA from Pseudomonas pseudoalcaligenes were identified and flank each of the atzA, atzB, and atzC genes, forming structures resembling nested catabolic transposons. Functional analyses identified three new catabolic genes, atzD, atzE, and atzF, which participate in atrazine catabolism. Crude extracts from Escherichia coli expressing AtzD hydrolyzed cyanuric acid to biuret. AtzD showed 58% amino acid sequence identity to TrzD, a cyanuric acid amidohydrolase, from Pseudomonas sp. strain NRRLB-12227. Two other genes encoding the further catabolism of cyanuric acid, atzE and atzF, reside in a contiguous cluster adjacent to a potential LysR-type transcriptional regulator. E. coli strains bearing atzE and atzF were shown to encode a biuret hydrolase and allophanate hydrolase, respectively. atzDEF are cotranscribed. AtzE and AtzF are members of a common amidase protein family. These data reveal the complete structure of a catabolic plasmid and show that the atrazine catabolic genes are dispersed on three disparate regions of the plasmid. These results begin to provide insight into how

  12. Nucleotide sequence and characterization of toxR: a gene involved in exotoxin A regulation in Pseudomonas aeruginosa.

    Science.gov (United States)

    Wozniak, D J; Cram, D C; Daniels, C J; Galloway, D R

    1987-01-01

    We have previously reported the discovery and subsequent cloning of a regulatory gene, designated toxR, which appears to regulate the expression of the exotoxin A (ETA) structural gene toxA. Subsequent work by this laboratory has resulted in the subcloning of the toxR gene and its transfer to a high copy number plasmid (pGW28). Functional analysis of the toxR gene using a Tn5 insertion along with toxR deletions indicates that inactivation of toxR results in a dramatic reduction of ETA production. Nucleotide sequence analysis of pGW28 has revealed a 675 bp major open reading frame (225 codons) which could encode a protein of 24,626 daltons. Using S1 nuclease mapping, the toxR RNA transcript has been shown to originate 20 bp upstream of the presumptive translation initiation codon. Experiments using a toxA-specific probe have revealed that the toxR gene product appears to regulate the expression of ETA at the transcriptional level. PMID:3031589

  13. Empirical Comparison of Simple Sequence Repeats and Single Nucleotide Polymorphisms in Assessment of Maize Diversity and Relatedness

    Science.gov (United States)

    Hamblin, Martha T.; Warburton, Marilyn L.; Buckler, Edward S.

    2007-01-01

    While Simple Sequence Repeats (SSRs) are extremely useful genetic markers, recent advances in technology have produced a shift toward use of single nucleotide polymorphisms (SNPs). The different mutational properties of these two classes of markers result in differences in heterozygosities and allele frequencies that may have implications for their use in assessing relatedness and evaluation of genetic diversity. We compared analyses based on 89 SSRs (primarily dinucleotide repeats) to analyses based on 847 SNPs in individuals from the same 259 inbred maize lines, which had been chosen to represent the diversity available among current and historic lines used in breeding. The SSRs performed better at clustering germplasm into populations than did a set of 847 SNPs or 554 SNP haplotypes, and SSRs provided more resolution in measuring genetic distance based on allele-sharing. Except for closely related pairs of individuals, measures of distance based on SSRs were only weakly correlated with measures of distance based on SNPs. Our results suggest that 1) large numbers of SNP loci will be required to replace highly polymorphic SSRs in studies of diversity and relatedness and 2) relatedness among highly-diverged maize lines is difficult to measure accurately regardless of the marker system. PMID:18159250

  14. Genetic differentiation between fake abalone and genuine Haliotis species using the forensically informative nucleotide sequencing (FINS) method.

    Science.gov (United States)

    Ha, Wai Y; Reid, David G; Kam, Wan L; Lau, Yuk Y; Sham, Wing C; Tam, Silvia Y K; Sin, Della W M; Mok, Chuen S

    2011-05-25

    Abalones (Haliotis species) are a popular delicacy and are commonly preserved in dried form, either whole or in slices or small pieces, for consumption in Asian countries. Driven by the huge profit from trading abalones, dishonest traders may substitute other molluscan species for processed abalone, of which the morphological characteristics are frequently lost in the processed form. For protection of consumer rights and law enforcement against fraud, there is a need for an effective methodology to differentiate between fake and genuine abalone. This paper describes a method (validated according to the international forensic guidelines provided by SWGDAM) for the identification of fake abalone species using forensically informative nucleotide sequence (FINS) analysis. A study of the local market revealed that many claimed "abalone slice" samples on sale are not genuine. The fake abalone samples were found to be either volutids of the genus Cymbium (93%) or the muricid Concholepas concholepas (7%). This is the first report of Cymbium species being used for the preparation and sale as "abalone" in dried sliced form in Hong Kong.

  15. Development of Prevotella intermedia-specific PCR primers based on the nucleotide sequences of a DNA probe Pig27.

    Science.gov (United States)

    Kim, Min Jung; Hwang, Kyung Hwan; Lee, Young-Seok; Park, Jae-Yoon; Kook, Joong-Ki

    2011-03-01

    The aim of this study was to develop Prevotella intermedia-specific PCR primers based on the P. intermedia-specific DNA probe. The P. intermedia-specific DNA probe was screened by inverted dot blot hybridization and confirmed by Southern blot hybridization. The nucleotide sequences of the species-specific DNA probes were determined using a chain termination method. Southern blot analysis showed that the DNA probe, Pig27, detected only the genomic DNA of P. intermedia strains. PCR showed that the PCR primers, Pin-F1/Pin-R1, had species-specificity for P. intermedia. The detection limit of the PCR primer set was 0.4 pg of the purified genomic DNA of P. intermedia ATCC 49046. These results suggest that the PCR primers, Pin-F1/Pin-R1, could be useful in the detection of P. intermedia as well as in the development of a PCR kit in epidemiological studies related to periodontal diseases.

  16. Nucleotide sequence analysis of a human monoclonal antibody TONO-1 with cytotoxic potential for T-leukemia/lymphoma cells.

    Science.gov (United States)

    Numasaki, M; Nakamura, K; Fukuoka, Y; Saeki, H; Hanai, N; Kudo, T

    2001-01-15

    A human monoclonal antibody (HuMab) TONO-1 (IgM, lambda) recognizes cell surface antigens associated primarily with human T-leukemia/lymphoma cells. In this study, we investigated in detail the reactivity against T-leukemia/lymphoma cells, the cytotoxic potential, and the primary nucleotide and deduced amino acid sequences of the rearranged heavy and light chains of the HuMab TONO-1. Expression of the molecules (TONO-1 Ags) detected by HuMab TONO-1 was significantly heterogeneous even in the same T-leukemia/lymphoma cell lines HPB-MLT and MOLT-4F. The flow cytometric curves showed an unusual broad-based spread of fluorescence intensity. HuMab TONO-1 was shown to have the ability to kill the T-leukemia/lymphoma cells efficiently in the presence of rabbit complement. However, HuMab TONO-1 did not demonstrate significant antibody-dependent cellular cytotoxic activity. Furthermore, HuMab TONO-1 heavy and light chain variable regions were cloned, sequenced and analyzed. HuMab TONO-1 uses a V(H) gene member of the V(H)IV gene family V(H)71-4, and is productively rearranged with the germ line D(H) gene D(XP')1, and the germ line J(H)5 gene with multiple somatic mutations. HuMab TONO-1 Vlambda belongs to the lambda light chain variable subgroup I family and is derived from the Vlambdalc germ line gene Humlv1042, and germ line gene Jlambda1 without somatic mutations. The results reveal that the production of HuMab TONO-1, with cytotoxic potential for human T-leukemia/lymphoma cells, is achieved by rearrangement of the V(H)71-4/Humlv1042 germ line variable region gene combination, which is associated with the autoimmune repertoire.

  17. Analysis of mitochondrial control region nucleotide sequences from Baffin Bay beluga, (Delphinapterus leucas): detecting pods or sub-populations?

    Directory of Open Access Journals (Sweden)

    Per Jakob Palsbøll

    2002-07-01

    Full Text Available We report the results of an analysis of the variation in the nucleotide sequence of the mitochondrial control region obtained in 218 samples collected from belugas, Delphinapterus leucas, around the Baffin Bay. We detected multiple instances of significant heterogeneity in the distribution of genetic variation among the analyzed mitochondrial control region sequences on a spatial as well as temporal scale, indicating a high degree of maternal population structure. The detection of significant levels of heterogeneity between samples collected in different years but within the same area and season was unexpected. Re-examination of earlier results presented by Brown Gladden and coworkers also revealed temporal genetic heterogeneity within the one area where sufficient (n>15) samples were collected in multiple years. These findings suggest that non-random breeding and maternally directed site-fidelity are not the sole causes of genetic heterogeneity among belugas but that a matrilineal pod structure might cause significant levels of genetic heterogeneity as well, even within the same area. We propose that a maternal pod structure, which has been shown to be the cause of significant genetic heterogeneity in other odontocetes, may add to the overall level of heterogeneity in the maternally inherited DNA and hence that much of the spatial heterogeneity observed in this and previous studies might be attributed to pod rather than population structure. Our findings suggest that it is important to estimate the contribution of pod structure to overall heterogeneity before defining populations or management units in order to avoid interpreting heterogeneity due to sampling of different pods as different populations/management units.

  18. Nucleotide sequence of an external transcribed spacer in Xenopus laevis rDNA: sequences flanking the 5' and 3' ends of 18S rRNA are non-complementary.

    OpenAIRE

    Maden, B E; Moss, M; Salim, M

    1982-01-01

    We have sequenced the external transcribed spacer (ETS) of a ribosomal transcription unit from Xenopus laevis, together with sections of the preceding non-transcribed spacer. Our analysis was carried out on the same cloned transcription unit as that from which the internal transcribed spacers (ITS) were previously sequenced. The ETS is approximately 712 nucleotides long and, like the ITS regions, is generally very rich in C plus G. Features of the sequence include an excess of oligo-C tracts ...

  19. F-Type Lectins: A Highly Diversified Family of Fucose-Binding Proteins with a Unique Sequence Motif and Structural Fold, Involved in Self/Non-Self-Recognition

    Directory of Open Access Journals (Sweden)

    Gerardo R. Vasta

    2017-11-01

    Full Text Available The F-type lectin (FTL) family is one of the most recent to be identified and structurally characterized. Members of the FTL family are characterized by a fucose recognition domain [F-type lectin domain (FTLD)] that displays a novel jellyroll fold (“F-type” fold) and unique carbohydrate- and calcium-binding sequence motifs. This novel lectin family comprises widely distributed proteins exhibiting single, double, or greater multiples of the FTLD, either tandemly arrayed or combined with other structurally and functionally distinct domains, yielding lectin subunits of pleiotropic properties even within a single species. Furthermore, the extraordinary variability of FTL sequences (isoforms) that are expressed in a single individual has revealed genetic mechanisms of diversification in ligand recognition that are unique to FTLs. Functions of FTLs in self/non-self-recognition include innate immunity, fertilization, microbial adhesion, and pathogenesis, among others. In addition, although the F-type fold is distinctive for FTLs, a structure-based search revealed apparently unrelated proteins with minor sequence similarity to FTLs that displayed the FTLD fold. In general, the phylogenetic analysis of FTLD sequences from viruses to mammals reveals clades that are consistent with the currently accepted taxonomy of extant species. However, the surprisingly discontinuous distribution of FTLDs within each taxonomic category suggests not only an extensive structural/functional diversification of the FTLs along evolutionary lineages but also that this intriguing lectin family has been subject to frequent gene duplication, secondary loss, lateral transfer, and functional co-option.

  20. Enzyme-Linked Electrochemical Detection of PCR-Amplified Nucleotide Sequences Using Disposable Screen-Printed Sensors. Applications in Gene Expression Monitoring

    Directory of Open Access Journals (Sweden)

    Miroslav Fojta

    2008-01-01

    Full Text Available Electrochemical enzyme-linked techniques for sequence-specific DNA sensing are presented. These techniques are based on attachment of streptavidin-alkaline phosphatase conjugate to biotin tags tethered to DNA immobilized at the surface of disposable screen-printed carbon electrodes (SPCE), followed by production and electrochemical determination of an electroactive indicator, 1-naphthol. Via hybridization of SPCE surface-confined target DNAs with end-biotinylated probes, highly specific discrimination between complementary and non-complementary nucleotide sequences was achieved. The enzyme-linked DNA hybridization assay has been successfully applied in analysis of PCR-amplified real genomic DNA sequences, as well as in monitoring of plant tissue-specific gene expression. In addition, we present an alternative approach involving sequence-specific incorporation of biotin-labeled nucleotides into DNA by primer extension. Introduction of multiple biotin tags per probe primer resulted in considerable enhancement of the signal intensity and improvement of the specificity of detection.

  1. cWords - systematic microRNA regulatory motif discovery from mRNA expression data

    DEFF Research Database (Denmark)

    Rasmussen, Simon Horskjær; Jacobsen, Anders; Krogh, Anders

    2013-01-01

    BACKGROUND: Post-transcriptional regulation of gene expression by small RNAs and RNA binding proteins is of fundamental importance in development of complex organisms, and dysregulation of regulatory RNAs can influence onset, progression and potentially be target for treatment of many diseases. Post...... increasingly important tools for the identification of post-transcriptional regulatory motifs and the inference of the regulators and their targets. RESULTS: cWords is a method designed for regulatory motif discovery in differential case-control mRNA expression datasets. We have improved the algorithms......-transcriptional regulation by small RNAs is mediated through partial complementary binding to messenger RNAs leaving nucleotide signatures or motifs throughout the entire transcriptome. Computational methods for discovery and analysis of sequence motifs in high-throughput mRNA expression profiling experiments are becoming...

  2. Complete nucleotide sequence and analysis of two conjugative broad host range plasmids from a marine microbial biofilm.

    Directory of Open Access Journals (Sweden)

    Peter Norberg

    Full Text Available The complete nucleotide sequence of plasmids pMCBF1 and pMCBF6 was determined and analyzed. pMCBF1 and pMCBF6 form a novel clade within the IncP-1 plasmid family designated IncP-1 ς. The plasmids were exogenously isolated earlier from a marine biofilm. pMCBF1 (62 689 base pairs; bp) and pMCBF6 (66 729 bp) have identical backbones, but differ in their mercury resistance transposons. pMCBF1 carries Tn5053 and pMCBF6 carries Tn5058. Both are flanked by 5 bp direct repeats, typical of replicative transposition. Both insertions are in the vicinity of a resolvase gene in the backbone, supporting the idea that both transposons are "res-site hunters" that preferably insert close to and use external resolvase functions. The similarity of the backbones indicates recent insertion of the two transposons and the ongoing dynamics of plasmid evolution in marine biofilms. Both plasmids also carry the insertion sequence ISPst1, albeit without flanking repeats. ISPst1 is located in an unusual site within the control region of the plasmid. In contrast to most known IncP-1 plasmids the pMCBF1/pMCBF6 backbone has no insert between the replication initiation gene (trfA) and the vegetative replication origin (oriV). One pMCBF1/pMCBF6 block of about 2.5 kilo bases (kb) has no similarity with known sequences in the databases. Furthermore, insertion of three genes with similarity to the multidrug efflux pump operon mexEF and a gene from the NodT family of the tripartite multi-drug resistance-nodulation-division (RND) system in Pseudomonas aeruginosa was found. They do not seem to confer antibiotic resistance to the hosts of pMCBF1/pMCBF6, but the presence of RND on promiscuous plasmids may have serious implications for the spread of antibiotic multi-resistance.

  3. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) benefits motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  4. Insertion sequence element single nucleotide polymorphism typing provides insights into the population structure and evolution of Mycobacterium ulcerans across Africa.

    Science.gov (United States)

    Vandelannoote, Koen; Jordaens, Kurt; Bomans, Pieter; Leirs, Herwig; Durnez, Lies; Affolabi, Dissou; Sopoh, Ghislain; Aguiar, Julia; Phanzu, Delphin Mavinga; Kibadi, Kapay; Eyangoh, Sara; Manou, Louis Bayonne; Phillips, Richard Odame; Adjei, Ohene; Ablordey, Anthony; Rigouts, Leen; Portaels, Françoise; Eddyani, Miriam; de Jong, Bouke C

    2014-02-01

    Buruli ulcer is an indolent, slowly progressing necrotizing disease of the skin caused by infection with Mycobacterium ulcerans. In the present study, we applied a redesigned technique to a vast panel of M. ulcerans disease isolates and clinical samples originating from multiple African disease foci in order to (i) gain fundamental insights into the population structure and evolutionary history of the pathogen and (ii) disentangle the phylogeographic relationships within the genetically conserved cluster of African M. ulcerans. Our analyses identified 23 different African insertion sequence element single nucleotide polymorphism (ISE-SNP) types that dominate in different areas where Buruli ulcer is endemic. These ISE-SNP types appear to be the initial stages of clonal diversification from a common, possibly ancestral ISE-SNP type. ISE-SNP types were found unevenly distributed over the greater West African hydrological drainage basins. Our findings suggest that geographical barriers bordering the basins to some extent prevented bacterial gene flow between basins and that this resulted in independent focal transmission clusters associated with the hydrological drainage areas. Different phylogenetic methods yielded two well-supported sister clades within the African ISE-SNP types. The ISE-SNP types from the "pan-African clade" were found to be widespread throughout Africa, while the ISE-SNP types of the "Gabonese/Cameroonian clade" were much rarer and found in a more restricted area, which suggested that the latter clade evolved more recently. Additionally, the Gabonese/Cameroonian clade was found to form a strongly supported monophyletic group with Papua New Guinean ISE-SNP type 8, which is unrelated to other Southeast Asian ISE-SNP types.

  5. A 1204-single nucleotide polymorphism and insertion-deletion polymorphism panel for massively parallel sequencing analysis of DNA mixtures.

    Science.gov (United States)

    Hwa, Hsiao-Lin; Chung, Wan-Chia; Chen, Pei-Lung; Lin, Chih-Peng; Li, Huei-Ying; Yin, Hsiang-I; Lee, James Chun-I

    2018-01-01

    Massively parallel sequencing (MPS) technology enables the simultaneous analysis of a huge number of single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (indels). MPS also enables the detection of the alleles of minor contributors in a highly unbalanced DNA mixture. In this study, we established a 1204-marker panel optimized for MPS consisting of 987 autosomal markers (964 SNPs and 23 indels), 27 X-chromosome SNPs, 61 Y-chromosome markers (56 SNPs and 5 indels), and 129 mitochondrial SNPs. The DNA samples of six unrelated individuals (two men and four women), 26 nondegraded DNA mixtures (with minor to major ratios of 1:29, 1:39, 1:79, and 1:99), and eight highly artificially degraded DNA mixtures (with minor to major ratios of 1:29, 1:39, 1:79, and 1:99) were analyzed through MPS by using the panel. A scoring system was developed to determine the minor contributors in DNA mixtures based on the genotypes identified using MPS. The genotypes of the 1204 markers were successfully profiled through MPS by using the custom-designed panel. The efficiency of MPS for analyzing these highly degraded samples was lower than that for analyzing nondegraded samples. All minor contributors in the 26 nondegraded and 8 degraded DNA mixtures were accurately assigned using this scoring system based on 964 autosomal SNPs. An association between the observed reads ratio and theoretical ratio of the minor component was noted for nondegraded mixtures. In conclusion, we established a 1204-marker individual identification panel for MPS that successfully analyzed autosomal, X-chromosome, Y-chromosome, and mitochondrial SNPs and indels simultaneously. In combination with the newly developed scoring system, the panel can accurately identify minor contributors in nondegraded and highly degraded DNA mixtures. Copyright © 2017 Elsevier B.V. All rights reserved.
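    The marker counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only re-adds the numbers given above and converts the stated minor:major ratios into minor-contributor fractions; the names `PANEL`, `panel_totals` and `minor_fraction` are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Marker counts exactly as stated in the abstract.
PANEL = {
    "autosomal": {"SNP": 964, "indel": 23},
    "X-chromosome": {"SNP": 27},
    "Y-chromosome": {"SNP": 56, "indel": 5},
    "mitochondrial": {"SNP": 129},
}


def panel_totals(panel):
    """Return (markers per group, total markers) for a nested count dict."""
    per_group = {group: sum(kinds.values()) for group, kinds in panel.items()}
    return per_group, sum(per_group.values())


def minor_fraction(minor, major):
    """Share of the mixture contributed by the minor donor for a minor:major ratio."""
    return Fraction(minor, minor + major)


per_group, total = panel_totals(PANEL)
print(per_group)
print(total)
for major in (29, 39, 79, 99):
    print(f"1:{major} -> minor fraction {float(minor_fraction(1, major)):.4f}")
```

    The group sums reproduce the 987 autosomal and 61 Y-chromosome markers and the overall total of 1204; a 1:99 mixture corresponds to a minor component of 1%.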

  6. Complete nucleotide sequence of Bacillus subtilis (natto) bacteriophage PM1, a phage associated with disruption of food production.

    Science.gov (United States)

    Umene, Kenichi; Shiraishi, Atsushi

    2013-06-01

    "Natto", considered a traditional food, is made by fermenting boiled soybeans with Bacillus subtilis (natto), which is a natto-producing strain related to B. subtilis. The production of natto is disrupted by phage infections of B. subtilis (natto); hence, it is necessary to control phage infections. PM1, a phage of B. subtilis (natto), was isolated during interrupted natto production in a factory. In a previous study, PM1 was classified morphologically into the family Siphoviridae, and its genome, comprising approximately 50 kbp of linear double-stranded DNA, was assumed to be circularly permuted. In the present study, the complete nucleotide sequence of the PM1 genomic DNA of 50,861 bp (41.3 %G+C) was determined, and 86 open reading frames (ORFs) were deduced. Forty-one ORFs of PM1 shared similarities with proteins deduced from the genome of phages reported so far. Twenty-three ORFs of PM1 were associated with functions related to the phage multiplication process of gene control, DNA replication/modification, DNA packaging, morphogenesis, and cell lysis. Bacillus subtilis (natto) produces a capsular polypeptide of glutamate with a γ-linkage (called poly-γ-glutamate), which appears to serve as a physical barrier to phage adsorption. One ORF of PM1 had similarity with a poly-γ-glutamate hydrolase, which is assumed to degrade the capsular barrier to allow phage progenies to infect encapsulated host cells. The genome analysis of PM1 revealed the characteristics of the phage that are consistent as Bacillus subtilis (natto)-infecting phage.

  7. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage...

  8. Identification of cyclic nucleotide gated channels using regular expressions

    KAUST Repository

    Zelman, Alice K.

    2013-09-03

    Cyclic nucleotide-gated channels (CNGCs) are nonselective cation channels found in plants, animals, and some bacteria. They have a six-transmembrane/one- pore structure, a cytosolic cyclic nucleotide-binding domain, and a cytosolic calmodulin-binding domain. Despite their functional similarities, the plant CNGC family members appear to have different conserved amino acid motifs within corresponding functional domains than animal and bacterial CNGCs do. Here we describe the development and application of methods employing plant CNGC-specific sequence motifs as diagnostic tools to identify novel candidate channels in different plants. These methods are used to evaluate the validity of annotations of putative orthologs of CNGCs from plant genomes. The methods detail how to employ regular expressions of conserved amino acids in functional domains of annotated CNGCs and together with Web tools such as PHI-BLAST and ScanProsite to identify novel candidate CNGCs in species including Physcomitrella patens. © Springer Science+Business Media New York 2013.
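    As a rough illustration of the regular-expression approach described in this abstract, the sketch below translates a simple PROSITE-style pattern (the syntax accepted by ScanProsite) into a Python regular expression and scans a protein sequence with it. The pattern and the toy sequence are hypothetical placeholders, not the published CNGC motifs, and the converter handles only a small subset of the PROSITE syntax.

```python
import re

# One pattern element: optional "<" anchor, a residue spec, optional repeat, optional ">" anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":                      # x = any residue
            core = "."
        elif core.startswith("{"):           # {ABC} = any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if lo is not None:                   # x(2) or x(2,4)
            repeat = "{" + lo + ("," + hi if hi is not None else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for non-overlapping motif hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


# Hypothetical motif and toy sequence, for illustration only.
HYPOTHETICAL_MOTIF = "[LI]-x(2)-G-{P}-[DE]"
print(prosite_to_regex(HYPOTHETICAL_MOTIF))
print(find_motif("MKLAAGTEPLIVGSDK", HYPOTHETICAL_MOTIF))
```

    In practice the candidate hits from such a scan would still need to be checked against domain annotations, as the abstract describes for PHI-BLAST and ScanProsite.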

  9. Single nucleotide polymorphism discovery in cutthroat trout subspecies using genome reduction, barcoding, and 454 pyro-sequencing

    Directory of Open Access Journals (Sweden)

    Houston Derek D

    2012-12-01

    Full Text Available Background: Salmonids are popular sport fishes, and as such have been subjected to widespread stocking throughout western North America. Historically, stocking was done with little regard for genetic variation among populations and has resulted in genetic mixing among species and subspecies in many areas, thus putting the genetic integrity of native salmonid populations at risk and creating a need to assess the genetic constitution of native salmonid populations. Cutthroat trout is a salmonid species with pronounced geographic structure (there are 10 extant subspecies) and a recent history of hybridization with introduced rainbow trout in many populations. Genetic admixture has also occurred among cutthroat trout subspecies in areas where introductions have brought two or more subspecies into contact. Consequently, management agencies have increased their efforts to evaluate the genetic composition of cutthroat trout populations to identify populations that remain uncompromised and manage them accordingly, but additional genetic markers are needed to do so effectively. Here we used genome reduction, MID-barcoding, and 454-pyrosequencing to discover single nucleotide polymorphisms that differentiate cutthroat trout subspecies and can be used as a rapid, cost-effective method to characterize the genetic composition of cutthroat trout populations. Results: Thirty cutthroat and six rainbow trout individuals were subjected to genome reduction and next-generation sequencing. A total of 1,499,670 reads averaging 379 base pairs in length were generated by 454-pyrosequencing, resulting in 569,060,077 total base pairs sequenced. A total of 43,558 putative SNPs were identified, and of those, 125 SNP primers were developed that successfully amplified 96 cutthroat trout and rainbow trout individuals. These SNP loci were able to differentiate most cutthroat trout subspecies using distance methods and Structure analyses. Conclusions: Genomic and

  10. Nucleotide sequence of the Escherichia coli pyrE gene and of the DNA in front of the protein-coding region

    DEFF Research Database (Denmark)

    Poulsen, Peter; Jensen, Kaj Frank; Valentin-Hansen, Poul

    1983-01-01

    Orotate phosphoribosyltransferase (EC 2.4.2.10) was purified to electrophoretic homogeneity from a strain of Escherichia coli containing the pyrE gene cloned on a multicopy plasmid. The relative molecular masses (Mr) of the native enzyme and its subunit were estimated by means of gel filtration....... From the results the following conclusions may be drawn. Orotate phosphoribosyltransferase is a dimeric protein with subunits of Mr 23 326 consisting of 211 amino acid residues. The pyrE gene is transcribed in a counter-clockwise direction from the E. coli chromosome as an mRNA with a considerable...... and electrophoresis in the presence of dodecyl sulfate. The amino acid sequences at the N and C termini, as well as the amino acid composition, were determined. The nucleotide sequence of the structural pyrE gene, including 394 nucleotide residues preceding the beginning of the coding frame, was also established...

  11. Genotyping of human parvovirus B19 in clinical samples from Brazil and Paraguay using heteroduplex mobility assay, single-stranded conformation polymorphism and nucleotide sequencing

    Directory of Open Access Journals (Sweden)

    Marcos César Lima de Mendonça

    2011-06-01

    Full Text Available Heteroduplex mobility assay, single-stranded conformation polymorphism and nucleotide sequencing were utilised to genotype human parvovirus B19 samples from Brazil and Paraguay. Ninety-seven serum samples were collected from individuals presenting with abortion or erythema infectiosum, arthropathies, severe anaemia and transient aplastic crisis; two additional skin samples were collected by biopsy. After the procedure, all clinical samples were classified as genotype 1.

  12. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

    DEFF Research Database (Denmark)

    Foulk, M. S.; Urban, J. M.; Casella, Cinzia

    2015-01-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (lambda-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent...... strands intact. We used genomics and biochemical approaches to determine if lambda-exo digests all parental DNA sequences equally. We report that lambda-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, lambda-exo digestion of nonreplicating genomic DNA (LexoG0) enriches...... GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent lambda-exo biases in NS-seq and validated this approach at the rDNA locus. The lambda-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s...

  13. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds

    OpenAIRE

    Stafuzza, Nedenia Bonvino; Zerlotini, Adhemar; Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J.; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto

    2017-01-01

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated for each animal 10.7 to 16.4-fold genome coverage. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of whi...

  14. Importance of purine and pyrimidine content of local nucleotide sequences (six bases long) for evolution of the human immunodeficiency virus type 1.

    Science.gov (United States)

    Doi, H

    1991-10-15

    Human immunodeficiency virus type 1 evolves rapidly, and random base change is thought to act as a major factor in this evolution. However, segments of the viral genome differ in their variability: there is the highly variable env gene, particularly hypervariable regions located within env, and, in contrast, the conservative gag and pol genes. Computer analysis of the nucleotide sequences of human immunodeficiency virus type 1 isolates reveals that base substitution in this virus is nonrandom and affected by local nucleotide sequences. Certain local sequences 6 base pairs long are excessively frequent in the hypervariable regions. These sequences exhibit base-substitution hotspots at specific positions in their 6 bases. The hotspots tend to be nonsilent letters of codons in the hypervariable regions--thus leading to marked amino acid substitutions there. Conversely, in the conservative gag and pol genes the hotspots tend to be silent letters because of a difference in codon frame from the hypervariable regions. Furthermore, base substitutions in the local sequences that frequently appear in the conservative genes occurred at a low level, even within the variable env. Thus, despite the high variability of this virus, the conservative genes and their products could be conserved. These may be some of the strategies evolved in human immunodeficiency virus type 1 to allow for positive-selection pressures, such as the host immune system, and negative-selection pressures on the conservative gene products.
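    The abstract does not state how the 6-base local sequences were tallied, so the following is only a minimal sketch of one plausible first step: counting overlapping 6-mers in a nucleotide sequence and reducing each to its purine/pyrimidine pattern (R for A or G, Y for C or T). The function names and the toy sequence are assumptions made for illustration.

```python
from collections import Counter

# Map each base to its class: purine (R) for A/G, pyrimidine (Y) for C/T.
_RY_TABLE = str.maketrans("AGCT", "RRYY")


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in a nucleotide sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ry_pattern(kmer):
    """Reduce a k-mer to its purine/pyrimidine pattern, e.g. AGCTGA -> RRYYRR."""
    return kmer.upper().translate(_RY_TABLE)


def ry_pattern_counts(seq, k=6):
    """Count overlapping k-mers after collapsing them to R/Y patterns."""
    patterns = Counter()
    for kmer, n in kmer_counts(seq, k).items():
        patterns[ry_pattern(kmer)] += n
    return patterns


toy = "AGCTGAAGCTGATTAGGA"  # toy sequence, not HIV-1 data
print(kmer_counts(toy).most_common(3))
print(ry_pattern_counts(toy).most_common(3))
```

    Comparing such counts between a hypervariable region and a conserved gene would be the natural next step, but any enrichment statistic would have to follow the paper's own method rather than this sketch.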

  15. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

    DEFF Research Database (Denmark)

    Liu, Siyang; Huang, Shujia; Rao, Junhua

    2015-01-01

    present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome......) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We...... assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...

  16. Complete nucleotide sequence and organization of the mitogenome of the red-spotted apollo butterfly, Parnassius bremeri (Lepidoptera: Papilionidae) and comparison with other lepidopteran insects.

    Science.gov (United States)

    Kim, Man Il; Baek, Jee Yeon; Kim, Min Jee; Jeong, Heon Cheon; Kim, Ki-Gyoung; Bae, Chang Hwan; Han, Yeon Soo; Jin, Byung Rae; Kim, Iksoo

    2009-10-31

    The 15,389-bp long complete mitogenome of the endangered red-spotted apollo butterfly, Parnassius bremeri (Lepidoptera: Papilionidae) was determined in this study. The start codon for the COI gene in insects has been extensively discussed, and has long remained a matter of some controversy. Herein, we propose that the CGA (arginine) sequence functions as the start codon for the COI gene in lepidopteran insects, on the basis of complete mitogenome sequences of lepidopteran insects, including P. bremeri, as well as additional sequences of the COI start region from a diverse taxonomic range of lepidopteran species (a total of 53 species from 15 families). In our extensive search for a tRNA-like structure in the A+T-rich region, one tRNA(Trp)-like sequence and one tRNA(Leu) (UUR)-like sequence were detected in the P. bremeri A+T-rich region, and one or more tRNA-like structures were detected in the A+T-rich region of the majority of other sequenced lepidopteran insects, thereby indicating that such features occur frequently in the lepidopteran mitogenomes. Phylogenetic analysis using the concatenated 13 amino acid sequences and nucleotide sequences of PCGs of the four macrolepidopteran superfamilies together with the Tortricoidea and Pyraloidea resulted in the successful recovery of a monophyly of Papilionoidea and a monophyly of Bombycoidea. However, the Geometroidea were unexpectedly identified as a sister group of the Bombycoidea, rather than the Papilionoidea.

  17. Main: Nucleotide Analysis [KOME

    Lifescience Database Archive (English)

    Full Text Available Nucleotide Analysis Japonica genome blast search result Result of blastn search against jap...onica genome sequence kome_japonica_genome_blast_search_result.zip kome_japonica_genome_blast_search_result ...

  18. The MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2010-01-01

    In vertebrates, the onset of cellular immune reactions is controlled by presentation of peptides in complex with major histocompatibility complex (MHC) molecules to T cell receptors. In humans, MHCs are called human leukocyte antigens (HLAs). Different MHC molecules present different subsets...... of peptides, and knowledge of their binding specificities is important for understanding differences in the immune response between individuals. Algorithms predicting which peptides bind a given MHC molecule have recently been developed with high prediction accuracy. The utility of these algorithms...... is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  19. Recombination-Independent Recognition of DNA Homology for Repeat-Induced Point Mutation (RIP) Is Modulated by the Underlying Nucleotide Sequence.

    Directory of Open Access Journals (Sweden)

    Eugene Gladyshev

    2016-05-01

    Full Text Available Haploid germline nuclei of many filamentous fungi have the capacity to detect homologous nucleotide sequences present on the same or different chromosomes. Once recognized, such sequences can undergo cytosine methylation or cytosine-to-thymine mutation specifically over the extent of shared homology. In Neurospora crassa this process is known as Repeat-Induced Point mutation (RIP). Previously, we showed that RIP did not require MEI-3, the only RecA homolog in Neurospora, and that it could detect homologous trinucleotides interspersed with a matching periodicity of 11 or 12 base-pairs along participating chromosomal segments. This pattern was consistent with a mechanism of homology recognition that involved direct interactions between co-aligned double-stranded (ds) DNA molecules, where sequence-specific dsDNA/dsDNA contacts could be established using no more than one triplet per turn. In the present study we have further explored the DNA sequence requirements for RIP. In our previous work, interspersed homologies were always examined in the context of a relatively long adjoining region of perfect homology. Using a new repeat system lacking this strong interaction, we now show that interspersed homologies with overall sequence identity of only 36% can be efficiently detected by RIP in the absence of any perfect homology. Furthermore, in this new system, where the total amount of homology is near the critical threshold required for RIP, the nucleotide composition of participating DNA molecules is identified as an important factor. Our results specifically pinpoint the triplet 5'-GAC-3' as a particularly efficient unit of homology recognition. Finally, we present experimental evidence that the process of homology sensing can be uncoupled from the downstream mutation. Taken together, our results advance the notion that sequence information can be compared directly between double-stranded DNA molecules during RIP and, potentially, in other processes

  20. Nucleotide sequence of an external transcribed spacer in Xenopus laevis rDNA: sequences flanking the 5' and 3' ends of 18S rRNA are non-complementary.

    Science.gov (United States)

    Maden, B E; Moss, M; Salim, M

    1982-04-10

    We have sequenced the external transcribed spacer (ETS) of a ribosomal transcription unit from Xenopus laevis, together with sections of the preceding non-transcribed spacer. Our analysis was carried out on the same cloned transcription unit as that from which the internal transcribed spacers (ITS) were previously sequenced. The ETS is approximately 712 nucleotides long and, like the ITS regions, is generally very rich in C plus G. Features of the sequence include an excess of oligo-C tracts over oligo-G tracts and a tract of 37 nucleotides consisting almost entirely of G and A residues. Parts of the sequence can give rise to stable internal secondary structures. However, in contrast to Escherichia coli, there is no potential for major base-pairing between the 18S flanking regions of the ETS and ITS. Further findings are that there are no initiation (ATG) codons in the ETS and that, as in other X.laevis rDNA cloned units, the sequence preceding the ETS is duplicated, with a few changes, in the "Bam island" sequence of the non-transcribed spacer.
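    Two of the sequence features reported here, the absence of ATG triplets and a long tract of G and A residues, are easy to screen for computationally. The sketch below is an assumption-laden illustration: it checks only the strand it is given, it finds the longest run made up exclusively of purines (whereas the abstract describes a tract that is "almost entirely" G and A), and the example string is a toy sequence, not the X. laevis ETS.

```python
import re


def contains_atg(seq):
    """True if the triplet ATG occurs anywhere (any frame) on the given strand."""
    return "ATG" in seq.upper()


def longest_purine_tract(seq):
    """Return (0-based start, tract) for the longest uninterrupted run of A/G, or None."""
    runs = list(re.finditer(r"[AG]+", seq.upper()))
    if not runs:
        return None
    best = max(runs, key=lambda m: len(m.group()))  # first run wins on ties
    return best.start(), best.group()


toy = "CCGAGGAACT"  # toy sequence for illustration
print(contains_atg(toy))
print(longest_purine_tract(toy))
```

    A tolerant version that allows a few interrupting pyrimidines would be closer to the tract described in the abstract, at the cost of an extra mismatch parameter.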

  1. Identification and nucleotide sequence of a gene in equine herpesvirus 1 analogous to the herpes simplex virus gene encoding the major envelope glycoprotein gB.

    Science.gov (United States)

    Whalley, J M; Robertson, G R; Scott, N A; Hudson, G C; Bell, C W; Woodworth, L M

    1989-02-01

    A gene in equine herpesvirus 1 (EHV-1; equine abortion virus) equivalent to the gB glycoprotein gene of herpes simplex virus (HSV) has been identified by DNA hybridization and nucleotide sequencing. A 4.3 kbp EHV-1 PstI-ClaI sequence (0.40 to 0.43 map units) contained an open reading frame flanked by appropriate control elements and was capable of encoding a polypeptide of 980 amino acids. This had 50 to 60% identity over a 617 amino acid conserved region with the gB gene products of HSV and three other alphaherpesviruses, and 20 to 30% identity with those of human cytomegalovirus and Epstein-Barr virus. Analysis of the amino acid sequence predicts a long signal peptide, hydrophobic and hydrophilic domains and N-glycosylation sites, and has identified a probable internal proteolytic cleavage site. The EHV-1 gB open reading frame appears to be overlapped at its 5' end by 135 nucleotides of the 3' end of an upstream open reading frame the potential translation product of which has approximately 50% identity with HSV gene ICP 18.5 and VZV gene 30 products.

  2. Karyological characterization and identification of four repetitive element groups (the 18S – 28S rRNA gene, telomeric sequences, microsatellite repeat motifs, Rex retroelements) of the Asian swamp eel (Monopterus albus)

    Science.gov (United States)

    Suntronpong, Aorarat; Thapana, Watcharaporn; Twilprawat, Panupon; Prakhongcheep, Ornjira; Somyong, Suthasinee; Muangmai, Narongrit; Peyachoknagul, Surin; Srikulnath, Kornsorn

    2017-01-01

    Among teleost fishes, Asian swamp eel (Monopterus albus Zuiew, 1793) possesses the lowest chromosome number, 2n = 24. To characterize the chromosome constitution and investigate the genome organization of repetitive sequences in M. albus, karyotyping and chromosome mapping were performed with the 18S – 28S rRNA gene, telomeric repeats, microsatellite repeat motifs, and Rex retroelements. The 18S – 28S rRNA genes were localized to the pericentromeric region of chromosome 4 at the same position as large propidium iodide and C-positive bands, suggesting that the molecular structure of the pericentromeric regions of chromosome 4 has evolved in a concerted manner with amplification of the 18S – 28S rRNA genes. (TTAGGG)n sequences were found at the telomeric ends of all chromosomes. Eight of 19 microsatellite repeat motifs were dispersedly mapped on different chromosomes, suggesting the independent amplification of microsatellite repeat motifs in M. albus. Monopterus albus Rex1 (MALRex1) was observed at interstitial sites of all chromosomes and in the pericentromeric regions of most chromosomes, whereas MALRex3 was scattered and localized to all chromosomes and MALRex6 to several chromosomes. This suggests that these retroelements were independently amplified or lost in M. albus. Among MALRexs (MALRex1, MALRex3, and MALRex6), MALRex6 showed higher interspecific sequence divergence from other teleost species. This suggests that the divergence of Rex6 sequences of M. albus might have occurred a relatively long time ago. PMID:29093797

  3. Complete genetic organization and functional aspects of the Escherichia coli S fimbrial adhesin determinant: nucleotide sequence of the genes sfaB, C, D, E, F.

    OpenAIRE

    Schmoll, T.; Morschhäuser, J.; Ott, M.; Ludwig, B.; Van Die, I.; Hacker, Jörg

    2011-01-01

    The S fimbrial adhesin (sfa) determinant of E. coli comprises nine genes situated on a stretch of 7.9 kilobases (kb) DNA. Here the nucleotide sequence of the genes sfaB and sfaC situated proximal to the main structural gene sfaA is described. Sfa-LacZ fusions show that the two genes are transcribed in opposite directions. The isolation of mutants in the proximal region of the sfa gene cluster, the construction of sfa-phoA gene fusions and subsequent transcomplementation studies indicated th...

  4. Complete nucleotide sequence of the self-transmissible TOL plasmid pD2RT provides new insight into arrangement of toluene catabolic plasmids

    DEFF Research Database (Denmark)

    Jutkina, Jekaterina; Hansen, Lars H.; Li, Lili

    2013-01-01

    In the present study we report the complete nucleotide sequence of the toluene catabolic plasmid pD2RT of Pseudomonas migulae strain D2RT isolated from Baltic Sea water. The pD2RT is 129,894 base pairs in size with an average G+C content of 53.75%. A total of 135 open reading frames (ORFs) were ...... predicted to encode proteins, among them genes for catabolism of toluene, plasmid replication, maintenance and conjugative transfer. ORFs encoding proteins with putative functions in stress response, transposition and site- ...

  5. Complete nucleotide sequence of pGA45, a 140,698-bp incFIIY plasmid encoding blaIMI-3-mediated carbapenem resistance, from river sediment

    Directory of Open Access Journals (Sweden)

    Bingjun Dang

    2016-02-01

    Plasmid pGA45 was isolated from the sediment of the Haihe River using E. coli CV601 (gfp-tagged) as recipients and indigenous bacteria from sediment as donors. This plasmid confers reduced susceptibility to imipenem, which belongs to the carbapenem group. Plasmid pGA45 was fully sequenced on an Illumina HiSeq 2000 sequencing system. The complete sequence of plasmid pGA45 is 140,698 bp in length, with an average G+C content of 52.03%. Sequence analysis shows that pGA45 belongs to the incFIIY group and harbors a backbone region that shares high homology and gene synteny with several other incF plasmids, including pNDM1_EC14653, pYDC644, pNDM-Ec1GN574, pRJF866, pKOX_NDM1 and pP10164-NDM. In addition to the backbone region, plasmid pGA45 harbors two notable features: one blaIMI-3-containing region and one type VI secretion system region. The blaIMI-3-containing region is responsible for bacterial carbapenem resistance, and the type VI secretion system region is probably involved in bacterial virulence. Plasmid pGA45 represents the first complete nucleotide sequence of a blaIMI-harboring plasmid from an environmental sample, and the sequencing of this plasmid provides insight into the architecture used for the dissemination of blaIMI carbapenemase genes.
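
    Plasmid records such as this one report an average G+C content. A minimal sketch of how that percentage could be computed from a sequence string follows; the function name and the toy fragment are assumptions for illustration, and ambiguous bases are skipped by choice rather than by any convention stated in the record.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total

# Toy fragment invented for illustration; this is not pGA45 sequence.
toy = "ATGCGCGTTAACCGGN"
print(round(gc_content(toy), 2))
```

    Applied to a complete plasmid sequence, the same function would give the kind of whole-molecule average quoted in the abstract.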

  6. Nucleotide sequence analyses of coat protein gene of peanut stunt virus isolates from alfalfa and different hosts show a new tentative subgroup from Iran.

    Science.gov (United States)

    Amid-Motlagh, Mohammad Hadi; Massumi, Hossein; Heydarnejad, Jahangir; Mehrvar, Mohsen; Hajimorad, Mohammad Reza

    2017-09-01

    Alfalfa cultivars grown in 14 provinces in Iran were surveyed for the relative incidence of peanut stunt virus (PSV) during 2013-2016. PSV was detected in 41.89% of symptomatic alfalfa samples and a few alternate hosts by plate-trapped antigen ELISA. Among other hosts tested, only Chenopodium album, Robinia pseudoacacia and Arachis hypogaea were found naturally infected with PSV. Twenty-five isolates of PSV were chosen for biological and molecular characterization based on their geographical distribution. There were no differences in the experimental host range of these isolates; however, variation in systemic symptoms was observed on Nicotiana glutinosa. Total RNA from the 25 viral isolates was subjected to reverse transcription polymerase chain reaction analysis using primers directed against the coat protein (CP) gene. The CP genes of the 25 Iranian PSV isolates were either 651 or 666 nucleotides long. The nucleotide and amino acid identities for the CP gene among Iranian PSV isolates were 79.3-99.7 and 72-100%, respectively. They also shared between 67.4 and 82.4% pairwise nucleotide identity with other PSV isolates reported elsewhere in the world. Phylogenetic analyses of CP gene sequences showed formation of a new subgroup comprising only the Iranian isolates. Natural infection of a few alternate hosts with PSV is reported for the first time from Iran.
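
    The identity ranges quoted above are pairwise percentages over aligned sequences. The sketch below shows one way such a value could be computed for two pre-aligned sequences. Skipping gapped columns is an assumed convention (the abstract does not say how gaps were treated), and the two aligned strings are invented.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped; this is one of
    several common conventions, chosen here only for illustration.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Invented aligned fragments; these are not PSV coat protein sequences.
aln1 = "ATGGC-TTAC"
aln2 = "ATGACGTTAT"
print(round(percent_identity(aln1, aln2), 1))
```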

  7. The nucleotide sequence of metallothioneins (MT) in liver of the Kafue lechwe (Kobus leche kafuensis) and their potential as biomarkers of heavy metal pollution of the Kafue River.

    Science.gov (United States)

    M'kandawire, Ethel; Syakalima, Michelo; Muzandu, Kaampwe; Pandey, Girja; Simuunza, Martin; Nakayama, Shouta M M; Kawai, Yusuke K; Ikenaka, Yoshinori; Ishizuka, Mayumi

    2012-09-15

    The study determined heavy metal concentrations and MT1 nucleotide sequence [phylogeny] in liver of the Kafue lechwe. Applicability of MT1 as a biomarker of pollution was assessed. cDNA-encoding sequences for lechwe MT1 were amplified by RT-PCR to characterize the sequence of MT1, which was subjected to BLAST searching at NCBI. Phylogenetic relationships were based on a pairwise matrix of sequence divergences calculated by Clustal W. A phylogenetic tree was constructed by the NJ method using the PHYLIP program. Metals were extracted by acid digestion, and concentrations of Cr, Co, Cu, Zn, Cd, Pb, and Ni were determined using an AAS. MT1 mRNA expression levels were measured by quantitative comparative real-time RT-PCR. Lechwe MT1 has a length of 183 bp and encodes an MT1 protein of 61 amino acids, including 20 cysteines. The nucleotide sequence of lechwe MT1 showed identity with sheep MT (97%) and cattle MT1E (97%). The phylogenetic tree revealed that lechwe MT1 clustered with sheep MT and cattle MT1E. Cu and Ni concentrations and MT1 mRNA expression levels of lechwe from Blue Lagoon were significantly higher than those from Lochinvar (p<0.05). Concentrations of Cd and Cu, Co and Cu, Co and Pb, Ni and Cu, and Ni and Cr were positively correlated. Spearman's rank correlations also showed positive correlations between Cu and Co concentrations and MT mRNA expression. PCA further suggested that MT mRNA expression was related to Zn and Cd concentrations. Hepatic MT1 mRNA expression in lechwe can be used as a biomarker of heavy metal pollution. Copyright © 2012 Elsevier B.V. All rights reserved.

  8. Target motifs affecting natural immunity by a constitutive CRISPR-Cas system in Escherichia coli.

    Directory of Open Access Journals (Sweden)

    Cristóbal Almendros

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (cas) genes constitute the CRISPR-Cas systems of various bacteria and archaea and bring about the degradation of invading nucleic acids containing sequences (protospacers) that are complementary to repeat-intervening spacers. It has been demonstrated that the base sequence identity of a protospacer with the cognate spacer and the presence of a protospacer adjacent motif (PAM) influence CRISPR-mediated interference efficiency. By using an original transformation assay with plasmids targeted by a resident spacer, here we show that natural CRISPR-mediated immunity against invading DNA occurs in wild type Escherichia coli. Unexpectedly, the strongest activity is observed with protospacer adjoining nucleotides (interference motifs) that differ from the PAM both in sequence and location. Hence, our results document for the first time native CRISPR activity in E. coli and demonstrate that positions next to the PAM in invading DNA influence their recognition and degradation by these prokaryotic immune systems.

  9. Identification and Analysis of Informative Single Nucleotide Polymorphisms in 16S rRNA Gene Sequences of the Bacillus cereus Group.

    Science.gov (United States)

    Hakovirta, Janetta R; Prezioso, Samantha; Hodge, David; Pillai, Segaran P; Weigel, Linda M

    2016-11-01

    Analysis of 16S rRNA genes is important for phylogenetic classification of known and novel bacterial genera and species and for detection of uncultivable bacteria. PCR amplification of 16S rRNA genes with universal primers produces a mixture of amplicons from all rRNA operons in the genome, and the sequence data generally yield a consensus sequence. Here we describe valuable data that are missing from consensus sequences, variable effects on sequence data generated from nonidentical 16S rRNA amplicons, and the appearance of data displayed by different software programs. These effects are illustrated by analysis of 16S rRNA genes from 50 strains of the Bacillus cereus group, i.e., Bacillus anthracis, Bacillus cereus, Bacillus mycoides, and Bacillus thuringiensis. These species have 11 to 14 rRNA operons, and sequence variability occurs among the multiple 16S rRNA genes. A single nucleotide polymorphism (SNP) previously reported to be specific to B. anthracis was detected in some B. cereus strains. However, a different SNP, at position 1139, was identified as being specific to B. anthracis, which is a biothreat agent with high mortality rates. Compared with visual analysis of the electropherograms, basecaller software frequently missed gene sequence variations or could not identify variant bases due to overlapping basecalls. Accurate detection of 16S rRNA gene sequences that include intragenomic variations can improve discrimination among closely related species, improve the utility of 16S rRNA databases, and facilitate rapid bacterial identification by targeted DNA sequence analysis or by whole-genome sequencing performed by clinical or reference laboratories. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
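
    The point that a consensus sequence hides variation among the multiple 16S rRNA gene copies can be illustrated with a small sketch. It takes several aligned copies, calls a majority consensus, and separately lists the positions where the copies disagree. The function names and the four toy copies are assumptions; real data would come from per-operon sequences or electropherogram inspection, as the abstract describes.

```python
from collections import Counter

def column_counts(copies):
    """Per-position base counts across equal-length, aligned gene copies."""
    if len({len(c) for c in copies}) != 1:
        raise ValueError("all copies must be aligned to the same length")
    return [Counter(col) for col in zip(*copies)]

def consensus(copies):
    """Majority base at each position (ties go to the first-seen base)."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(copies))

def variable_positions(copies):
    """1-based positions where the copies disagree, with base counts."""
    return {i: dict(c)
            for i, c in enumerate(column_counts(copies), 1)
            if len(c) > 1}

# Four invented copies; one carries a minority variant at position 3.
copies = ["ACGTAC", "ACGTAC", "ACGTAC", "ACATAC"]
print(consensus(copies))
print(variable_positions(copies))
```

    The consensus string alone carries no trace of the minority base at position 3, which is the kind of information loss the abstract refers to.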

  10. Complete nucleotide sequence of little cherry virus 1 (LChV-1) infecting sweet cherry in China

    Science.gov (United States)

    Little cherry virus 1 (LChV-1), associated with little cherry disease (LCD), has a significant impact on fruit quality of infected sweet cherry trees. We report the full genome sequence of an isolate of LChV-1 from China, detected by small RNA deep sequencing and amplified by overlapping RT-PCR. The...

  11. The nucleotide sequence of the RNA-2 of an isolate of the English serotype of tomato black ring virus: RNA recombination in the history of nepoviruses.

    Science.gov (United States)

    Le Gall, O L; Lanneau, M; Candresse, T; Dunez, J

    1995-05-01

    The RNA-2 of a carrot isolate from the English serotype of tomato black ring nepovirus (TBRV-ED) has been sequenced. It is 4618 nucleotides long and contains one open reading frame encoding a polypeptide of 1344 amino acids. The 5' non-coding region contains three repetitions of a stem-loop structure also conserved in TBRV-Scottish and grapevine chrome mosaic nepovirus (GCMV). The coat protein domain was mapped to the carboxy-terminal one-third of the polyprotein. Sequence comparisons indicate that TBRV-ED RNA-2 probably arose by an RNA recombination event that resulted in the exchange of the putative movement protein gene between TBRV and GCMV.

  12. Nucleotide sequence of medium-chain acyl-CoA dehydrogenase mRNA and its expression in enzyme-deficient human tissue

    Energy Technology Data Exchange (ETDEWEB)

    Kelly, D.P.; Kim, J.J.; Billadello, J.J.; Hainline, B.E.; Chu, T.W.; Strauss, A.W.

    1987-06-01

    Medium-chain acyl-CoA dehydrogenase (MCAD) is one of three similar enzymes that catalyze the initial step of fatty acid β-oxidation. Definition of the primary structure of MCAD and the tissue distribution of its mRNA is of biochemical and clinical importance because of the recent recognition of inherited MCAD deficiency in humans. The MCAD mRNA nucleotide sequence was determined from two overlapping cDNA clones isolated from human liver and placental cDNA libraries, respectively. The MCAD mRNA includes a 1263-base-pair coding region and a 738-base-pair 3'-nontranslated region. A partial amino acid sequence (137 residues) determined on peptides derived from MCAD purified from porcine liver confirmed the identity of the cDNA clone. Comparison of the amino acid sequence predicted from the human MCAD cDNA with the partial protein sequence of the porcine MCAD revealed a high degree (88%) of interspecies sequence identity. RNA blot analysis shows that MCAD mRNA is expressed in a variety of rat (2.2 kilobases) and human (2.4 kilobases) tissues. Blot hybridization of RNA prepared from cultured skin fibroblasts from a patient with MCAD deficiency disclosed that mRNA was present and similar in size to MCAD mRNA derived from control fibroblasts. The isolation and characterization of MCAD cDNA is an important step in the definition of the defect underlying its metabolic consequences.

  13. Detection of de novo single nucleotide variants in offspring of atomic-bomb survivors close to the hypocenter by whole-genome sequencing.

    Science.gov (United States)

    Horai, Makiko; Mishima, Hiroyuki; Hayashida, Chisa; Kinoshita, Akira; Nakane, Yoshibumi; Matsuo, Tatsuki; Tsuruda, Kazuto; Yanagihara, Katsunori; Sato, Shinya; Imanishi, Daisuke; Imaizumi, Yoshitaka; Hata, Tomoko; Miyazaki, Yasushi; Yoshiura, Koh-Ichiro

    2018-03-01

    Ionizing radiation released by the atomic bombs at Hiroshima and Nagasaki, Japan, in 1945 caused many long-term illnesses, including increased risks of malignancies such as leukemia and solid tumours. Radiation has demonstrated genetic effects in animal models, leading to concerns over the potential hereditary effects of atomic bomb-related radiation. However, no direct analyses of whole DNA have yet been reported. We therefore investigated de novo variants in offspring of atomic-bomb survivors by whole-genome sequencing (WGS). We collected peripheral blood from three trios, each comprising a father (atomic-bomb survivor with acute radiation symptoms), a non-exposed mother, and their child, none of whom had any past history of haematological disorders. One trio of non-exposed individuals was included as a control. DNA was extracted and the numbers of de novo single nucleotide variants in the children were counted by WGS with sequencing confirmation. Gross structural variants were also analysed. Written informed consent was obtained from all participants prior to the study. There were 62, 81, and 42 de novo single nucleotide variants in the children of atomic-bomb survivors, compared with 48 in the control trio. There were no gross structural variants in any trio. These findings are in accord with previously published results that also showed no significant genetic effects of atomic-bomb radiation on second-generation survivors.
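
    The core of a trio-based de novo count is a set difference: variants called in the child that appear in neither parent. The sketch below shows only that step, under the assumption that each sample's calls are available as (chromosome, position, ref, alt) tuples. The function name and all variant records are invented, and real pipelines add depth and genotype-quality filters plus the sequencing confirmation mentioned in the abstract.

```python
def de_novo_candidates(child, father, mother):
    """Variants seen in the child but in neither parent.

    Each argument is a set of (chrom, pos, ref, alt) tuples.
    """
    return child - father - mother

# Invented variant calls for one trio; these are not data from the study.
father = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
mother = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A")}
child = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A"),
         ("chr5", 777, "T", "C")}

print(sorted(de_novo_candidates(child, father, mother)))
```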

  14. Complete nucleotide sequence of the multidrug resistance IncA/C plasmid pR55 from Klebsiella pneumoniae isolated in 1969.

    Science.gov (United States)

    Doublet, Benoît; Boyd, David; Douard, Gregory; Praud, Karine; Cloeckaert, Axel; Mulvey, Michael R

    2012-10-01

    To determine the complete nucleotide sequence of the multidrug resistance IncA/C plasmid pR55 from a clinical Klebsiella pneumoniae strain that was isolated from a urinary tract infection in 1969 in a French hospital and compare it with those of contemporary emerging IncA/C plasmids. The plasmid was purified and sequenced using a 454 sequencing approach. After draft assembly, additional PCRs and walking reads were performed for gap closure. Sequence comparisons and multiple alignments with other IncA/C plasmids were done using the BLAST algorithm and CLUSTAL W, respectively. Plasmid pR55 (170 810 bp) revealed a shared plasmid backbone (>99% nucleotide identity) with current members of the IncA/C(2) multidrug resistance plasmid family that are widely disseminating antibiotic resistance genes. Nevertheless, two specific multidrug resistance gene arrays probably acquired from other genetic elements were identified inserted at conserved hotspot insertion sites in the IncA/C backbone. A novel transposon named Tn6187 showed an atypical mixed transposon configuration composed of two mercury resistance operons and two transposition modules that are related to Tn21 and Tn1696, respectively, and an In0-type integron. IncA/C(2) multidrug resistance plasmids have a broad host range and have been implicated in the dissemination of antibiotic resistance among Enterobacteriaceae from humans and animals. This typical IncA/C(2) genetic scaffold appears to carry various multidrug resistance gene arrays and is now also a successful vehicle for spreading AmpC-like cephalosporinase and metallo-β-lactamase genes, such as bla(CMY) and bla(NDM), respectively.

  15. Complete nucleotide sequence and genome analysis of bacteriophage BFK20 — A lytic phage of the industrial producer Brevibacterium flavum

    Czech Academy of Sciences Publication Activity Database

    Bukovska, G.; Klucar, L.; Vlček, Čestmír; Adamovic, J.; Turna, J.; Timko, J.

    2006-01-01

    Vol. 348, No. 1 (2006), pp. 57-71. ISSN 0042-6822. Grant - others: Slovenská akademie věd (SK) VEGA2/5068/25; Science and Technology Assistance Agency (SK) APVT-51-025004. Institutional research plan: CEZ:AV0Z50520514. Keywords: Bacteriophage * Complete genome sequence * Sequence analysis. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 3.525, year: 2006

  16. Extended region of nodulation genes in Rhizobium meliloti 1021. II. Nucleotide sequence, transcription start sites and protein products

    International Nuclear Information System (INIS)

    Fisher, R.F.; Swanson, J.A.; Mulligan, J.T.; Long, S.R.

    1987-01-01

    The authors have established the DNA sequence and analyzed the transcription and translation products of a series of putative nodulation (nod) genes in Rhizobium meliloti strain 1021. Four loci have been designated nodF, nodE, nodG and nodH. The correlation of transposon insertion positions with phenotypes and open reading frames was confirmed by sequencing the insertion junctions of the transposons. The protein products of these nod genes were visualized by in vitro expression of cloned DNA segments in a R. meliloti transcription-translation system. In addition, the sequence for nodG was substantiated by creating translational fusions in all three reading frames at several points in the sequence; the resulting fusions were expressed in vitro in both E. coli and R. meliloti transcription-translation systems. A DNA segment bearing several open reading frames downstream of nodG corresponds to the putative nod gene mutated in strain nod-216. The transcription start sites of nodF and nodH were mapped by primer extension of RNA from cells induced with the plant flavone, luteolin. Initiation of transcription occurs approximately 25 bp downstream from the conserved sequence designated the nod box, suggesting that this conserved sequence acts as an upstream regulator of inducible nod gene expression. Its distance from the transcription start site is more suggestive of an activator binding site than of an RNA polymerase binding site.

  17. Strand bias in complementary single-nucleotide polymorphisms of transcribed human sequences: evidence for functional effects of synonymous polymorphisms

    Directory of Open Access Journals (Sweden)

    Majewski Jacek

    2006-08-01

    Background: Complementary single-nucleotide polymorphisms (SNPs) may not be distributed equally between two DNA strands if the strands are functionally distinct, such as in transcribed genes. In introns, an excess of A↔G over the complementary C↔T substitutions had previously been found and attributed to transcription-coupled repair (TCR), demonstrating the valuable functional clues that can be obtained by studying such asymmetry. Here we studied asymmetry of human synonymous SNPs (sSNPs) in the fourfold degenerate (FFD) sites as compared to intronic SNPs (iSNPs). Results: The identities of the ancestral bases and the direction of mutations were inferred from human-chimpanzee genomic alignment. After correction for background nucleotide composition, excess of A→G over the complementary T→C polymorphisms, which was observed previously and can be explained by TCR, was confirmed in FFD SNPs and iSNPs. However, when SNPs were separately examined according to whether they mapped to a CpG dinucleotide or not, an excess of C→T over G→A polymorphisms was found in non-CpG site FFD SNPs but was absent from iSNPs and CpG site FFD SNPs. Conclusion: The genome-wide discrepancy of human FFD SNPs provides novel evidence for widespread selective pressure due to functional effects of sSNPs. The similar asymmetry pattern of FFD SNPs and iSNPs that map to a CpG can be explained by transcription-coupled mechanisms, including TCR and transcription-coupled mutation. Because of the hypermutability of CpG sites, CpG site FFD SNPs are on average younger and have been exposed to less selection than non-CpG FFD SNPs, which can explain the discrepancy in asymmetry between CpG site FFD SNPs and non-CpG site FFD SNPs.
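
    The asymmetry described above compares a substitution with its complementary substitution on the same strand, for example A→G against T→C. A minimal sketch of that tally follows, assuming substitutions are already polarized as (ancestral, derived) pairs on the sense strand. The function name and counts are invented, and the correction for background nucleotide composition applied in the study is deliberately left out.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def complementary_counts(substitutions, anc, der):
    """Counts of anc->der and of its complementary change on the same strand.

    `substitutions` is an iterable of (ancestral, derived) base pairs.
    Returns (n_forward, n_complement); no composition correction is applied.
    """
    counts = Counter(substitutions)
    forward = counts[(anc, der)]
    complement = counts[(COMPLEMENT[anc], COMPLEMENT[der])]
    return forward, complement

# Invented tallies for illustration only; these are not the study's data.
subs = ([("A", "G")] * 6 + [("T", "C")] * 4 +
        [("C", "T")] * 5 + [("G", "A")] * 5)

print(complementary_counts(subs, "A", "G"))
print(complementary_counts(subs, "C", "T"))
```

    An unequal pair, like the first one printed, is the raw signal; whether it reflects strand bias depends on the base composition of the sites examined, which is why the study normalizes before comparing.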

  18. Complete nucleotide sequence, genome organization, and biological properties of human immunodeficiency virus type 1 in vivo: evidence for limited defectiveness and complementation.

    Science.gov (United States)

    Li, Y; Hui, H; Burgess, C J; Price, R W; Sharp, P M; Hahn, B H; Shaw, G M

    1992-11-01

    Previous studies of the genetic and biologic characteristics of human immunodeficiency virus type 1 (HIV-1) have by necessity used tissue culture-derived virus. We recently reported the molecular cloning of four full-length HIV-1 genomes directly from uncultured human brain tissue (Y. Li, J. C. Kappes, J. A. Conway, R. W. Price, G. M. Shaw, and B. H. Hahn, J. Virol. 65:3973-3985, 1991). In this report, we describe the biologic properties of these four clones and the complete nucleotide sequences and genome organization of two of them. Clones HIV-1YU-2 and HIV-1YU-10 were 9,174 and 9,176 nucleotides in length, differed by 0.26% in nucleotide sequence, and except for a frameshift mutation in the pol gene in HIV-1YU-10, contained open reading frames corresponding to 5'-gag-pol-vif-vpr-tat-rev-vpu-env-nef-3' flanked by long terminal repeats. HIV-1YU-2 was fully replication competent, while HIV-1YU-10 and two other clones, HIV-1YU-21 and HIV-1YU-32, were defective. All three defective clones, however, when transfected into Cos-1 cells in any pairwise combination, yielded virions that were replication competent and transmissible by cell-free passage. The cellular host range of HIV-1YU-2 was strictly limited to primary T lymphocytes and monocyte-macrophages, a property conferred by its external envelope glycoprotein. Phylogenetic analyses of HIV-1YU-2 gene sequences revealed this virus to be a member of the North American/European HIV-1 subgroup, with specific similarity to other monocyte-tropic viruses in its V3 envelope amino acid sequence. These results indicate that HIV-1 infection of brain is characterized by the persistence of mixtures of fully competent, minimally defective, and more substantially altered viral forms and that complementation among them is readily attainable. In addition, the limited degree of genotypic heterogeneity observed among HIV-1YU and other brain-derived viruses and their preferential tropism for monocyte-macrophages suggest that viral

  19. Nucleotide sequence of the gene coding for human factor VII, a vitamin K-dependent protein participating in blood coagulation

    International Nuclear Information System (INIS)

    O'Hara, P.J.; Grant, F.J.; Haldeman, B.A.; Gray, C.L.; Insley, M.Y.; Hagen, F.S.; Murray, M.J.

    1987-01-01

    Activated factor VII (factor VIIa) is a vitamin K-dependent plasma serine protease that participates in a cascade of reactions leading to the coagulation of blood. Two overlapping genomic clones containing sequences encoding human factor VII were isolated and characterized. The complete sequence of the gene was determined and found to span about 12.8 kilobases. The mRNA for factor VII as demonstrated by cDNA cloning is polyadenylylated at multiple sites but contains only one AAUAAA poly(A) signal sequence. The mRNA can undergo alternative splicing, forming one transcript containing eight segments as exons and another with an additional exon that encodes a larger prepro leader sequence. The latter transcript has no known counterpart in the other vitamin K-dependent proteins. The positions of the introns with respect to the amino acid sequence encoded by the eight essential exons of factor VII are the same as those present in factor IX, factor X, protein C, and the first three exons of prothrombin. These exons code for domains generally conserved among members of this gene family. The comparable introns in these genes, however, are dissimilar with respect to size and sequence, with the exception of intron C in factor VII and protein C. The gene for factor VII also contains five regions made up of tandem repeats of oligonucleotide monomer elements. More than a quarter of the intron sequences and more than a third of the 3' untranslated portion of the mRNA transcript consist of these minisatellite tandem repeats.

  20. Molecular study and nucleotide sequencing of Chlamydia abortus isolated from aborted sheep fetuses ewes of Alborz province

    Directory of Open Access Journals (Sweden)

    Amirreza Ebadi

    2015-02-01

    Chlamydia is an obligate intracellular, gram-negative coccobacillus and one of the most important causes of abortion in ruminants, especially in ewes. This investigation was performed for the molecular study and sequencing of Chlamydia abortus isolated from aborted sheep fetuses in Alborz Province. DNA extraction was performed on 100 samples from aborted fetuses of 32 sheep flocks from different areas of Alborz Province. Then, using specific primers for the gene IGS-Sr- RNA, polymerase chain reaction was conducted, and 10 samples selected randomly from the positive cases were sent to the Macrogen company in Korea for sequencing. In this study, 37 samples from a total of 100 aborted fetuses were positive for Chlamydia abortus. After sequencing, more than 99 percent of the positive samples were similar to sequences in GenBank. The sequencing results indicated that the samples were very similar to isolates LN554882/1, AF051935/1 and CR848038/1 in GenBank and were in the same cluster. This investigation also indicated that Chlamydia abortus is one of the main causes of ewe abortion in Alborz Province.

  1. Cloning and nucleotide sequence analysis of pepV, a carnosinase gene from Lactobacillus delbrueckii subsp. lactis DSM 7290, and partial characterization of the enzyme.

    Science.gov (United States)

    Vongerichten, K F; Klein, J R; Matern, H; Plapp, R

    1994-10-01

    Cell extracts of Lactobacillus delbrueckii subsp. lactis DSM 7290 were found to exhibit unique peptolytic ability against unusual beta-alanyl-dipeptides. In order to clone the gene encoding this activity, designated pepV, a gene library of strain DSM 7290 genomic DNA, prepared in the low-copy-number plasmid pLG339, was screened for heterologous expression in Escherichia coli. Recombinant clones harbouring pepV were identified by their ability to allow the utilization of carnosine (beta-alanyl-histidine) as a source of histidine by the E. coli mutant strain UK197 (pepD, hisG). Complementation was observed in a colony harbouring a recombinant plasmid (pKV101), carrying pepV. A 2.4 kb fragment containing pepV was subcloned and its nucleotide sequence revealed an open reading frame (ORF) of 1413 nucleotides, corresponding to a protein with predicted molecular mass of 51998 Da. A single transcription initiation site 71 bp upstream of the ATG translational start codon was identified by primer extension. No significant homology was detected between pepV or its deduced amino acid sequence with any entry in the databases. The only similarity was found in a region conserved in the ArgE/DapE/CPG2/YscS family of proteins. This observation, and protease inhibitor studies, indicated that pepV is of the metalloprotease type. A second ORF present in the sequenced fragment showed extensive homology to a variety of amino acid permeases from E. coli and Saccharomyces cerevisiae.
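
    The ORF length and predicted mass quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the 1413 nucleotides include the stop codon and uses a rule-of-thumb average residue mass of about 110 Da. Both are assumptions made for illustration, not statements from the abstract, and the function name is hypothetical.

```python
def codons_in_orf(orf_nt: int) -> int:
    """Number of codons in an ORF whose length is a multiple of three."""
    if orf_nt % 3:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nt // 3

AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)

codons = codons_in_orf(1413)
residues = codons - 1  # assumes the 1413 nt include the stop codon
approx_mass = residues * AVG_RESIDUE_DA

print(codons, residues, approx_mass)
```

    Under these assumptions the estimate lands within about 1% of the 51998 Da reported, so the two figures in the abstract are mutually consistent.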

  2. Nucleotide sequence and genome organization of a member of a new and distinct Caulimovirus species from dahlia.

    Science.gov (United States)

    Pappu, H R; Druffel, K L; Miglino, R; van Schadewijk, A R

    2008-01-01

    A distinct caulimovirus, associated with dahlia mosaic, was cloned and sequenced. The caulimovirus, tentatively designated as dahlia common mosaic virus (DCMV), had a double-stranded DNA genome of ca. 8 kb. The genome organization of DCMV was found to be typical of members of the genus Caulimovirus and consisted of six major open reading frames (ORFs), ORFs I-VI, and one minor ORF, ORF VII. Sequence comparisons with the DNA genomes of two known caulimoviruses isolated from dahlia, Dahlia mosaic virus (DMV) and an endogenous caulimovirus, DMV-D10, showed that DCMV is a member of a distinct caulimovirus species, with sequence identities among various ORFs ranging from 25 to 80%.

  3. Complete Nucleotide Sequences of Two VIM-1-Encoding Plasmids from Klebsiella pneumoniae and Leclercia adecarboxylata Isolates of Czech Origin.

    Science.gov (United States)

    Papousek, Ivo; Papagiannitsis, Costas C; Medvecky, Matej; Hrabak, Jaroslav; Dolejska, Monika

    2017-05-01

    Two multidrug resistance (MDR) plasmids, carrying the VIM-1-encoding integron In110, were characterized. Plasmid pLec-476cz (311,758 bp), from a Leclercia adecarboxylata isolate, consisted of an IncHI1 backbone, a MDR region, and two accessory elements. Plasmid pKpn-431cz (142,876 bp), from a sequence type 323 (ST323) Klebsiella pneumoniae isolate, comprised IncFIIY-derived and pKPN3-like sequences and a mosaic region. A 40,400-bp sequence of pKpn-431cz was identical to the MDR region of pLec-476cz, indicating the en bloc acquisition of the VIM-1-encoding region from one plasmid by the other. Copyright © 2017 American Society for Microbiology.

  4. Nucleotide sequence and genetic organization of a 7.3 kb region (map unit 47 to 52.5) of Autographa californica nuclear polyhedrosis virus fragment EcoRI-C

    NARCIS (Netherlands)

    Kool, M.; Broer, R.; Zuidema, D.; Goldbach, R. W.; Vlak, J. M.

    1994-01-01

    The nucleotide sequence and genetic organization of a 7297 bp region within the EcoRI-C fragment of Autographa californica multiple nucleocapsid nuclear polyhedrosis virus (AcMNPV) are presented. Eight putative open reading frames were found and their respective amino acid sequences compared with a

  5. Completion of the nucleotide sequence of the central region of Tn5 confirms the presence of three resistance genes.

    OpenAIRE

    Mazodier, P; Cossart, P; Giraud, E; Gasser, F

    1985-01-01

    The DNA sequence of the region located downstream from the kanamycin resistance gene of Tn5 up to the right inverted repeat IS50R has been determined. This completes the determination of the sequence of Tn5, which is 5818 bp long. The 2.7 kb central region contains three resistance genes: the kanamycin-neomycin resistance gene, a gene coding for resistance to CL990, an antimitotic-antibiotic compound of the bleomycin family, and a third gene that confers streptomycin resistance in some bacterial...

  6. Nucleotide sequence analysis of the Legionella micdadei mip gene, encoding a 30-kilodalton analog of the Legionella pneumophila Mip protein

    DEFF Research Database (Denmark)

    Bangsborg, Jette Marie; Cianciotto, N P; Hindersson, P

    1991-01-01

    After the demonstration of analogs of the Legionella pneumophila macrophage infectivity potentiator (Mip) protein in other Legionella species, the Legionella micdadei mip gene was cloned and expressed in Escherichia coli. DNA sequence analysis of the L. micdadei mip gene contained in the plasmid p...

  7. Cloning, Nucleotide Sequencing and Bioinformatics Study of NcSRS2 Gene, an Immunogen from Iranian Isolate of Neospora Caninum

    Directory of Open Access Journals (Sweden)

    M Soltani

    2013-03-01

    Background: Neosporosis is caused by the obligate intracellular protozoan parasite Neospora caninum, which infects a variety of hosts. NcSRS2 is an immunodominant antigen of N. caninum that is considered one of the most promising targets for a recombinant or DNA vaccine against neosporosis. As no study had been carried out to identify the molecular structure of N. caninum in Iran, as a first step we set out to identify this gene in the parasite in Iran. Methods: Tachyzoite total RNA was extracted, cDNA was synthesized, and the NcSRS2 gene was amplified using the cDNA as template. The PCR product was then cloned into the pTZ57R/T vector and transformed into E. coli (DH5α strain). Finally, the recombinant plasmid was extracted from transformed E. coli and sequenced. Bioinformatics analysis was also carried out. Results: The PCR product of the NcSRS2 gene was sequenced and recorded in GenBank. The deduced amino acid sequence of NcSRS2 in the current study was compared with other N. caninum NcSRS2 sequences and showed some identities and differences. Conclusion: The NcSRS2 gene of N. caninum was successfully cloned in pTZ57R/T. The recombinant plasmid was confirmed by sequencing, colony PCR and enzymatic digestion. It is ready to express recombinant protein for further studies.

  8. Complete nucleotide sequences of seven soybean mosaic viruses (SMV), isolated from wild soybeans (Glycine soja) in China.

    Science.gov (United States)

    Chen, Yun-Xia; Wu, Mian; Ma, Fang-Fang; Chen, Jian-Qun; Wang, Bin

    2017-03-01

    Soybean mosaic virus (SMV) is a devastating plant virus classified in the family Potyviridae, and known to infect cultivated soybeans (Glycine max). In this study, seven new SMVs were isolated from wild soybean samples and analyzed by whole-genome sequencing. An updated SMV phylogeny was built with the seven new and 83 known SMV genomic sequences. Results showed that three northeastern SMV isolates were distributed in clade III and IV, while four southern SMVs were grouped together in clade II and all contained a recombinant BCMV fragment (~900 bp) in the upstream part of the genome. This work revealed that wild soybeans in China also act as important SMV hosts and play a role in the transmission and diversity of SMVs.

  9. Sequence-based separation of single-stranded DNA using nucleotides in capillary electrophoresis: focus on phosphate.

    Science.gov (United States)

    Zhang, Xueru; McGown, Linda B

    2013-06-01

    DNA analysis has widespread applicability in biology, medicine, biotechnology, and forensics. DNA separation by length is readily achieved using sieving gels in electrophoresis. Separation by sequence is less simple, generally requiring adequate differences in native or induced conformation or differences in thermal or chemical stability of the strands that are hybridized prior to measurement. We previously demonstrated separation of four single-stranded DNA 76-mers that differ by only a few A-G substitutions based solely on sequence using guanosine-5'-monophosphate (GMP) in the running buffer. We attributed separation to the unique self-assembly of GMP to form higher order structures. Here, we examine an expanded set of 76-mers designed to probe the mechanism of the separation and effects of experimental conditions. We were surprised to find that other ribonucleotides achieved separation similar to that of GMP, and that some separation was achieved using sodium phosphate instead of GMP. Potassium phosphate achieved almost as good separations as the ribonucleotides. This suggests that the separation medium provides a physicochemical environment for the DNA that affects strand migration in a sequence-selective manner. Further investigation is needed to determine whether the mechanism involves specific interactions between the phosphates and the DNA strands or is a result of other properties of the separation medium. Phosphate generally has been avoided in DNA separations by capillary gel electrophoresis because its high ionic strength exacerbates Joule heating. Our results suggest that phosphate compounds should be examined for separation of DNA based on sequence. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful for verifying the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist, and their significance can be checked by means of a Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful for estimating the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences with a user-set length and percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphical interface makes it easy to set the parameters that characterize the sequences being produced, which are saved in a text-format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of the sequences generated by the software by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
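
    The idea of drawing random sequences with a prescribed nucleotide composition can be sketched in a few lines of Python. This is only an illustration of the concept, not RANDNA itself: the function names `random_sequence` and `observed_composition` are hypothetical, and Python's built-in generator is used in place of the Borland Delphi 6 generator mentioned in the abstract.

```python
import random


def random_sequence(length, composition, seed=None):
    """Return a random nucleotide string of the given length.

    composition maps each nucleotide to its target fraction, e.g.
    {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}; the fractions must sum to 1.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if abs(sum(composition.values()) - 1.0) > 1e-9:
        raise ValueError("composition fractions must sum to 1")
    rng = random.Random(seed)  # local generator so that runs are reproducible
    letters = list(composition)
    weights = [composition[n] for n in letters]
    # Each position is drawn independently with the requested probabilities,
    # so the realized composition only approximates the target.
    return "".join(rng.choices(letters, weights=weights, k=length))


def observed_composition(seq):
    """Fraction of each nucleotide actually present in seq."""
    return {n: seq.count(n) / len(seq) for n in sorted(set(seq))}


if __name__ == "__main__":
    target = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    seq = random_sequence(60, target, seed=1)
    print(seq)
    print(observed_composition(seq))
```

    Because every position is sampled independently, short sequences can deviate noticeably from the target composition; a generator that needs exact percentages would instead shuffle a sequence built with fixed counts.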

  11. Absence of zero-temperature transmission rate of a double-chain tight-binding model for DNA with random sequence of nucleotides in thermodynamic limit

    International Nuclear Information System (INIS)

    Xiong Gang; Wang, X.R.

    2005-01-01

    The zero-temperature transmission rate spectrum of a double-chain tight-binding model for real DNA is calculated. It is shown that a band of extended-like states exists only for finite chain length with strong inter-chain coupling, while the whole spectrum tends to zero in the thermodynamic limit, regardless of the strength of inter-chain coupling. It is also shown that a more faithful model for real DNA with periodic sugar-phosphate chains in backbone structures can be mapped into the above simple double-chain tight-binding model. Combined with the above results, the transmission rate of real DNA with a long random sequence of nucleotides is expected to be poor.

  12. The complete nucleotide sequence of the genome of Barley yellow dwarf virus-RMV reveals it to be a new Polerovirus distantly related to other yellow dwarf viruses

    Directory of Open Access Journals (Sweden)

    Elizabeth N. Krueger

    2013-07-01

    The yellow dwarf viruses (YDVs) of the Luteoviridae family represent the most widespread group of cereal viruses worldwide. They include the Barley yellow dwarf viruses (BYDVs) of genus Luteovirus, the Cereal yellow dwarf viruses (CYDVs) and Wheat yellow dwarf virus (WYDV) of genus Polerovirus. All of these viruses are obligately aphid transmitted and phloem-limited. The first described YDVs (initially all called BYDV) were classified by their most efficient vector. One of these viruses, BYDV-RMV, is transmitted most efficiently by the corn leaf aphid, Rhopalosiphum maidis. Here we report the complete 5612 nucleotide sequence of the genomic RNA of a Montana isolate of BYDV-RMV (isolate RMV MTFE87, GenBank accession no. KC921392). The sequence revealed that BYDV-RMV is a polerovirus, but it is quite distantly related to the CYDVs or WYDV, which are very closely related to each other. Nor is BYDV-RMV closely related to any other particular polerovirus. Depending on the gene that is compared, different poleroviruses (none of them a YDV) share the most sequence similarity to BYDV-RMV. Because of its distant relationship to other YDVs, and because it commonly infects maize via its vector, R. maidis, we propose that BYDV-RMV be renamed Maize yellow dwarf virus-RMV (MYDV-RMV).

  13. The gene for indole-3-acetyl-L-aspartic acid hydrolase from Enterobacter agglomerans: molecular cloning, nucleotide sequence, and expression in Escherichia coli.

    Science.gov (United States)

    Chou, J C; Mulbry, W W; Cohen, J D

    1998-08-01

    A 5.5-kb DNA fragment containing the indole-3-acetyl-aspartic acid (IAA-asp) hydrolase gene (iaaspH) was isolated from Enterobacter agglomerans strain GK12 using a hybridization probe based on the N-terminal amino acid sequence of the protein. The DNA sequence of a 2.4-kb region of this fragment was determined and revealed a 1311-nucleotide ORF large enough to encode the 45-kDa IAA-asp hydrolase. A 1.5-kb DNA fragment containing iaaspH was subcloned into the Escherichia coli expression plasmid pTTQ8 to yield plasmid pJCC2. Extracts of IPTG-induced E. coli cultures containing the pJCC2 recombinant plasmid showed IAA-asp hydrolase levels 5 to 10-fold higher than those in E. agglomerans extracts. Homology searches revealed that the IAA-asp hydrolase was similar to a variety of amidohydrolases. In addition, IAA-asp hydrolase showed 70% sequence identity to a putative thermostable carboxypeptidase of E. coli.

  14. The influence of selection on the evolutionary distance estimated from the base changes observed between homologous nucleotide sequences.

    Science.gov (United States)

    Otsuka, J; Kawai, Y; Sugaya, N

    2001-11-21

    In most studies of molecular evolution, the nucleotide base at a site is assumed to change with the apparent rate under functional constraint, and the comparison of base changes between homologous genes is thought to yield the evolutionary distance corresponding to the site-average change rate multiplied by the divergence time. However, this view is not sufficiently successful in estimating the divergence time of species, and mostly results in the construction of tree topology without a time-scale. In the present paper, this problem is investigated theoretically by considering that observed base changes are the results of comparing the survivals through selection of mutated bases. In the case of weak selection, the time course of base changes due to mutation and selection can be obtained analytically, leading to a theoretical equation showing how selection influences the evolutionary distance estimated from the enumeration of base changes. This result provides a new method for estimating the divergence time more accurately from the observed base changes by evaluating both the strength of selection and the mutation rate. The validity of this method is verified by analysing the base changes observed at the third codon positions of amino acid residues with four-fold codon degeneracy in the protein genes of mammalian mitochondria; i.e. the ratios of estimated divergence times are fairly consistent with a series of fossil records of mammals. Throughout this analysis, it is also suggested that the mutation rates in mitochondrial genomes are almost the same in different lineages of mammals and that the lineage-specific base-change rates indicated previously are due to the selection probably arising from the preference of transfer RNAs to codons.

  15. Complete nucleotide sequence of the Coturnix chinensis (blue-breasted quail) mitochondrial genome and a phylogenetic analysis with related species.

    Science.gov (United States)

    Nishibori, M; Tsudzuki, M; Hayashi, T; Yamamoto, Y; Yasue, H

    2002-01-01

    Coturnix chinensis (blue-breasted quail) has been classically grouped in Galliformes Phasianidae Coturnix, based on morphologic features and biochemical evidence. Since the blue-breasted quail has the smallest body size among the species of Galliformes, in addition to a short generation time and an excellent reproductive performance, it is a possible model fowl for breeding and physiological studies of the Coturnix japonica (Japanese quail) and Gallus gallus domesticus (chicken), which are classified in the same family as blue-breasted quail. However, since its phylogenetic position in the family Phasianidae has not been determined conclusively, the sequence of the entire blue-breasted quail mitochondrial (mt) genome was obtained to provide genetic information for phylogenetic analysis in the present study. The blue-breasted quail mtDNA was found to be a circular DNA of 16,687 base pairs (bp) with the same genomic structure as the mtDNAs of Japanese quail and chicken, though it is smaller than Japanese quail and chicken mtDNAs by 10 bp and 88 bp, respectively. The sequence identity of all mitochondrial genes, including those for 12S and 16S ribosomal RNAs, between blue-breasted quail and Japanese quail ranged from 84.5% to 93.5%; between blue-breasted quail and chicken, sequence identity ranged from 78.0% to 89.6%. In order to obtain information on the phylogenetic position of blue-breasted quail in Galliformes Phasianidae, the 2,184 bp sequence comprising NADH dehydrogenase subunit 2 and cytochrome b genes available for eight species in Galliformes [Japanese quail, chicken, Gallus varius (green junglefowl), Bambusicola thoracica (Chinese bamboo partridge), Pavo cristatus (Indian peafowl), Perdix perdix (gray partridge), Phasianus colchicus (ring-neck pheasant), and Tympanuchus phasianellus (sharp-tailed grouse)] together with that of Aythya americana (redhead) were examined using a maximum likelihood (ML) method. The ML analyses on the first/second codon positions

  16. Development of Distinctive Balinese Batik Motifs

    Directory of Open Access Journals (Sweden)

    Irfa'ina Rohana Salma

    2016-04-01

    ABSTRACT The batik industry is growing rapidly in Bali, but its batik motifs do not reflect a distinctive regional identity. It is therefore necessary to create distinctive Balinese batik motif designs whose inspiration is drawn from the culture and nature of Bali. The purpose of this research and art creation is to produce batik motifs with unique shapes and characteristics that reflect Balinese culture and natural surroundings. The method consisted of collecting data, designing the motifs, realizing them as batik, and testing their aesthetics. Five batik motifs were created: (1) Motif Jepun Alit; (2) Motif Jepun Ageng; (3) Motif Sekar Jagad Bali; (4) Motif Teratai Banji; and (5) Motif Poleng Biru. The aesthetic preference ("Selera Estetika") assessment showed that the most preferred motifs are Motif Jepun Alit, Motif Sekar Jagad Bali, and Motif Teratai Banji. Key words: Motif Jepun Alit, Motif Jepun Ageng, Motif Sekar Jagad Bali, Motif Teratai Banji, Motif Poleng Biru

  17. Nucleotide Metabolism

    DEFF Research Database (Denmark)

    Martinussen, Jan; Willemoës, M.; Kilstrup, Mogens

    2011-01-01

    Metabolic pathways are connected through their utilization of nucleotides as supplier of energy, allosteric effectors, and their role in activation of intermediates. Therefore, any attempt to exploit a given living organism in a biotechnological process will have an impact on nucleotide metabolism...

  18. TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine.

    Science.gov (United States)

    Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun

    2016-10-01

    As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. The identification of m6A sites in RNAs is significantly important for both basic biomedical research and practical drug development. In this study, we designed a computational-based method, called TargetM6A, to rapidly and accurately target m6A sites solely from the primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a much more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that the proposed TargetM6A method outperformed the existing methods for predicting m6A sites and remarkably improved the prediction performances, with MCC = 0.526 and AUC = 0.818. We also provided a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
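
    The abstract does not define the position-specific nucleotide propensity (PSNP) feature. The sketch below assumes one common formulation: for each position, the propensity of a nucleotide is its frequency at that position in the positive training set minus its frequency in the negative set, and a sequence is encoded by looking up the propensity of its own nucleotide at every position. The function names and the toy sequences are hypothetical and are not taken from TargetM6A.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"


def position_frequencies(seqs):
    """freqs[j][n] = fraction of sequences carrying nucleotide n at position j."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for j in range(length):
        counts = Counter(s[j] for s in seqs)
        freqs.append({n: counts[n] / len(seqs) for n in NUCLEOTIDES})
    return freqs


def psnp_table(positives, negatives):
    """Assumed PSNP: positive-set frequency minus negative-set frequency."""
    pos = position_frequencies(positives)
    neg = position_frequencies(negatives)
    if len(pos) != len(neg):
        raise ValueError("positive and negative windows must have equal length")
    return [{n: pos[j][n] - neg[j][n] for n in NUCLEOTIDES}
            for j in range(len(pos))]


def encode(seq, table):
    """Turn a sequence window into one propensity value per position."""
    if len(seq) != len(table):
        raise ValueError("sequence length does not match the table")
    return [table[j][n] for j, n in enumerate(seq)]


if __name__ == "__main__":
    # Toy windows, invented for illustration only.
    positives = ["GGACU", "AGACA"]
    negatives = ["UUUCG", "ACGCA"]
    table = psnp_table(positives, negatives)
    print(encode("GGACU", table))
```

    The resulting vector could then be passed, alongside other features, to a classifier such as a support vector machine; the dinucleotide variant (PSDP) would follow the same pattern over adjacent pairs.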

  19. Single-nucleotide variant in multiple copies of a deleted in azoospermia (DAZ) sequence - a human Y chromosome quantitative polymorphism.

    Science.gov (United States)

    Szmulewicz, Martin N; Ruiz, Luis M; Reategui, Erika P; Hussini, Saeed; Herrera, Rene J

    2002-01-01

    The evolution of the deleted in azoospermia (DAZ) gene family supports prevalent theories on the origin and development of sex chromosomes and sexual dimorphism. The ancestral DAZL gene in human chromosome 3 is known to be involved in germline development of both males and females. The available phylogenetic data suggest that some time after the divergence of the New World and Old World monkey lineages, the DAZL gene, which is found in all mammals, was copied to the Y chromosome of an ancestor to the Old World monkeys, but not New World monkeys. In modern man, the Y-linked DAZ gene complex is located on the distal part of the q arm. It is thought that after being copied to the Y chromosome, and after the divergence of the human and great ape lineages, the DAZ gene in the former underwent internal rearrangements. This included tandem duplications as well as a T > C transition altering an MboI restriction enzyme site in a duplicated sequence. In this study, we report on the ratios of MboI-/MboI+ variant sequences in individuals from seven worldwide human populations (Basque, Benin, Egypt, Formosa, Kungurtug, Oman and Rwanda) in the DAZ complex. The ratio of PCR MboI- and MboI+ amplicons can be used to characterize individuals and populations. Our results show a nonrandom distribution of MboI-/MboI+ sequence ratios in all populations examined, as well as significant differences in ratios between populations when compared pairwise. The multiple ratios imply that there has been more than one recent reorganization event at this locus. Considering the dynamic nature of this locus and its involvement in male fertility, we investigated the extent and distribution of this polymorphism. Copyright 2002 S. Karger AG, Basel

  20. Effect of intercalator substituent and nucleotide sequence on the stability of DNA- and RNA-naphthalimide complexes.

    Science.gov (United States)

    Johnson, Charles A; Hudson, Graham A; Hardebeck, Laura K E; Jolley, Elizabeth A; Ren, Yi; Lewis, Michael; Znosko, Brent M

    2015-07-01

    DNA intercalators are commonly used as anti-cancer and anti-tumor agents. As a result, it is imperative to understand how changes in intercalator structure affect binding affinity to DNA. Amonafide and mitonafide, two naphthalimide derivatives that are active against HeLa and KB cells in vitro, were previously shown to intercalate into DNA. Here, a systematic study was undertaken to change the 3-substituent on the aromatic intercalator 1,8-naphthalimide to determine how 11 different functional groups with a variety of physical and electronic properties affect binding of the naphthalimide to DNA and RNA duplexes of different sequence compositions and lengths. Wavelength scans, NMR titrations, and circular dichroism were used to investigate the binding mode of 1,8-naphthalimide derivatives to short synthetic DNA. Optical melting experiments were used to measure the change in melting temperature of the DNA and RNA duplexes due to intercalation, which ranged from 0 to 19.4°C. Thermal stabilities were affected by changing the substituent, and several patterns and idiosyncrasies were identified. By systematically varying the 3-substituent, the binding strength of the same derivative to various DNA and RNA duplexes was compared. The binding strength of different derivatives to the same DNA and RNA sequences was also compared. The results of these comparisons shed light on the complexities of site specificity and binding strength in DNA-intercalator complexes. For example, the consequence of adding a 5'-TpG-3' or 5'-GpT-3' step to a duplex depends on the sequence composition of the duplex. When added to a poly-AT duplex, naphthalimide binding was enhanced by 5.6-11.5°C, but when added to a poly-GC duplex, naphthalimide binding was diminished by 3.2-6.9°C. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Nucleotide sequence of Phaseolus vulgaris L. alcohol dehydrogenase encoding cDNA and three-dimensional structure prediction of the deduced protein.

    Science.gov (United States)

    Amelia, Kassim; Khor, Chin Yin; Shah, Farida Habib; Bhore, Subhash J

    2015-01-01

    Common beans (Phaseolus vulgaris L.) are widely consumed as a source of proteins and natural products. However, their yield needs to be increased. In line with the agenda of Phaseomics (an international consortium), work on generating expressed sequence tags (ESTs) from bean pods was initiated. Altogether, 5972 ESTs have been isolated. Alcohol dehydrogenase (AD) encoding gene cDNA was a noticeable transcript among the generated ESTs. This AD is an important enzyme; therefore, this study was undertaken to understand more about it. The objective of this study was to elucidate the P. vulgaris L. AD (PvAD) gene cDNA sequence and to predict the three-dimensional (3D) structure of the deduced protein. Positive and negative strands of the PvAD cDNA clone were sequenced using M13 forward and M13 reverse primers to elucidate the nucleotide sequence. The deduced PvAD cDNA and protein sequences were analyzed for their basic features using online bioinformatics tools. Sequence comparison was carried out using the bl2seq program, and the tree-view program was used to construct a phylogenetic tree. The secondary structures and 3D structure of the PvAD protein were predicted by using the PHYRE automatic fold recognition server. Analysis of the sequencing results showed that the PvAD cDNA is 1294 bp in length. Its open reading frame encodes a protein that contains 371 amino acids. Deduced protein sequence analysis showed the presence of putative substrate binding, catalytic Zn binding, and NAD binding sites. Results indicate that the predicted 3D structure of the PvAD protein is analogous to the experimentally determined crystal structure of S-nitrosoglutathione reductase from an Arabidopsis species. The 1294 bp long PvAD cDNA encodes a 371 amino acid long protein that contains conserved domains required for the biological functions of AD. The predicted 3D structure of the deduced PvAD protein reflects the analogy with the crystal structure of Arabidopsis thaliana S-nitrosoglutathione reductase. Further study is required

  2. Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution.

    Science.gov (United States)

    Pausch, Hubert; Emmerling, Reiner; Gredler-Grandl, Birgit; Fries, Ruedi; Daetwyler, Hans D; Goddard, Michael E

    2017-11-09

    Genotyping and whole-genome sequencing data have been generated for hundreds of thousands of cattle. International consortia used these data to compile imputation reference panels that facilitate the imputation of sequence variant genotypes for animals that have been genotyped using dense microarrays. Association studies with imputed sequence variant genotypes allow for the characterization of quantitative trait loci (QTL) at nucleotide resolution particularly when individuals from several breeds are included in the mapping populations. We imputed genotypes for 28 million sequence variants in 17,229 cattle of the Braunvieh, Fleckvieh and Holstein breeds in order to compile large mapping populations that provide high power to identify QTL for milk production traits. Association tests between imputed sequence variant genotypes and fat and protein percentages in milk uncovered between six and thirteen QTL (P < 1e-8) per breed. Eight of the detected QTL were significant in more than one breed. We combined the results across breeds using meta-analysis and identified a total of 25 QTL including six that were not significant in the within-breed association studies. Two missense mutations in the ABCG2 (p.Y581S, rs43702337, P = 4.3e-34) and GHR (p.F279Y, rs385640152, P = 1.6e-74) genes were the top variants at QTL on chromosomes 6 and 20. Another known causal missense mutation in the DGAT1 gene (p.A232K, rs109326954, P = 8.4e-1436) was the second top variant at a QTL on chromosome 14 but its allelic substitution effects were inconsistent across breeds. It turned out that the conflicting allelic substitution effects resulted from flaws in the imputed genotypes due to the use of a multi-breed reference population for genotype imputation. Many QTL for milk production traits segregate across breeds and across-breed meta-analysis has greater power to detect such QTL than within-breed association testing. Association testing between imputed sequence variant genotypes and
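
    The abstract states that results were combined across breeds "using meta-analysis" but does not say which scheme was used. As a hedged illustration of why pooling can reveal QTL that miss the threshold within each breed, the sketch below assumes a standard inverse-variance weighted fixed-effect meta-analysis; the function name and all effect sizes and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled effect, pooled standard error, z-score, two-sided p-value).
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]  # precision of each study
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical per-breed estimates for a single variant.
    effects = [0.020, 0.018, 0.022]
    std_errors = [0.004, 0.005, 0.006]
    for b, se in zip(effects, std_errors):
        print("within-breed p:", fixed_effect_meta([b], [se])[3])
    print("pooled:", fixed_effect_meta(effects, std_errors))
```

    With these made-up inputs, no single breed reaches the 1e-8 threshold quoted in the abstract, while the pooled estimate does. A fixed-effect model also assumes a common allelic effect across breeds, which the abstract itself shows can fail (the DGAT1 variant).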

  3. Human uroporphyrinogen III synthase: Molecular cloning, nucleotide sequence, and expression of a full-length cDNA

    International Nuclear Information System (INIS)

    Tsai, Shihfeng; Bishop, D.F.; Desnick, R.J.

    1988-01-01

    Uroporphyrinogen III synthase, the fourth enzyme in the heme biosynthetic pathway, is responsible for conversion of the linear tetrapyrrole, hydroxymethylbilane, to the cyclic tetrapyrrole, uroporphyrinogen III. The deficient activity of URO-synthase is the enzymatic defect in the autosomal recessive disorder congenital erythropoietic porphyria. To facilitate the isolation of a full-length cDNA for human URO-synthase, the human erythrocyte enzyme was purified to homogeneity and 81 nonoverlapping amino acids were determined by microsequencing the N terminus and four tryptic peptides. Two synthetic oligonucleotide mixtures were used to screen 1.2 x 10^6 recombinants from a human adult liver cDNA library. Eight clones were positive with both oligonucleotide mixtures. Of these, dideoxy sequencing of the 1.3 kilobase insert from clone pUROS-2 revealed 5' and 3' untranslated sequences of 196 and 284 base pairs, respectively, and an open reading frame of 798 base pairs encoding a protein of 265 amino acids with a predicted molecular mass of 28,607 Da. The isolation and expression of this full-length cDNA for human URO-synthase should facilitate studies of the structure, organization, and chromosomal localization of this heme biosynthetic gene as well as the characterization of the molecular lesions causing congenital erythropoietic porphyria.
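
    The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the reported open reading frame length includes the stop codon, which is the reading under which 798 bp and 265 amino acids agree; the function name `orf_summary` is hypothetical.

```python
def orf_summary(utr5_bp, orf_bp, utr3_bp):
    """Relate ORF length, protein length and total cDNA length.

    Assumes orf_bp counts the stop codon, so the protein has one residue
    fewer than the number of codons.
    """
    if orf_bp % 3 != 0:
        raise ValueError("an open reading frame must be a multiple of 3 bp")
    codons = orf_bp // 3
    return {
        "codons": codons,
        "amino_acids": codons - 1,  # stop codon encodes no residue
        "cdna_bp": utr5_bp + orf_bp + utr3_bp,
    }


if __name__ == "__main__":
    # Values reported for clone pUROS-2 in the abstract.
    print(orf_summary(utr5_bp=196, orf_bp=798, utr3_bp=284))
```

    Under that assumption the three segments give 266 codons, 265 residues and 1278 bp in total, in line with the reported 1.3 kilobase insert.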

  4. Characterisation of purified parvalbumin from five fish species and nucleotide sequencing of this major allergen from Pacific pilchard, Sardinops sagax.

    Science.gov (United States)

    Beale, Janine E; Jeebhay, Mohamed F; Lopata, Andreas L

    2009-09-01

    IgE-mediated allergic reaction to seafood is a common cause of food allergy including anaphylactic reactions. Parvalbumin, the major fish allergen, has been shown to display IgE cross-reactivity among fish species consumed predominantly in Europe and the Far East. However, cross-reactivity studies of parvalbumin from fish species widely consumed in the Southern hemisphere are limited, as are data relating to immunological and molecular characterisation. In this study, antigenic cross-reactivity and the presence of oligomers and isomers of parvalbumin from five highly consumed fish species in Southern Africa were assessed by immunoblotting using purified parvalbumin and crude fish extracts. Pilchard (Sardinops sagax) parvalbumin was found to display the strongest IgE reactivity among 10 fish-allergic consumers. The cDNA sequence of the beta-form of pilchard parvalbumin was determined and designated Sar sa 1.0101 (accession number FM177701 EMBL/GenBank/DDBJ databases). Oligomeric forms of parvalbumin were observed in all fish species using a monoclonal anti-parvalbumin antibody and subjects' sera. Isoforms varied between approximately 10-13 kDa. A highly cross-reactive allergenic isoform of parvalbumin was identified and sequenced, providing a successful primary step towards the generation of a recombinant form that could be used for diagnostic and potential therapeutic use in allergic individuals.

  5. Ambiguous allele combinations in HLA Class I and Class II sequence-based typing: when precise nucleotide sequencing leads to imprecise allele identification

    Directory of Open Access Journals (Sweden)

    Larsen Paula

    2004-09-01

    Abstract Sequence-based typing (SBT) is one of the most comprehensive methods utilized for HLA typing. However, one of the inherent problems with this typing method is the interpretation of ambiguous allele combinations which occur when two or more different allele combinations produce identical sequences. The purpose of this study is to investigate the probability of this occurrence. We performed HLA-A,-B SBT for Exons 2 and 3 on 676 donors. Samples were analyzed with a capillary sequencer. The racial distribution of the donors was as follows: 615-Caucasian, 13-Asian, 23-African American, 17-Hispanic and 8-Unknown. 672 donors were analyzed for HLA-A locus ambiguities and 666 donors were analyzed for HLA-B locus ambiguities. At the HLA-A locus a total of 548 ambiguous allele combinations were identified (548/1344 = 41%). Most (278/548 = 51%) of these ambiguities were due to the fact that Exon 4 analysis was not performed. At the HLA-B locus 322 total ambiguous allele combinations were found (322/1332 = 24%). The HLA-B*07/08/15/27/35/44 antigens, common in Caucasians, produced a large portion of the ambiguities (279/322 = 87%). A large portion of HLA-A and B ambiguous allele combinations can be addressed by utilizing a group-specific primary amplification approach to produce an unambiguous homozygous sequence. Therefore, although the prevalence of ambiguous allele combinations is high, if the resolution of these ambiguities is clinically warranted, methods exist to compensate for this problem.
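
    The proportions in this abstract can be reproduced directly from the reported counts. The short check below uses only numbers given in the text; the helper name `percent` is hypothetical, and the remark that the denominators equal twice the donor counts (two alleles per donor) is an inference from the figures rather than a statement in the abstract.

```python
def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Donor counts reported in the abstract.
donors_total = 615 + 13 + 23 + 17 + 8
donors_hla_a = 672
donors_hla_b = 666

# The quoted denominators match two alleles per analyzed donor.
alleles_a = 2 * donors_hla_a
alleles_b = 2 * donors_hla_b

if __name__ == "__main__":
    print("donors:", donors_total)
    print("HLA-A ambiguous:", percent(548, alleles_a), "% of", alleles_a)
    print("  due to missing Exon 4:", percent(278, 548), "%")
    print("HLA-B ambiguous:", percent(322, alleles_b), "% of", alleles_b)
    print("  from B*07/08/15/27/35/44:", percent(279, 322), "%")
```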

  6. Average nucleotide identity of genome sequences supports the description of Rhizobium lentis sp. nov., Rhizobium bangladeshense sp. nov. and Rhizobium binae sp. nov. from lentil (Lens culinaris) nodules.

    Science.gov (United States)

    Rashid, M Harun-or; Young, J Peter W; Everall, Isobel; Clercx, Pia; Willems, Anne; Santhosh Braun, Markus; Wink, Michael

    2015-09-01

    Rhizobial strains isolated from effective root nodules of field-grown lentil (Lens culinaris) from different parts of Bangladesh were previously analysed using sequences of the 16S rRNA gene, three housekeeping genes (recA, atpD and glnII) and three nodulation genes (nodA, nodC and nodD), DNA fingerprinting and phenotypic characterization. Analysis of housekeeping gene sequences and DNA fingerprints indicated that the strains belonged to three novel clades in the genus Rhizobium. In the present study, a representative strain from each clade was further characterized by determination of cellular fatty acid compositions, carbon substrate utilization patterns and DNA-DNA hybridization and average nucleotide identity (ANI) analyses from whole-genome sequences. DNA-DNA hybridization showed 50-62% relatedness to their closest relatives (the type strains of Rhizobium etli and Rhizobium phaseoli) and 50-60% relatedness to each other. These results were further supported by ANI values, based on genome sequencing, which were 87-92% with their close relatives and 88-89% with each other. On the basis of these results, three novel species, Rhizobium lentis sp. nov. (type strain BLR27(T) = LMG 28441(T) = DSM 29286(T)), Rhizobium bangladeshense sp. nov. (type strain BLR175(T) = LMG 28442(T) = DSM 29287(T)) and Rhizobium binae sp. nov. (type strain BLR195(T) = LMG 28443(T) = DSM 29288(T)), are proposed. These species share common nodulation genes (nodA, nodC and nodD) that are similar to those of the symbiovar viciae.

  7. Complete nucleotide sequence of a functional HLA-DP beta gene and the region between the DP beta 1 and DP alpha 1 genes: comparison of the 5' ends of HLA class II genes.

    OpenAIRE

    Kelly, A; Trowsdale, J

    1985-01-01

    The complete nucleotide sequence of an HLA-DP beta 1 gene and part of the adjacent DP alpha 1 gene, up to and including the signal sequence exon, were determined. The sequence of the DP beta 1 gene identified it as the DPw4 allele. The six exons of the DP beta 1 gene spanned over 11,000 bp of sequence. The arrangement of the gene was broadly analogous to genes of other class II beta chains. The beta 1 exon was flanked by introns of over 4 kb. Comparisons with published sequences of cDNA clone...

  8. Full-length genomic sequence analysis of new subtype 3k hepatitis E virus isolates with 99.97% nucleotide identity obtained from two consecutive acute hepatitis patients in a city in northeast Japan.

    Science.gov (United States)

    Miura, Masahito; Inoue, Jun; Tsuruoka, Mio; Nishizawa, Tsutomu; Nagashima, Shigeo; Takahashi, Masaharu; Shimosegawa, Tooru; Okamoto, Hiroaki

    2017-06-01

    Full-length genomic sequences of hepatitis E virus (HEV) obtained from two consecutive cases of acute self-limiting hepatitis E in a city in northeast Japan were determined. Interestingly, two HEV isolates from each patient shared nucleotide identity of 99.97% in 7 225 nucleotides, and a phylogenetic analysis showed that they formed a cluster of Japanese isolates that is considered as a new HEV subtype 3k. The high similarity of HEV sequences of two isolates from these patients in this study suggested that a subtype 3k HEV strain had spread via a commonly distributed food in the city, possibly pig liver. © 2016 Wiley Periodicals, Inc.
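    As a rough reading of the identity figure above, 99.97% identity over 7,225 aligned positions corresponds to about two mismatched nucleotides. The following is a minimal sketch, assuming two sequences that are already aligned to the same length; the helper names `percent_identity` and `expected_mismatches` and the toy sequences are illustrative assumptions, not part of the study.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    # Compare position by position, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def expected_mismatches(length: int, identity_percent: float) -> float:
    """Number of mismatched positions implied by an identity percentage."""
    return length * (1.0 - identity_percent / 100.0)


# Toy (made-up) aligned sequences: one mismatch in eight positions.
print(percent_identity("ACGTACGT", "ACGTACGA"))
# Mismatches implied by 99.97% identity over 7,225 nucleotides (about 2).
print(round(expected_mismatches(7225, 99.97), 2))
```

    Gap characters, if present, are simply treated as mismatches in this sketch; a real comparison would need an explicit gap policy.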

  9. The mitochondrial genome sequence of the ciliate Paramecium caudatum reveals a shift in nucleotide composition and codon usage within the genus Paramecium

    Directory of Open Access Journals (Sweden)

    Berendonk Thomas U

    2011-05-01

    Full Text Available Abstract Background Despite the fact that the organization of the ciliate mitochondrial genome is exceptional, only a few ciliate mitochondrial genomes have been sequenced to date. All ciliate mitochondrial genomes are linear. They are 40 kb to 47 kb long and contain some 50 tightly packed genes without introns. Earlier studies documented that the mitochondrial guanine + cytosine contents are very different between Paramecium tetraurelia and all studied Tetrahymena species. This raises the question of whether the high mitochondrial G+C content observed in P. tetraurelia is a characteristic property of Paramecium mtDNA, or whether it is an exception among the ciliate mitochondrial genomes known so far. To address this question, we determined the mitochondrial genome sequence of Paramecium caudatum and compared the gene content and sequence properties to the closely related P. tetraurelia. Results The guanine + cytosine content of the P. caudatum mitochondrial genome was significantly lower than that of P. tetraurelia (22.4% vs. 41.2%). This difference in the mitochondrial nucleotide composition was accompanied by significantly different codon usage patterns in both species, i.e. within P. caudatum clearly A/T ending codons dominated, whereas for P. tetraurelia the synonymous codons were more balanced with a higher number of G/C ending codons. Further analyses indicated that the nucleotide composition of most members of the genus Paramecium resembles that of P. caudatum and that the shift observed in P. tetraurelia is restricted to the P. aurelia species complex. Conclusions Surprisingly, the codon usage bias in the P. caudatum mitochondrial genome, exemplified by the effective number of codons, is more similar to the distantly related T. pyriformis and other single-celled eukaryotes such as Chlamydomonas, than to the closely related P. tetraurelia. These differences in base composition and codon usage bias were, however, not reflected in the amino

  10. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences - a case study of the Z-curve

    Directory of Open Access Journals (Sweden)

    Josić Krešimir

    2010-02-01

    Full Text Available Abstract Background The Z-curve is a three dimensional representation of DNA sequences proposed over a decade ago and has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a "genome order index" was proposed, which is defined as S = a² + c² + t² + g², where a, c, t, and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested. Each genome was represented by a point P whose distance from the four faces of a regular tetrahedron was given by the frequencies a, c, t, and g. They claimed that an inscribed sphere of radius r = 1/√3 contains almost all points corresponding to various genomes, implying that S < r². The distribution of the points P obtained by S was studied using the Z-curve. Results In this work, we studied the basic properties of the Z-curve using the "genome order index" as a case study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect, (2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from over dimensionality, and the dimension that stands for GC content alone suffices to represent any given genome. Conclusion The "genome order index" S does not represent a constraint on nucleotide composition. Moreover, S can be easily computed from the Gini-Simpson index and be directly derived from entropy and is redundant. Overall, the Z-curve and S are over-complicated alternatives to GC content and the Shannon H index, respectively. Reviewers: This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich (nominated by Itai Yanai).
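    The quantities named in this abstract can be illustrated with a short sketch. It computes the sum of squared nucleotide frequencies (the "genome order index" S), the Gini-Simpson index (by its standard definition, one minus the sum of squared frequencies, hence 1 - S) and the Shannon entropy in bits. The function names, the toy sequence, and the helper `s_from_gc`, which assumes the second parity rule holds exactly (a = t and c = g), are illustrative assumptions and not code from the article.

```python
import math
from collections import Counter


def nucleotide_frequencies(seq: str) -> dict:
    """Relative frequencies of A, C, G, T, ignoring any other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def genome_order_index(freqs: dict) -> float:
    """S = a^2 + c^2 + t^2 + g^2."""
    return sum(p * p for p in freqs.values())


def gini_simpson(freqs: dict) -> float:
    """Gini-Simpson index, which equals 1 - S by definition."""
    return 1.0 - genome_order_index(freqs)


def shannon_entropy(freqs: dict) -> float:
    """Shannon entropy H in bits (terms with zero frequency are skipped)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def s_from_gc(gc: float) -> float:
    """S as a function of GC content alone, ASSUMING a = t and c = g exactly."""
    return (gc ** 2 + (1.0 - gc) ** 2) / 2.0


freqs = nucleotide_frequencies("AACCGGTT")  # toy sequence, uniform composition
print(genome_order_index(freqs), gini_simpson(freqs), shannon_entropy(freqs))
print(s_from_gc(0.5))
```

    Under the stated parity assumption both S and H depend only on GC content, which is one way to read the abstract's claim that S carries no information beyond existing measures.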

  11. Genetic relatedness among indigenous rice varieties in the Eastern Himalayan region based on nucleotide sequences of the Waxy gene.

    Science.gov (United States)

    Choudhury, Baharul I; Khan, Mohammed L; Dayanandan, Selvadurai

    2014-12-29

    Indigenous rice varieties in the Eastern Himalayan region of Northeast India are traditionally classified into sali, boro and jum ecotypes based on geographical locality and the season of cultivation. In this study, we used DNA sequence data from the Waxy (Wx) gene to infer the genetic relatedness among indigenous rice varieties in Northeast India and to assess the genetic distinctiveness of ecotypes. The results of all three analyses (Bayesian, Maximum Parsimony and Neighbor Joining) were congruent and revealed two genetically distinct clusters of rice varieties in the region. The large group comprised several varieties of sali and boro ecotypes, and all agronomically improved varieties. The small group consisted of only traditionally cultivated indigenous rice varieties, which included one boro, few sali and all jum varieties. The fixation index analysis revealed a very low level of differentiation between sali and boro (F(ST) = 0.005), moderate differentiation between sali and jum (F(ST) = 0.108) and high differentiation between jum and boro (F(ST) = 0.230) ecotypes. The genetic relatedness analyses revealed that sali, boro and jum ecotypes are genetically heterogeneous, and the current classification based on cultivation type is not congruent with the genetic background of rice varieties. Indigenous rice varieties chosen from genetically distinct clusters could be used in breeding programs to improve genetic gain through heterosis, while maintaining high genetic diversity.

  12. Nucleotide sequence and functional analysis of the tet (M)-carrying conjugative transposon Tn5251 of Streptococcus pneumoniae.

    Science.gov (United States)

    Santoro, Francesco; Oggioni, Marco R; Pozzi, Gianni; Iannelli, Francesco

    2010-07-01

    The Tn916-like genetic element Tn5251 is part of the composite conjugative transposon (CTn) Tn5253 of Streptococcus pneumoniae, a 64.5-kb chromosomal element originally called Omega(cat-tet) BM6001. DNA sequence analysis showed that Tn5251 is 18 033-bp long and contains 22 ORFs, 20 of which have the same direction of transcription. Annotation was possible for 11 out of 22 ORFs, including the tet(M) tetracycline resistance gene and int and xis involved in the integration/excision process. Autonomous copies of Tn5251 were generated during matings of Tn5253-containing donors with S. pneumoniae and Enterococcus faecalis. Tn5251 was shown to integrate at different sites in the bacterial chromosome. It behaves as a fully functional CTn capable of independent conjugal transfer to a variety of bacterial species including S. pneumoniae, Streptococcus gordonii, Streptococcus pyogenes, Streptococcus agalactiae, E. faecalis and Bacillus subtilis. The excision of Tn5251 produces a circular intermediate and a deletion in Tn5253 at a level of 1.2 copies per 10^5 chromosomes.

  13. Population genetic structure in farm and feral American mink (Neovison vison) inferred from RAD sequencing-generated single nucleotide polymorphisms.

    Science.gov (United States)

    Thirstrup, J P; Ruiz-Gonzalez, A; Pujolar, J M; Larsen, P F; Jensen, J; Randi, E; Zalewski, A; Pertoldi, C

    2015-08-01

    Feral American mink populations (Neovison vison), derived from mink farms, are widespread in Europe. In this study we investigated genetic diversity and genetic differentiation between feral and farm mink using a panel of genetic markers (194 SNP) generated from RAD sequencing data. Sampling included a total of 211 individuals from 14 populations, 4 feral and 10 from farms, the latter including a total of 7 color types (Brown, Black, Mahogany, Sapphire, White, Pearl, and Silver). Our study revealed similar low levels of genetic diversity in both farm and feral mink. Results are consistent with small effective population size as a consequence of line selection in the farms and founder effects of a few escapees from the farms in feral populations. Moderately high genetic differentiation was found between farm and feral animals, suggesting a scenario in which wild populations were founded from farm escapes a few decades ago. Currently, escapes and gene flow are probably limited. Genetic differentiation was higher among farm color types than among farms, consistent with line selection using few individuals to create the lines. Finally, no indications of inbreeding were found in either farm or feral samples, with significant negative values found in most farm samples, showing farms are successful in avoiding inbreeding.

  14. Characterization of the transcriptome, nucleotide sequence polymorphism, and natural selection in the desert adapted mouse Peromyscus eremicus

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2014-10-01

    Full Text Available As a direct result of intense heat and aridity, deserts are thought to be among the most harsh of environments, particularly for their mammalian inhabitants. Given that osmoregulation can be challenging for these animals, with failure resulting in death, strong selection should be observed on genes related to the maintenance of water and solute balance. One such animal, Peromyscus eremicus, is native to the desert regions of the southwest United States and may live its entire life without oral fluid intake. As a first step toward understanding the genetics that underlie this phenotype, we present a characterization of the P. eremicus transcriptome. We assay four tissues (kidney, liver, brain, testes) from a single individual and supplement this with population level renal transcriptome sequencing from 15 additional animals. We identified a set of transcripts undergoing both purifying and balancing selection based on estimates of Tajima’s D. In addition, we used the branch-site test to identify a transcript (Slc2a9, likely related to desert osmoregulation) undergoing enhanced selection in P. eremicus relative to a set of related non-desert rodents.

  15. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures allow discriminating successfully a) TrEn from random controls, proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  16. [Conserved motifs in the primary and secondary ITS1 structures in bryophytes].

    Science.gov (United States)

    Milyutina, I A; Ignatov, M S

    2015-01-01

    A study of the ITS1 nucleotide sequences of 1000 moss species of 62 families, 11 liverwort species from five orders, and one hornwort Anthoceros agrestis identified five highly conserved motifs (CM1-CM5), which are presumably involved in pre-rRNA processing. Although the ITS1 sequences substantially differ in length and the extent of divergence, the conserved motifs are found in all of them. ITS1 secondary structures were constructed for 76 mosses, and main regularities at conserved motif positioning were observed. The positions of processing sites in the ITS1 secondary structure of the yeast Saccharomyces cerevisiae were found to be similar to the positions of the conserved motifs in the ITS1 secondary structures of mosses and liverworts. In addition, a potential hairpin formation in the putative secondary structure of a pre-rRNA fragment was considered for the region between ITS1 CM4-CM5 and a highly conserved region between hairpins 49 and 50 (H49 and H50) of the 18S rRNA.

  17. Diagnosis of bovine foot and mouth disease virus by real-time polymerase chain reaction and nucleotide sequencing from outbreak herd samples in Ilesha Baruba, Kwara state, Nigeria

    Directory of Open Access Journals (Sweden)

    Olatunde Hamza Olabode

    2014-10-01

    Full Text Available Aim: Molecular diagnosis of bovine foot and mouth disease virus (FMDV) from outbreak herd in Bukaru-Rontuwa, Sinawu/Tumbunya ward of Ilesha Baruba, in Kwara state-Nigeria was conducted to establish the associated serotypes and disease control plan. Materials and Methods: Purposive study was conducted in cattle outbreak herds during the dry season of January-March, 2011. Random sampling of blood and observed epithelial tissues was collected, stored in accordance with standard methods and subjected to RNA extraction and real-time reverse transcription polymerase chain reaction (rRT-PCR). Positive samples for FMDV were further subjected to reverse transcription polymerase chain reaction (RT-PCR), nucleotide sequencing using sequence primers of serotypes O, A, SAT 1-3 and gel electrophoresis. Obtained data were interpreted based on NCBI BLASTN program. Results: Foot and mouth disease (FMD) RNA extract was not found in all the blood tested with beta-actin range of Ct = 30-34. rRT-PCR assay showed two positive samples with Ct values of 18.79 and 15.28. Gel electrophoresis identified sequenced PCR amplicons as serotype A and SAT 2 respectively. Direct product sequencing confirmed SAT 2 serotype was closely related to SAT 2 isolate LIB/7/2003. Cloned RT-PCR product in pGEM-T easy vector confirmed serotype A as closely related to sequence of A/NIG/21/2009, though multiple NIG/2009 sequences were also identified as closely related. Both isolates showed marked genetic homogeneity with >93% genetic identity in the VP1 region which confirmed heterogeneity and antigenic variation nature of FMDV. Conclusion: Quasi species and subtypes of FMD serotypes A and SAT 2 similar to A/NIG/21/2009 and SAT 2/LIB/7/2003 respectively caused the reported FMD outbreaks in Fulani livestock herds investigated. A combined real-time and optimized RT-PCR protocols that would facilitate effective and timely FMD outbreak control plan based on identified serotypes is thus suggested.

  18. DNA Sequence Variation and Selection of Tag Single-Nucleotide Polymorphisms at Candidate Genes for Drought-Stress Response in Pinus taeda L.

    Science.gov (United States)

    González-Martínez, Santiago C.; Ersoz, Elhan; Brown, Garth R.; Wheeler, Nicholas C.; Neale, David B.

    2006-01-01

    Genetic association studies are rapidly becoming the experimental approach of choice to dissect complex traits, including tolerance to drought stress, which is the most common cause of mortality and yield losses in forest trees. Optimization of association mapping requires knowledge of the patterns of nucleotide diversity and linkage disequilibrium and the selection of suitable polymorphisms for genotyping. Moreover, standard neutrality tests applied to DNA sequence variation data can be used to select candidate genes or amino acid sites that are putatively under selection for association mapping. In this article, we study the pattern of polymorphism of 18 candidate genes for drought-stress response in Pinus taeda L., an important tree crop. Data analyses based on a set of 21 putatively neutral nuclear microsatellites did not show population genetic structure or genomewide departures from neutrality. Candidate genes had moderate average nucleotide diversity at silent sites (πsil = 0.00853), varying 100-fold among single genes. The level of within-gene LD was low, with an average pairwise r² of 0.30, decaying rapidly from ∼0.50 to ∼0.20 at 800 bp. No apparent LD among genes was found. A selective sweep may have occurred at the early-response-to-drought-3 (erd3) gene, although population expansion can also explain our results and evidence for selection was not conclusive. One other gene, ccoaomt-1, a methylating enzyme involved in lignification, showed dimorphism (i.e., two highly divergent haplotype lineages at equal frequency), which is commonly associated with the long-term action of balancing selection. Finally, a set of haplotype-tagging SNPs (htSNPs) was selected. Using htSNPs, a reduction of genotyping effort of ∼30–40%, while sampling most common allelic variants, can be gained in our ongoing association studies for drought tolerance in pine. PMID:16387885
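    The pairwise LD statistic quoted above can be illustrated with the standard two-locus definition, r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA·pB. The sketch below assumes phased haplotypes at two biallelic sites coded as 0/1; the function name `ld_r_squared` and the toy haplotypes are illustrative assumptions and do not reproduce the study's data or software.

```python
def ld_r_squared(haplotypes):
    """Pairwise r^2 between two biallelic loci from phased haplotypes.

    `haplotypes` is a sequence of (allele_at_locus_1, allele_at_locus_2)
    pairs, with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0] for h in haplotypes) / n            # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n            # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Toy data: complete association, then no association.
print(ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))
print(ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

    Averaging this quantity over all site pairs within a gene, and plotting it against the distance between sites, gives the kind of LD decay summary the abstract describes.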

  19. Nucleotide variation and identification of novel blast resistance alleles of Pib by allele mining strategy.

    Science.gov (United States)

    Ramkumar, G; Madhav, M S; Devi, S J S Rama; Prasad, M S; Babu, V Ravindra

    2015-04-01

    Pib is one of the significant rice blast resistance genes and provides resistance to a wide range of isolates of the rice blast pathogen, Magnaporthe oryzae. Identification and isolation of novel and beneficial alleles help in crop enhancement. Allele mining is one of the best strategies for dissecting the allelic variations at candidate genes and identifying novel alleles. Hence, in the present study, Pib was analyzed by an allele mining strategy, and coding and non-coding (upstream and intron) regions were examined to identify novel Pib alleles. Comparison of allelic sequences revealed that nucleotide polymorphisms in coding regions affected the amino acid sequences, while the polymorphism in the upstream (non-coding) region affected the motif arrangements. Pib alleles from the resistant landraces Sercher and Krengosa showed better resistance than the Pib donor variety, which might be due to acquired mutations, especially in the LRR region. The evolutionary distance, Ka/Ks and phylogenetic analyses also supported these results. Transcription factor binding motif analysis revealed that Pib (Sr) had a unique motif (DPBFCOREDCDC3), while five different motifs differentiated the resistant and susceptible Pib alleles. As Pib is an inducible gene, the identified differential motifs help to understand the Pib expression mechanism. The identified novel Pib resistant alleles, which showed high resistance to rice blast, can be used directly in blast resistance breeding programs as alternative Pib resistance sources.

  20. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.
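    To make the contrast between estimators concrete, the sketch below compares a maximum a posteriori (MAP) estimate with a centroid estimate on a toy posterior over binary site indicators. It uses plain Hamming loss, under which the unconstrained centroid is obtained by thresholding each marginal posterior probability at 0.5; this is a generic textbook construction and NOT the refined loss function proposed in the paper. The toy posterior and all function names are invented for illustration.

```python
def map_estimate(posterior):
    """Configuration with the highest posterior probability."""
    return max(posterior, key=posterior.get)


def hamming_centroid(posterior):
    """Configuration minimizing expected Hamming loss (assumed loss, see lead-in).

    For unconstrained binary vectors this means setting each coordinate to 1
    exactly when its marginal posterior probability exceeds 0.5.
    """
    length = len(next(iter(posterior)))
    marginals = [
        sum(p for cfg, p in posterior.items() if cfg[i] == 1)
        for i in range(length)
    ]
    return tuple(1 if m > 0.5 else 0 for m in marginals)


def expected_hamming_loss(estimate, posterior):
    """Posterior expectation of the Hamming distance to `estimate`."""
    return sum(
        p * sum(e != c for e, c in zip(estimate, cfg))
        for cfg, p in posterior.items()
    )


# Toy posterior over three candidate binding-site indicators (made-up numbers).
posterior = {(1, 0, 0): 0.4, (0, 1, 1): 0.3, (0, 1, 0): 0.3}

print(map_estimate(posterior))       # single most probable configuration
print(hamming_centroid(posterior))   # configuration built from the marginals
print(expected_hamming_loss(map_estimate(posterior), posterior))
print(expected_hamming_loss(hamming_centroid(posterior), posterior))
```

    In this toy case the MAP configuration carries the largest single probability, but most of the posterior mass sits on configurations that disagree with it at the first two positions, so the centroid differs and has the lower expected loss.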

  1. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Full Text Available Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  2. Single Nucleotide Polymorphism

    DEFF Research Database (Denmark)

    Børsting, Claus; Pereira, Vania; Andersen, Jeppe Dyrberg

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are the most frequent DNA sequence variations in the genome. They have been studied extensively in the last decade with various purposes in mind. In this chapter, we will discuss the advantages and disadvantages of using SNPs for human identification...

  3. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    Science.gov (United States)

    Stafuzza, Nedenia Bonvino; Zerlotini, Adhemar; Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto

    2017-01-01

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated for each animal 10.7 to 16.4-fold genome coverage. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and the metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

  4. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    Directory of Open Access Journals (Sweden)

    Nedenia Bonvino Stafuzza

    Full Text Available Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated for each animal 10.7 to 16.4-fold genome coverage. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and the metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

  5. Nucleotide sequence of the leading region adjacent to the origin of transfer on plasmid F and its conservation among conjugative plasmids.

    Science.gov (United States)

    Loh, S; Cram, D; Skurray, R

    1989-10-01

    The leading region of the Escherichia coli K12 F plasmid is the first segment of DNA to be transferred into the recipient cell during conjugal transfer. We report the nucleotide sequence of the 64.20-66.77F portion of the leading region immediately adjacent to the origin of transfer, oriT. The 2582 bp region encodes three open reading frames, ORF95, ORF169 and ORF273; the product of ORF273 is equivalent in size and map location to the 35 kDa protein, 6d, previously described (Cram et al. 1984). S1 nuclease analyses of mRNA transcripts have identified a potential promoter for ORF95 and ORF273 and indicated that these ORFs are transcribed as a single transcript; in contrast, ORF169 appears to be transcribed from two overlapping promoters on the complementary DNA strand. The products of ORF95 and ORF273 are mainly hydrophilic and are probably located in the cytoplasm. ORF273 shares some homology with DNA-binding proteins. There is a signal peptide sequence at the NH2-terminus of ORF169 and the mature form of ORF169 probably resides in the periplasm due to its hydrophilic nature. Both ORF273 and ORF169 are well conserved among conjugative F-like and a few non-F-like plasmids. On the other hand, ORF95 sequences are only present on some of these plasmids. Several primosome and integration host factor recognition sites are present, implicating this region in DNA metabolism and/or replication functions.

  6. Genome-wide association study using high-density single nucleotide polymorphism arrays and whole-genome sequences for clinical mastitis traits in dairy cattle.

    Science.gov (United States)

    Sahana, G; Guldbrandtsen, B; Thomsen, B; Holm, L-E; Panitz, F; Brøndum, R F; Bendixen, C; Lund, M S

    2014-11-01

    Mastitis is a mammary disease that frequently affects dairy cattle. Despite considerable research on the development of effective prevention and treatment strategies, mastitis continues to be a significant issue in bovine veterinary medicine. To identify major genes that affect mastitis in dairy cattle, 6 chromosomal regions on Bos taurus autosome (BTA) 6, 13, 16, 19, and 20 were selected from a genome scan for 9 mastitis phenotypes using imputed high-density single nucleotide polymorphism arrays. Association analyses using sequence-level variants for the 6 targeted regions were carried out to map causal variants using whole-genome sequence data from 3 breeds. The quantitative trait loci (QTL) discovery population comprised 4,992 progeny-tested Holstein bulls, and QTL were confirmed in 4,442 Nordic Red and 1,126 Jersey cattle. The targeted regions were imputed to the sequence level. The highest association signal for clinical mastitis was observed on BTA 6 at 88.97 Mb in Holstein cattle and was confirmed in Nordic Red cattle. The peak association region on BTA 6 contained 2 genes: vitamin D-binding protein precursor (GC) and neuropeptide FF receptor 2 (NPFFR2), which, based on known biological functions, are good candidates for affecting mastitis. However, strong linkage disequilibrium in this region prevented conclusive determination of the causal gene. A different QTL on BTA 6 located at 88.32 Mb in Holstein cattle affected mastitis. In addition, QTL on BTA 13 and 19 were confirmed to segregate in Nordic Red cattle and QTL on BTA 16 and 20 were confirmed in Jersey cattle. Although several candidate genes were identified in these targeted regions, it was not possible to identify a gene or polymorphism as the causal factor for any of these regions. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  7. Localization of Daucus carota NMCP1 to the nuclear periphery: the role of the N-terminal region and an NLS-linked sequence motif, RYNLRR, in the tail domain

    Science.gov (United States)

    Kimura, Yuta; Fujino, Kaien; Ogawa, Kana; Masuda, Kiyoshi

    2014-01-01

    Recent ultrastructural studies revealed that a structure similar to the vertebrate nuclear lamina exists in the nuclei of higher plants. However, plant genomes lack genes for lamins and intermediate-type filament proteins, and this suggests that plant-specific nuclear coiled-coil proteins make up the lamina-like structure in plants. NMCP1 is a protein, first identified in Daucus carota cells, that localizes exclusively to the nuclear periphery in interphase cells. It has a tripartite structure comprised of head, rod, and tail domains, and includes putative nuclear localization signal (NLS) motifs. We identified the functional NLS of DcNMCP1 (carrot NMCP1) and determined the protein regions required for localizing to the nuclear periphery using EGFP-fused constructs transiently expressed in Apium graveolens epidermal cells. Transcription was driven under a CaMV35S promoter, and the genes were introduced into the epidermal cells by a DNA-coated microprojectile delivery system. Of the NLS motifs, KRRRK and RRHK in the tail domain were highly functional for nuclear localization. Addition of the N-terminal 141 amino acids from DcNMCP1 shifted the localization of a region including these NLSs from the entire nucleus to the nuclear periphery. Using this same construct, the replacement of amino acids in RRHK or its preceding sequence, YNL, with alanine residues abolished localization to the nuclear periphery, while replacement of KRRRK did not affect localization. The sequence R/Q/HYNLRR/H, including YNL and the first part of the sequence of RRHK, is evolutionarily conserved in a subclass of NMCP1 sequences from many plant species. These results show that NMCP1 localizes to the nuclear periphery by a combined action of a sequence composed of R/Q/HYNLRR/H, NLS, and the N-terminal region including the head and a portion of the rod domain, suggesting that more than one binding site is implicated in localization of NMCP1. PMID:24616728

  8. Localization of Daucus carota NMCP1 to the nuclear periphery: the role of the N-terminal region and an NLS-linked sequence motif, RYNLRR, in the tail domain

    Directory of Open Access Journals (Sweden)

    Yuta Kimura

    2014-02-01

    Recent ultrastructural studies revealed that a structure similar to the vertebrate nuclear lamina exists in the nuclei of higher plants. However, plant genomes lack genes for lamins and intermediate-type filament proteins, and this suggests that plant-specific nuclear coiled-coil proteins make up the lamina-like structure in plants. NMCP1 is a protein, first identified in Daucus carota cells, that localizes exclusively to the nuclear periphery in interphase cells. It has a tripartite structure comprised of head, rod, and tail domains, and includes putative nuclear localization signal (NLS) motifs. We identified the functional NLS of DcNMCP1 (carrot NMCP1) and determined the protein regions required for localizing to the nuclear periphery using EGFP-fused constructs transiently expressed in Apium graveolens epidermal cells. Transcription was driven under a CaMV35S promoter, and the genes were introduced into the epidermal cells by a DNA-coated microprojectile delivery system. Of the NLS motifs, KRRRK and RRHK in the tail domain were highly functional for nuclear localization. Addition of the N-terminal 141 amino acids from DcNMCP1 shifted the localization of a region including these NLSs from the entire nucleus to the nuclear periphery. Using this same construct, the replacement of amino acids in RRHK or its preceding sequence, YNL, with alanine residues abolished localization to the nuclear periphery, while replacement of KRRRK did not affect localization. The sequence R/Q/HYNLRR/H, including YNL and the first part of the sequence of RRHK, is evolutionarily conserved in a subclass of NMCP1 sequences from many plant species. These results show that NMCP1 localizes to the nuclear periphery by a combined action of a sequence composed of R/Q/HYNLRR/H, NLS, and the N-terminal region including the head and a portion of the rod domain, suggesting that more than one binding site is implicated in localization of NMCP1.

  9. NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole

    2011-01-01

    of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users...... associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can...

  10. Detection of Culex flavivirus and Aedes flavivirus nucleotide sequences in mosquitoes from parks in the city of São Paulo, Brazil.

    Science.gov (United States)

    Fernandes, Licia Natal; de Paula, Marcia Bicudo; Araújo, Alessandra Bergamo; Gonçalves, Elisabeth Fernandes Bertoletti; Romano, Camila Malta; Natal, Delsio; Malafronte, Rosely dos Santos; Marrelli, Mauro Toledo; Levi, José Eduardo

    2016-05-01

    The dengue viruses are widespread in Brazil and are a major public health concern. Other flaviviruses also cause diseases in humans, although on a smaller scale. The city of São Paulo is in a highly urbanized area with few green spaces apart from its parks, which are used for recreation and where potential vertebrate hosts and mosquito vectors of pathogenic Flavivirus species can be found. Although this scenario can contribute to the transmission of Flavivirus to humans, little is known about the circulation of members of this genus in these areas. In light of this, the present study sought to identify Flavivirus infection in mosquitoes (Diptera: Culicidae) collected in parks in the city of São Paulo. Seven parks in different sectors of the city were selected. Monthly mosquito collections were carried out in each park from March 2011 to February 2012 using aspiration and traps (Shannon and CDC-CO2). Nucleic acids were extracted from the mosquitoes collected and used for reverse-transcriptase and real-time polymerase chain reactions with genus-specific primers targeting a 200-nucleotide region in the Flavivirus NS5 gene. Positive samples were sequenced, and phylogenetic analyses were performed. Culex and Aedes were the most frequent genera of Culicidae collected. Culex flavivirus (CxFV)-related and Aedes flavivirus (AEFV)-related nucleotide sequences were detected in 17 pools of Culex and two pools of Aedes mosquitoes, respectively, among the 818 pools of non-engorged females analyzed. To the best of our knowledge, this is the first report of CxFV and AEFV in the city of São Paulo and Latin America, respectively. Both viruses are insect-specific flaviviruses, a group known to replicate only in mosquito cells and induce a cytopathic effect in some situations. Hence, our data suggest that CxFV and AEFV are present in Culex and Aedes mosquitoes, respectively, in parks in the city of São Paulo. Even though Flavivirus species of medical importance were not

  11. Genetic diversity in domesticated soybean (Glycine max) and its wild progenitor (Glycine soja) for simple sequence repeat and single-nucleotide polymorphism loci.

    Science.gov (United States)

    Li, Ying-Hui; Li, Wei; Zhang, Chen; Yang, Liang; Chang, Ru-Zhen; Gaut, Brandon S; Qiu, Li-Juan

    2010-10-01

    • The study of genetic diversity between a crop and its wild relatives may yield fundamental insights into evolutionary history and the process of domestication. • In this study, we genotyped a sample of 303 accessions of domesticated soybean (Glycine max) and its wild progenitor Glycine soja with 99 microsatellite markers and 554 single-nucleotide polymorphism (SNP) markers. • The simple sequence repeat (SSR) loci averaged 21.5 alleles per locus and an overall Nei's gene diversity of 0.77. The SNPs had substantially lower genetic diversity (0.35) than SSRs. SSR analyses indicated that G. soja exhibited higher diversity than G. max, but SNPs provided a slightly different snapshot of diversity between the two taxa. For both marker types, the primary division of genetic diversity was between the wild and domesticated accessions. Within taxa, G. max consisted of four geographic regions in China. G. soja formed six subgroups. Genealogical analyses indicated that cultivated soybean tended to form a monophyletic clade with respect to G. soja. • G. soja and G. max represent distinct germplasm pools. Limited evidence of admixture was discovered between these two species. Overall, our analyses are consistent with the origin of G. max from regions along the Yellow River of China.

  12. Next-Generation Sequencing Approaches in Genome-Wide Discovery of Single Nucleotide Polymorphism Markers Associated with Pungency and Disease Resistance in Pepper

    Directory of Open Access Journals (Sweden)

    Abinaya Manivannan

    2018-01-01

    Pepper is an economically important horticultural plant that has been widely used for its pungency and spicy taste in worldwide cuisines. Therefore, the domestication of pepper has been carried out since antiquity. To meet the growing demand for pepper with high quality, organoleptic properties, nutraceutical content, and disease tolerance, genomics-assisted breeding techniques can be incorporated to develop novel pepper varieties with desired traits. The application of next-generation sequencing (NGS) approaches has reformed plant breeding technology, especially in the area of molecular marker-assisted breeding. The availability of genomic information aids in the deeper understanding of several molecular mechanisms behind vital physiological processes. In addition, NGS methods facilitate the genome-wide discovery of DNA-based markers linked to key genes involved in important biological phenomena. Among the molecular markers, single nucleotide polymorphism (SNP) markers offer various benefits in comparison with other existing DNA-based markers. The present review concentrates on the impact of NGS approaches on the discovery of useful SNP markers associated with pungency and disease resistance in pepper. The information provided in the current endeavor can be utilized for the betterment of pepper breeding in the future.

  13. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data.

    Science.gov (United States)

    Cornuet, Jean-Marie; Pudlo, Pierre; Veyssier, Julien; Dehne-Garcia, Alexandre; Gautier, Mathieu; Leblois, Raphaël; Marin, Jean-Michel; Estoup, Arnaud

    2014-04-15

    DIYABC is a software package for a comprehensive analysis of population history using approximate Bayesian computation on DNA polymorphism data. Version 2.0 implements a number of new features and analytical methods. It allows (i) the analysis of single nucleotide polymorphism data at a large number of loci, apart from microsatellite and DNA sequence data, (ii) efficient Bayesian model choice using linear discriminant analysis on summary statistics and (iii) the serial launching of multiple post-processing analyses. DIYABC v2.0 also includes a user-friendly graphical interface with various new options. It can be run on three operating systems: GNU/Linux, Microsoft Windows and Apple OS X. Freely available with a detailed notice document and example projects to academic users at http://www1.montpellier.inra.fr/CBGP/diyabc CONTACT: estoup@supagro.inra.fr Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
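
    As a rough illustration of the general idea behind approximate Bayesian computation (not DIYABC's own implementation or interface), the sketch below runs a rejection sampler on a toy problem: estimating an allele frequency from a count of derived alleles. The function name `abc_rejection`, the uniform prior, the tolerance, and the binomial simulator are all assumptions chosen for brevity.

```python
import random


def abc_rejection(observed, n_trials, n_draws=5000, tolerance=2, seed=0):
    """Toy ABC rejection sampler (hypothetical helper, not part of DIYABC).

    observed : summary statistic from the data (here, a count of successes)
    n_trials : number of Bernoulli trials behind that count
    Returns the list of accepted parameter values, which approximates
    the posterior distribution of the success probability.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw a candidate parameter from a uniform prior
        # simulate a data set under the candidate and reduce it to the summary
        simulated = sum(rng.random() < p for _ in range(n_trials))
        # keep the candidate only if the simulated summary is close enough
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)
    return accepted


posterior = abc_rejection(observed=15, n_trials=50)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

    Real analyses replace the binomial simulator with a coalescent simulation of the population history and compare several summary statistics at once; the accept/reject logic stays the same.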

  14. Molecular Comparison and Evolutionary Analyses of VP1 Nucleotide Sequences of New African Human Enterovirus 71 Isolates Reveal a Wide Genetic Diversity

    Science.gov (United States)

    Nougairède, Antoine; Joffret, Marie-Line; Deshpande, Jagadish M.; Dubot-Pérès, Audrey; Héraud, Jean-Michel

    2014-01-01

    Most circulating strains of Human enterovirus 71 (EV-A71) have been classified primarily into three genogroups (A to C) on the basis of genetic divergence in the 1D gene, which encodes the VP1 capsid protein. The aim of the present study was to provide further insights into the diversity of the EV-A71 genogroups following the recent description of highly divergent isolates, in particular those from African countries, including Madagascar. We classified recent EV-A71 isolates by a large comparison of 3,346 VP1 nucleotide sequences collected from GenBank. Analysis of genetic distances and phylogenetic investigations indicated that some recently reported isolates did not fall into the genogroups A-C and clustered into three additional genogroups, including one Indian genogroup (genogroup D) and 2 African ones (E and F). Our Bayesian phylogenetic analysis provided consistent data showing that the genogroup D isolates share a recent common ancestor with the members of genogroup E, while the isolates of genogroup F evolved from a recent common ancestor shared with the members of the genogroup B. Our results reveal the wide diversity that exists among EV-A71 isolates and suggest that the number of circulating genogroups is probably underestimated, particularly in developing countries where EV-A71 epidemiology has been poorly studied. PMID:24598878

  15. Domain structures and molecular evolution of class I and class II major histocompatibility gene complex (MHC) products deduced from amino acid and nucleotide sequence homologies

    Science.gov (United States)

    Ohnishi, Koji

    1984-12-01

    Domain structures of class I and class II MHC products were analyzed from the viewpoint of amino acid and nucleotide sequence homologies. Alignment statistics revealed that class I (transplantation) antigen H chains consist of four mutually homologous domains, and that class II (HLA-DR) antigen β and α chains are both composed of three mutually homologous ones. The N-terminal three and two domains of class I and class II (both β and α) gene products, respectively, all of which are ~90 residues long, were concluded to be homologous to β2-microglobulin (β2M). The membrane-embedded C-terminal shorter domains of these MHC products were also found to be homologous to one another and to the third domain of class I H chains. Class I H chains were found to be more closely related to class II α chains than to class II β chains. Based on these findings, an exon duplication history from a common ancestral gene encoding a β2M-like primordial protein of one-domain length up to the contemporary MHC products was proposed.

  16. Complete nucleotide sequence of a plasmid containing the botulinum neurotoxin gene in Clostridium botulinum type B strain 111 isolated from an infant patient in Japan.

    Science.gov (United States)

    Hosomi, Koji; Sakaguchi, Yoshihiko; Kohda, Tomoko; Gotoh, Kazuyoshi; Motooka, Daisuke; Nakamura, Shota; Umeda, Kaoru; Iida, Tetsuya; Kozaki, Shunji; Mukamoto, Masafumi

    2014-12-01

    Botulinum neurotoxins (BoNTs) are highly potent toxins that are produced by Clostridium botulinum. We determined the complete nucleotide sequence of a plasmid containing the botulinum neurotoxin gene in C. botulinum type B strain 111 in order to obtain an insight into the toxigenicity and evolution of the bont gene in C. botulinum. Group I C. botulinum type B strain 111 was isolated from the first case of infant botulism in Japan in 1995. In previous studies, botulinum neurotoxin subtype B2 (BoNT/B2) produced by strain 111 exhibited different antigenic properties from those of authentic BoNT/B1 produced by strain Okra. We have recently shown that the isolates of strain 111 that lost toxigenicity were cured of the plasmid containing the bont/B2 gene. In the present study, the plasmid (named pCB111) was found to be a circular 265,575-bp double-stranded DNA molecule containing 332 predicted open reading frames (ORFs). The gene products of 85 of these ORFs could be functionally assigned on the basis of sequence homology to known proteins. The bont/B2 complex genes were located on pCB111, and some gene products may be involved in conjugative plasmid transfer and horizontal transfer of bont genes. pCB111 was similar to previously identified plasmids containing bont/B1, /B5, or /A3 complex genes in other group I C. botulinum strains. It was suggested that these plasmids had been derived from a common ancestor and had played important roles in bont gene transfer between C. botulinum strains.

  17. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun

    2015-06-11

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8-10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
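
    To make the notion of a mono-nucleotide motif model concrete, the sketch below builds a log-odds position weight matrix (PWM) from a handful of aligned binding sites and scans a sequence with it. This is a generic textbook construction, not the optimization procedure compared in the study; the example sites, the pseudocount, the uniform background, and the helper names `build_pwm`, `score`, and `best_window` are all illustrative assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        # log2 ratio of the smoothed base frequency to the background frequency
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm, kmer):
    """Sum the per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, kmer))


def best_window(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    k = len(pwm)
    return max(
        (score(pwm, sequence[i:i + k]), i)
        for i in range(len(sequence) - k + 1)
    )


# Hypothetical aligned sites, invented for this example only.
sites = ["TGACGTCA", "TGACGTCA", "TGACGTAA", "TGATGTCA"]
pwm = build_pwm(sites)
print(best_window(pwm, "CCCTGACGTCACCC"))
```

    A di-nucleotide model drops the independence assumption in `score` by assigning weights to adjacent base pairs instead of single bases, which is the kind of extension the abstract reports as generally improving performance.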

  18. The kissing-loop motif is a preferred site of 5' leader recombination during replication of SL3-3 murine leukemia viruses in mice

    DEFF Research Database (Denmark)

    Lund, Anders Henrik; Mikkelsen, J G; Schmidt, J

    1999-01-01

    , and the upstream part of the 5' untranslated region, enabled us to map recombination sites, guided by distinct scattered nucleotide differences. In 30 of 44 analyzed sequences, recombination was mapped to a 33-nucleotide similarity window coinciding with the kissing-loop stem-loop motif implicated in dimerization...... of the diploid genome. Interestingly, the recombination pattern preference found in replication-competent viruses from T-cell tumors is very similar to the pattern previously reported for retroviral vectors in cell culture experiments. The data therefore sustain the hypothesis that the kissing loop, presumably...

  19. Nucleotide sequence of atpB, rbcL, trnR, dedB and psaI chloroplast genes from a fern Angiopteris lygodiifolia: a possible emergence of Spermatophyta lineage before the separation of Bryophyta and Pteridophyta.

    Science.gov (United States)

    Yoshinaga, K; Kubota, Y; Ishii, T; Wada, K

    1992-01-01

    To elucidate the evolutionary relationship between the Spermatophyta, Pteridophyta and Bryophyta, we cloned a fragment of chloroplast DNA from the fern Angiopteris lygodiifolia (Pteridophyta) and determined its nucleotide sequence. The fragment contained the atpB, rbcL, trnR-CCG, dedB and psaI genes. Comparisons of the deduced amino acid and nucleotide sequences of these genes from the three plant groups indicate that Angiopteris sequences are more closely related to those of Bryophyta species (85% identity on average) than to those of seed plants (76% identity on average), supporting a hypothesis that the Bryophyta and Pteridophyta diverged more recently from one another than their common progenitor diverged from that of the Spermatophyta.

  20. Clinical and molecular characterization of a cohort of patients with novel nucleotide alterations of the Dystrophin gene detected by direct sequencing

    Directory of Open Access Journals (Sweden)

    Corti Stefania

    2011-03-01

    Background: Duchenne and Becker muscular dystrophies (DMD/BMD) are allelic disorders caused by mutations in the dystrophin gene, which encodes a sarcolemmal protein responsible for muscle integrity. Deletions and duplications account for approximately 75% of mutations in DMD and 85% in BMD. The implementation of techniques allowing complete gene sequencing has focused attention on small point mutations and other mechanisms underlying complex rearrangements. Methods: We selected 47 patients (41 families; 35 DMD, 6 BMD) without deletions and duplications in the DMD gene (excluded by multiplex ligation-dependent probe amplification and multiplex polymerase chain reaction analysis). This cohort was investigated by systematic direct sequence analysis to study sequence variation. We focused our attention on rare mutational events, which were further studied through transcript analysis. Results: We identified 40 different nucleotide alterations in the DMD gene and their clinical correlates; altogether, 16 mutations were novel. DMD probands carried 9 microinsertions/microdeletions, 19 nonsense mutations, and 7 splice-site mutations. BMD patients carried 2 nonsense mutations, 2 splice-site mutations, 1 missense substitution, and 1 single base insertion. The most frequent stop codon was TGA (n = 10 patients), followed by TAG (n = 7) and TAA (n = 4). We also analyzed the molecular mechanisms of five rare mutational events. They are two frame-shifting mutations in the DMD gene 3' end in BMD and three novel splicing defects: IVS42: c.6118-3C>A, which causes a leaky splice-site; c.9560A>G, which determines a cryptic splice-site activation; and c.9564-426 T>G, which creates pseudoexon retention within IVS65. Conclusion: The analysis of our patients' sample, carrying point mutations or complex rearrangements in the DMD gene, contributes to the knowledge on phenotypic correlations in dystrophinopathic patients and can provide a better understanding of pre-mRNA maturation defects

  1. Whole Genome and Core Genome Multilocus Sequence Typing and Single Nucleotide Polymorphism Analyses of Listeria monocytogenes Isolates Associated with an Outbreak Linked to Cheese, United States, 2013

    Science.gov (United States)

    Luo, Yan; Carleton, Heather; Timme, Ruth; Melka, David; Muruvanda, Tim; Wang, Charles; Kastanis, George; Katz, Lee S.; Turner, Lauren; Fritzinger, Angela; Moore, Terence; Stones, Robert; Blankenship, Joseph; Salter, Monique; Parish, Mickey; Hammack, Thomas S.; Evans, Peter S.; Tarr, Cheryl L.; Allard, Marc W.; Strain, Errol A.; Brown, Eric W.

    2017-01-01

    ABSTRACT Epidemiological findings of a listeriosis outbreak in 2013 implicated Hispanic-style cheese produced by company A, and pulsed-field gel electrophoresis (PFGE) and whole genome sequencing (WGS) were performed on clinical isolates and representative isolates collected from company A cheese and environmental samples during the investigation. The results strengthened the evidence for cheese as the vehicle. Surveillance sampling and WGS 3 months later revealed that the equipment purchased by company B from company A yielded an environmental isolate highly similar to all outbreak isolates. The whole genome and core genome multilocus sequence typing and single nucleotide polymorphism (SNP) analyses results were compared to demonstrate the maximum discriminatory power obtained by using multiple analyses, which were needed to differentiate outbreak-associated isolates from a PFGE-indistinguishable isolate collected in a nonimplicated food source in 2012. This unrelated isolate differed from the outbreak isolates by only 7 to 14 SNPs, and as a result, the minimum spanning tree from the whole genome analyses and certain variant calling approach and phylogenetic algorithm for core genome-based analyses could not provide differentiation between unrelated isolates. Our data also suggest that SNP/allele counts should always be combined with WGS clustering analysis generated by phylogenetically meaningful algorithms on a sufficient number of isolates, and the SNP/allele threshold alone does not provide sufficient evidence to delineate an outbreak. The putative prophages were conserved across all the outbreak isolates. All outbreak isolates belonged to clonal complex 5 and serotype 1/2b and had an identical inlA sequence which did not have premature stop codons. IMPORTANCE In this outbreak, multiple analytical approaches were used for maximum discriminatory power. A PFGE-matched, epidemiologically unrelated isolate had high genetic similarity to the outbreak

  2. Comparing Enterovirus 71 with Coxsackievirus A16 by analyzing nucleotide sequences and antigenicity of recombinant proteins of VP1s and VP4s

    Directory of Open Access Journals (Sweden)

    Sun Yu

    2011-11-01

    Background: Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) are two major etiological agents of Hand, Foot and Mouth Disease (HFMD). EV71, but not CA16, is associated with severe cases. The mechanisms contributing to the different pathogenesis of these two viruses are unknown. VP1 and VP4 are two major structural proteins of these viruses and deserve close attention. Results: The sequences of vp1s from 14 EV71 and 14 CA16, and vp4s from 10 EV71 and 1 CA16 isolated in this study during the 2007 to 2009 HFMD seasons were analyzed together with the corresponding sequences available in GenBank using DNAStar and MEGA 4.0. Phylogenetic analysis of complete vp1s or vp4s showed that EV71 isolated in Beijing belonged to C4 and CA16 belonged to lineage B2 (lineage C). VP1s and VP4s from 4 strains of viruses expressed in E. coli BL21 cells were used to detect IgM and IgG in human sera by Western blot. The detection of IgM against VP1s of EV71 and CA16 showed results consistent with current infection, while none of the sera were positive against VP4s of EV71 and CA16. There was a significant difference in the positive rates between EV71 VP1 and CA16 VP1 (χ2 = 5.02, P [...] χ2 = 15.30, P [...] χ2 = 26.47, P [...] χ2 = 16.78, P [...] Conclusions: EV71 and CA16 were highly diverse in the nucleotide sequences of vp1s and vp4s. The sera positive rates of VP1 and VP4 of EV71 were lower than those of CA16, respectively, which suggested a lower exposure rate to EV71 than to CA16 in the Beijing population. Human serum antibodies detected by Western blot using VP1s and VP4s as antigens indicated that the immunological reaction to VP1 and VP4 of both EV71 and CA16 was different.

  3. Nucleotide sequencing and analysis of 16S rDNA and 16S-23S rDNA internal spacer region (ISR) of Taylorella equigenitalis, as an important pathogen for contagious equine metritis (CEM).

    Science.gov (United States)

    Kagawa, S; Nagano, Y; Tazumi, A; Murayama, O; Millar, B C; Moore, J E; Matsuda, M

    2006-05-01

    The primer set for 16S rDNA amplified an amplicon of about 1500 bp in length for three strains of Taylorella equigenitalis (NCTC11184(T), Kentucky188 and EQ59). Sequence differences of the 16S rDNA among the six sequences, including three reference sequences, occurred at only a few nucleotide positions, and thus an extremely high sequence similarity of the 16S rDNA was demonstrated for the first time among the six sequences. In addition, the primer set for the 16S-23S rDNA internal spacer region (ISR) amplified two amplicons about 1300 bp and 1200 bp in length for the three strains. The ISRs were estimated to be about 920 bp in length for the large ISR-A and about 830 bp for the small ISR-B. Sequence alignment of the ISR-A and ISR-B demonstrated about 10 base differences between NCTC11184(T) and EQ59 and between Kentucky188 and EQ59. However, only minor sequence differences were demonstrated between the ISR-A and ISR-B from NCTC11184(T) and Kentucky188, respectively. A typical order of the intercistronic tRNAs with the 29-nucleotide spacer of 5'-16S rDNA-tRNA(Ile)-tRNA(Ala)-23S rDNA-3' was demonstrated in all the ISRs. The ISRs may be useful for discrimination among isolates of T. equigenitalis if sequencing is employed.

  4. Nucleotide patterns aiding in prediction of eukaryotic promoters

    Science.gov (United States)

    Triska, Martin; Solovyev, Victor; Baranova, Ancha; Kel, Alexander

    2017-01-01

    Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that probabilistic integrative algorithms-driven models allow accurate classification of DNA sequence into “promoters” and “non-promoters” even in absence of the full-length cDNA sequences. These models may be built upon the maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, transcription factor binding sites, as well as relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 “promoter-specific” transcription factors), those that bind preferentially to the [0,500] region (282 “5′ UTR-specific” TFs), and 207 of the “promiscuous” transcription factors with little or no location preference with respect to TSS. For the most informative motifs, their positional preferences are conserved between dicots and monocots. PMID:29141011

  5. Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L.

    Science.gov (United States)

    Allegre, Mathilde; Argout, Xavier; Boccara, Michel; Fouet, Olivier; Roguet, Yolande; Bérard, Aurélie; Thévenin, Jean Marc; Chauveau, Aurélie; Rivallan, Ronan; Clement, Didier; Courtois, Brigitte; Gramacho, Karina; Boland-Augé, Anne; Tahi, Mathias; Umaharan, Pathmanathan; Brunel, Dominique; Lanaud, Claire

    2012-01-01

    Theobroma cacao is an economically important tree of several tropical countries. Its genetic improvement is essential to provide protection against major diseases and improve chocolate quality. We discovered and mapped new expressed sequence tag-single nucleotide polymorphism (EST-SNP) and simple sequence repeat (SSR) markers and constructed a high-density genetic map. By screening 149 650 ESTs, 5246 SNPs were detected in silico, of which 1536 corresponded to genes with a putative function, while 851 had a clear polymorphic pattern across a collection of genetic resources. In addition, 409 new SSR markers were detected on the Criollo genome. Lastly, 681 new EST-SNPs and 163 new SSRs were added to the pre-existing 418 co-dominant markers to construct a large consensus genetic map. This high-density map and the set of new genetic markers identified in this study are a milestone in cocoa genomics and for marker-assisted breeding. The data are available at http://tropgenedb.cirad.fr. PMID:22210604

  6. Evolutionary and structural perspectives of plant cyclic nucleotide-gated cation channels

    KAUST Repository

    Zelman, Alice K.

    2012-05-29

    Ligand-gated cation channels are a frequent component of signaling cascades in eukaryotes. Eukaryotes contain numerous diverse gene families encoding ion channels, some of which are shared and some of which are unique to particular kingdoms. Among the many different types are cyclic nucleotide-gated channels (CNGCs). CNGCs are cation channels with varying degrees of ion conduction selectivity. They are implicated in numerous signaling pathways and permit diffusion of divalent and monovalent cations, including Ca2+ and K+. CNGCs are present in both plant and animal cells, typically in the plasma membrane; recent studies have also documented their presence in prokaryotes. All eukaryote CNGC polypeptides have a cyclic nucleotide-binding domain and a calmodulin binding domain as well as a six transmembrane/one pore tertiary structure. This review summarizes existing knowledge about the functional domains present in these cation-conducting channels, and considers the evidence indicating that plant and animal CNGCs evolved separately. Additionally, an amino acid motif that is only found in the phosphate binding cassette and hinge regions of plant CNGCs, and is present in all experimentally confirmed CNGCs but no other channels was identified. This CNGC-specific amino acid motif provides an additional diagnostic tool to identify plant CNGCs, and can increase confidence in the annotation of open reading frames in newly sequenced genomes as putative CNGCs. Conversely, the absence of the motif in some plant sequences currently identified as probable CNGCs may suggest that they are misannotated or protein fragments. 2012 Zelman, Dawe, Gehring and Berkowitz.

  7. n-Nucleotide circular codes in graph theory.

    Science.gov (United States)

    Fimmel, Elena; Michel, Christian J; Strüngmann, Lutz

    2016-03-13

    The circular code theory proposes that genes are constituted of two trinucleotide codes: the classical genetic code with 61 trinucleotides for coding the 20 amino acids (except the three stop codons {TAA,TAG,TGA}) and a circular code based on 20 trinucleotides for retrieving, maintaining and synchronizing the reading frame. It relies on two main results: the identification of a maximal C(3) self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses (Michel 2015 J. Theor. Biol. 380, 156-177. (doi:10.1016/j.jtbi.2015.04.009); Arquès & Michel 1996 J. Theor. Biol. 182, 45-58. (doi:10.1006/jtbi.1996.0142)) and the finding of X circular code motifs in tRNAs and rRNAs, in particular in the ribosome decoding centre (Michel 2012 Comput. Biol. Chem. 37, 24-37. (doi:10.1016/j.compbiolchem.2011.10.002); El Soufi & Michel 2014 Comput. Biol. Chem. 52, 9-17. (doi:10.1016/j.compbiolchem.2014.08.001)). The universally conserved nucleotides A1492 and A1493 and the conserved nucleotide G530 are included in X circular code motifs. Recently, dinucleotide circular codes were also investigated (Michel & Pirillo 2013 ISRN Biomath. 2013, 538631. (doi:10.1155/2013/538631); Fimmel et al. 2015 J. Theor. Biol. 386, 159-165. (doi:10.1016/j.jtbi.2015.08.034)). As the genetic motifs of different lengths are ubiquitous in genes and genomes, we introduce a new approach based on graph theory to study in full generality n-nucleotide circular codes X, i.e. of length 2 (dinucleotide), 3 (trinucleotide), 4 (tetranucleotide), etc. Indeed, we prove that an n-nucleotide code X is circular if and only if the corresponding graph [Formula: see text] is acyclic. Moreover, the maximal length of a path in [Formula: see text] corresponds to the window of nucleotides in a sequence for detecting the correct reading frame. Finally, the graph theory of tournaments is applied to the study of dinucleotide circular codes. It has full equivalence between the combinatorics
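
    The acyclicity criterion stated in this abstract can be illustrated with a short Python sketch. The graph construction below is an assumption, since the abstract elides the formula: each word of the code is taken to contribute one edge from every proper prefix to the matching suffix. The function names are hypothetical, and the toy codes are not taken from the paper.

```python
def code_graph(code):
    """Build a directed graph from a set of equal-length words.

    Assumed construction: every word w contributes an edge
    w[:i] -> w[i:] for each split position 1 <= i < len(w).
    """
    edges = {}
    for word in code:
        for i in range(1, len(word)):
            edges.setdefault(word[:i], set()).add(word[i:])
            edges.setdefault(word[i:], set())
    return edges


def is_acyclic(edges):
    """Depth-first search that reports whether the graph has no directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in edges}

    def visit(v):
        colour[v] = GREY
        for w in edges[v]:
            if colour[w] == GREY:
                return False  # back edge: a cycle exists
            if colour[w] == WHITE and not visit(w):
                return False
        colour[v] = BLACK
        return True

    return all(colour[v] != WHITE or visit(v) for v in edges)


def is_circular(code):
    """Apply the criterion: the code is circular iff its graph is acyclic."""
    return is_acyclic(code_graph(code))
```

    Under this construction, the single word ACG gives an acyclic graph, while AAA, or the pair ACG and CGA (cyclic shifts of each other), produce a cycle.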

  8. DNA mutation motifs in the genes associated with inherited diseases.

    Directory of Open Access Journals (Sweden)

    Michal Růžička

    Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected by chance are termed coldspots. Mutation hotspots are usually derived from a mutation spectrum, which reflects particular population where an effect of a common ancestor plays a role. To detect coldspots/hotspots unaffected by population bias, we analysed the presence of germline mutations obtained from HGMD database in the 5-nucleotide segments repeatedly occurring in genes associated with common inherited disorders, in particular, the PAH, LDLR, CFTR, F8, and F9 genes. Statistically significant sequences (mutational motifs) rarely associated with mutations (coldspots) and frequently associated with mutations (hotspots) exhibited characteristic sequence patterns, e.g. coldspots contained purine tract while hotspots showed alternating purine-pyrimidine bases, often with the presence of CpG dinucleotide. Using molecular dynamics simulations and free energy calculations, we analysed the global bending properties of two selected coldspots and two hotspots with a G/T mismatch. We observed that the coldspots were inherently more flexible than the hotspots. We assume that this property might be critical for effective mismatch repair as DNA with a mutation recognized by MutSα protein is noticeably bent.

  9. DNA mutation motifs in the genes associated with inherited diseases.

    Science.gov (United States)

    Růžička, Michal; Kulhánek, Petr; Radová, Lenka; Čechová, Andrea; Špačková, Naďa; Fajkusová, Lenka; Réblová, Kamila

    2017-01-01

    Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected by chance are termed coldspots. Mutation hotspots are usually derived from a mutation spectrum, which reflects particular population where an effect of a common ancestor plays a role. To detect coldspots/hotspots unaffected by population bias, we analysed the presence of germline mutations obtained from HGMD database in the 5-nucleotide segments repeatedly occurring in genes associated with common inherited disorders, in particular, the PAH, LDLR, CFTR, F8, and F9 genes. Statistically significant sequences (mutational motifs) rarely associated with mutations (coldspots) and frequently associated with mutations (hotspots) exhibited characteristic sequence patterns, e.g. coldspots contained purine tract while hotspots showed alternating purine-pyrimidine bases, often with the presence of CpG dinucleotide. Using molecular dynamics simulations and free energy calculations, we analysed the global bending properties of two selected coldspots and two hotspots with a G/T mismatch. We observed that the coldspots were inherently more flexible than the hotspots. We assume that this property might be critical for effective mismatch repair as DNA with a mutation recognized by MutSα protein is noticeably bent.
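
    The counting step behind a hotspot/coldspot analysis of 5-nucleotide segments can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: the function name, the 0-based position convention, and the rule that a segment counts as "mutated" when any of its positions carries a mutation are all assumptions, and no statistical test is included.

```python
from collections import defaultdict


def kmer_mutation_counts(seq, mutated_positions, k=5):
    """Count, for every k-nucleotide segment in seq, how many times it occurs
    and how many of those occurrences overlap a mutated position.

    mutated_positions holds 0-based indices into seq (an assumed convention).
    Returns a dict: segment -> [occurrences, occurrences with a mutation].
    """
    mutated = set(mutated_positions)
    counts = defaultdict(lambda: [0, 0])
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        counts[kmer][0] += 1
        if any(pos in mutated for pos in range(start, start + k)):
            counts[kmer][1] += 1
    return dict(counts)
```

    Segments whose mutated fraction is well above or below what the overall mutation rate would predict would then be candidates for hotspots or coldspots, subject to a significance test that this sketch leaves out.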

  10. The genome of the THE I human transposable repetitive elements is composed of a basic motif homologous to an ancestral immunoglobulin gene sequence.

    OpenAIRE

    Hakim, I; Amariglio, N; Grossman, Z; Simoni-Brok, F; Ohno, S; Rechavi, G

    1994-01-01

    Amplification of rearranged human immunoglobulin heavy-chain genes using the polymerase chain reaction resulted unexpectedly in the amplification of human transposable repetitive element genomes. These were identified as members of the THE I (transposon-like human element I) transposable element family. Analysis of the THE I sequences revealed the presence of several copies of the ancestral building block described > 10 years ago by Ohno and coworkers as the primordial immunoglobulin sequence...

  11. Isolation, characterization, and nucleotide sequence of the Streptococcus mutans mannitol-phosphate dehydrogenase gene and the mannitol-specific factor III gene of the phosphoenolpyruvate phosphotransferase system.

    Science.gov (United States)

    Honeyman, A L; Curtiss, R

    1992-08-01

    Streptococcus mutans, the causative agent of dental caries, utilizes carbohydrates by means of the phosphoenolpyruvate-dependent phosphotransferase system (PTS). The PTS facilitates vectorial translocation of metabolizable carbohydrates to form the corresponding sugar-phosphates, which are subsequently converted to glycolytic intermediates. The PTS consists of both sugar-specific and sugar-independent components. Complementation of an Escherichia coli mtlD mutation with a streptococcal recombinant DNA library allowed isolation of the mannitol-1-phosphate dehydrogenase gene (mtlD) and the adjacent sugar-specific mannitol factor III gene (mtlF) from S. mutans. Subsequent transposon mutagenesis of the complementing DNA fragment with Tn5seq1 defined the region that encodes the mtlD-complementing activity, the streptococcal mtlD gene. Nucleotide sequence analysis of this region revealed two complete open reading frames (ORFs) from within the streptococcal mannitol PTS operon. One ORF encodes the mtlD gene product, a 43.0-kDa protein which exhibits similarity to the E. coli and Enterococcus faecalis mannitol-1-phosphate dehydrogenases. The second ORF encodes a 15.8-kDa protein which exhibits similarity to mannitol factor III proteins from several bacterial species. In vitro transcription-translation assays were used to produce proteins of the sizes predicted by the streptococcal ORFs. These data indicate that the S. mutans mannitol PTS utilizes an enzyme II-factor III complex similar to the mannitol system found in other gram-positive organisms, as opposed to that of E. coli, which utilizes an independent enzyme II system.

  12. Complete nucleotide sequence and genome structure of a Japanese isolate of hibiscus latent Fort Pierce virus, a unique tobamovirus that contains an internal poly(A) region in its 3' end.

    Science.gov (United States)

    Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou

    2014-11-01

    In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.

  13. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  14. Recoding method that removes inhibitory sequences and improves HIV gene expression

    Energy Technology Data Exchange (ETDEWEB)

    Rabadan, Raul; Krasnitz, Michael; Robins, Harlan; Witten, Daniela; Levine, Arnold

    2016-08-23

    The invention relates to inhibitory nucleotide signal sequences or "INS" sequences in the genomes of lentiviruses. In particular the invention relates to the AGG motif present in all viral genomes. The AGG motif may have an inhibitory effect on a virus, for example by reducing the levels of, or maintaining low steady-state levels of, viral RNAs in host cells, and inducing and/or maintaining viral latency. In one aspect, the invention provides vaccines that contain, or are produced from, viral nucleic acids in which the AGG sequences have been mutated. In another aspect, the invention provides methods and compositions for affecting the function of the AGG motif, and methods for identifying other INS sequences in viral genomes.

  15. The PurR regulon in Lactococcus lactis – transcriptional regulation of the purine nucleotide metabolism and translational machinery

    DEFF Research Database (Denmark)

    Jendresen, Christian Bille; Martinussen, Jan; Kilstrup, Mogens

    2012-01-01

    to a conserved PurBox motif present on the DNA at a fixed distance from the promoter -10 element. PurR contains a PRPP-binding site, and activation occurs when the intracellular PRPP pool is high as a consequence of low exogenous purine nucleotide pools. By an iterative approach of bioinformatics searches...... and motif optimization, 21 PurR-regulated genes were identified and used in a redefinition of the PurBox consensus sequence. In the process a new motif, the double-PurBox, which is present in a number of promoters and contains two partly overlapping PurBox motifs, was established. Transcriptional fusions...... were used to analyse wild-type promoters and promoters with inactivating PurBox mutations to confirm the relevance of the PurBox motifs as PurR-binding sites. The promoters of several operons were shown to be devoid of any -35 sequence, and found to be completely dependent on PurR-mediated activation...

  16. DNA sequence polymorphisms within the bovine guanine nucleotide-binding protein Gs subunit alpha (Gsα)-encoding (GNAS) genomic imprinting domain are associated with performance traits

    Directory of Open Access Journals (Sweden)

    Mullen Michael P

    2011-01-01

    Background Genes which are epigenetically regulated via genomic imprinting can be potential targets for artificial selection during animal breeding. Indeed, imprinted loci have been shown to underlie some important quantitative traits in domestic mammals, most notably muscle mass and fat deposition. In this candidate gene study, we have identified novel associations between six validated single nucleotide polymorphisms (SNPs) spanning a 97.6 kb region within the bovine guanine nucleotide-binding protein Gs subunit alpha gene (GNAS) domain on bovine chromosome 13 and genetic merit for a range of performance traits in 848 progeny-tested Holstein-Friesian sires. The mammalian GNAS domain consists of a number of reciprocally-imprinted, alternatively-spliced genes which can play a major role in growth, development and disease in mice and humans. Based on the current annotation of the bovine GNAS domain, four of the SNPs analysed (rs43101491, rs43101493, rs43101485 and rs43101486) were located upstream of the GNAS gene, while one SNP (rs41694646) was located in the second intron of the GNAS gene. The final SNP (rs41694656) was located in the first exon of transcripts encoding the putative bovine neuroendocrine-specific protein NESP55, resulting in an aspartic acid-to-asparagine amino acid substitution at amino acid position 192. Results SNP genotype-phenotype association analyses indicate that the single intronic GNAS SNP (rs41694646) is associated (P ≤ 0.05) with a range of performance traits including milk yield, milk protein yield, the content of fat and protein in milk, culled cow carcass weight and progeny carcass conformation, measures of animal body size, direct calving difficulty (i.e. difficulty in calving due to the size of the calf) and gestation length. Association (P ≤ 0.01) with direct calving difficulty (i.e. due to calf size) and maternal calving difficulty (i.e. due to the maternal pelvic width size) was also observed at the rs

  17. DNA sequence polymorphisms within the bovine guanine nucleotide-binding protein Gs subunit alpha (Gsα)-encoding (GNAS) genomic imprinting domain are associated with performance traits

    Science.gov (United States)

    2011-01-01

    Background Genes which are epigenetically regulated via genomic imprinting can be potential targets for artificial selection during animal breeding. Indeed, imprinted loci have been shown to underlie some important quantitative traits in domestic mammals, most notably muscle mass and fat deposition. In this candidate gene study, we have identified novel associations between six validated single nucleotide polymorphisms (SNPs) spanning a 97.6 kb region within the bovine guanine nucleotide-binding protein Gs subunit alpha gene (GNAS) domain on bovine chromosome 13 and genetic merit for a range of performance traits in 848 progeny-tested Holstein-Friesian sires. The mammalian GNAS domain consists of a number of reciprocally-imprinted, alternatively-spliced genes which can play a major role in growth, development and disease in mice and humans. Based on the current annotation of the bovine GNAS domain, four of the SNPs analysed (rs43101491, rs43101493, rs43101485 and rs43101486) were located upstream of the GNAS gene, while one SNP (rs41694646) was located in the second intron of the GNAS gene. The final SNP (rs41694656) was located in the first exon of transcripts encoding the putative bovine neuroendocrine-specific protein NESP55, resulting in an aspartic acid-to-asparagine amino acid substitution at amino acid position 192. Results SNP genotype-phenotype association analyses indicate that the single intronic GNAS SNP (rs41694646) is associated (P ≤ 0.05) with a range of performance traits including milk yield, milk protein yield, the content of fat and protein in milk, culled cow carcass weight and progeny carcass conformation, measures of animal body size, direct calving difficulty (i.e. difficulty in calving due to the size of the calf) and gestation length. Association (P ≤ 0.01) with direct calving difficulty (i.e. due to calf size) and maternal calving difficulty (i.e. due to the maternal pelvic width size) was also observed at the rs43101491 SNP. Following

  18. Nucleotide sequences within the U5 region of the viral RNA genome are the major determinants for an human immunodeficiency virus type 1 to maintain a primer binding site complementary to tRNA(His).

    Science.gov (United States)

    Zhang, Z; Kang, S M; LeBlanc, A; Hajduk, S L; Morrow, C D

    1996-12-15

    The initiation of reverse transcription of the human immunodeficiency virus type 1 (HIV-1) genome requires cellular tRNA(Lys,3) as a primer and occurs at a site in the viral RNA genome, designated as the primer binding site (PBS), which is complementary to the 3'-terminal 18 nucleotides of tRNA(Lys,3). We previously described an HIV-1 virus [designated as HXB2(His-AC)], which contained a sequence within the U5 region complementary to the anticodon region of tRNA(His) in addition to a PBS complementary to the 3'-terminal 18 nucleotides of the tRNA(His). That virus maintained a PBS complementary to tRNA(His) after extended in vitro culture (Wakefield et al., J. Virol. 70, 966-975, 1996). In the present study, we report that subcloning a 200-base-pair DNA fragment encompassing the U5 and PBS regions from an integrated provirus of HXB2(His-AC) back into the wild-type genome (pHXB2) resulted in an infectious virus, designated as HXB2(His-AC-gac), which again stably maintained a PBS complementary to tRNA(His). DNA sequence analysis of the 200-base-pair region revealed only three nucleotide changes from HXB2(His-AC): a T-to-G change at nucleotide 174, a G-to-A change at nucleotide 181, and a T-to-C change at nucleotide 200. The new mutant virus replicated in CD4+ Sup T1 cells similarly to the wild-type virus. Comparison of the nucleotide sequence of nucleocapsid gene of the wild-type and HXB2 (His-AC-gac) virus revealed no differences. Although we found numerous mutations in the reverse transcriptase gene in proviral clones derived from HXB2 (His-AC-gac), no common mutations were found among the 13 clones examined. Comparison of the virion-associated tRNAs of HXB2(His-AC-gac) with those of the wild type revealed that both viruses incorporated a similar subset of cellular tRNAs, with tRNA(Lys,3) being the predominant tRNA found within virions. There was no selective enrichment for tRNA(His) within virions of HXB2(His-AC-gac) virus which selectively use tRNA(His) to

  19. Import of desired nucleic acid sequences using addressing motif of mitochondrial ribosomal 5S-rRNA for fluorescent in vivo hybridization of mitochondrial DNA and RNA.

    Science.gov (United States)

    Zelenka, Jaroslav; Alán, Lukáš; Jabůrek, Martin; Ježek, Petr

    2014-04-01

    Based on the matrix-addressing sequence of mitochondrial ribosomal 5S-rRNA (termed MAM), which is naturally imported into mitochondria, we have constructed an import system for in vivo targeting of mitochondrial DNA (mtDNA) or mt-mRNA, in order to provide fluorescence hybridization of the desired sequences. Thus DNA oligonucleotides were constructed, containing the 5'-flanked T7 RNA polymerase promoter. After in vitro transcription and fluorescent labeling with Alexa Fluor(®) 488 or 647 dye, we obtained the fluorescent "L-ND5 probe" containing MAM and exemplar cargo, i.e., annealing sequence to a short portion of ND5 mRNA and to the light-strand mtDNA complementary to the heavy strand nd5 mt gene (5'-end 21 base pair sequence). For mitochondrial in vivo fluorescent hybridization, HepG2 cells were treated with dequalinium micelles, containing the fluorescent probes, bringing the probes proximally to the mitochondrial outer membrane and to the natural import system. A verification of import into the mitochondrial matrix of cultured HepG2 cells was provided by confocal microscopy colocalizations. Transfections using lipofectamine or probes without 5S-rRNA addressing MAM sequence or with MAM only were ineffective. Alternatively, the same DNA oligonucleotides with 5'-CACC overhang (substituting T7 promoter) were transcribed from the tetracycline-inducible pENTRH1/TO vector in human embryonic kidney T-REx®-293 cells, while mitochondrial matrix localization after import of the resulting unlabeled RNA was detected by PCR. The MAM-containing probe was then enriched by three orders of magnitude over the natural ND5 mRNA in the mitochondrial matrix. In conclusion, we present a proof-of-principle for mitochondrial in vivo hybridization and mitochondrial nucleic acid import.

  20. BayesMD: flexible biological modeling for motif discovery

    DEFF Research Database (Denmark)

    Tang, Man-Hung Eric; Krogh, Anders; Winther, Ole

    2008-01-01

    We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as prior over nucleotide probabilities in binding sites. It is trained on trans...

  1. A natural grouping of motifs with an aspartate or asparagine residue forming two hydrogen bonds to residues ahead in sequence: their occurrence at alpha-helical N termini and in other situations.

    Science.gov (United States)

    Wan, W Y; Milner-White, E J

    1999-03-12

    Examination of the ways side-chain carboxylate and amide groups in high-resolution protein crystal structures form hydrogen bonds with main-chain atoms reveals that the most common category is a two-hydrogen-bond four to five residue motif with an aspartate or asparagine (Asx) at the first residue, for which we propose the name Asx-motif. Similar motifs with glutamate or glutamine residues at that position are rare. Asx-motifs occur typically as (1) a common feature of the N termini of alpha-helices called the Asx N-cap motif; (2) an independent motif, usually a beta-turn with an appropriately hydrogen-bonded Asx as the first residue; and (3) a motif incorporated in a beta-bulge loop. Asx-motifs are common, there being just under two-and-a-half in an average-sized protein subunit; of these, about 55 % are Asx N-cap motifs. Because they occur often in many situations, it seems that these motifs have an inherent propensity to form on their own rather than just being a feature stabilised at the end of a helix. Asx-motifs also occur in functionally interesting situations in aspartyl proteases, citrate synthase, EF hands, haemoglobins, lipocalins, glutathione reductase and the alpha/beta hydrolases. Copyright 1999 Academic Press.

  2. Modeling Small Noncanonical RNA Motifs with the Rosetta FARFAR Server.

    Science.gov (United States)

    Yesselman, Joseph D; Das, Rhiju

    2016-01-01

    Noncanonical RNA motifs help define the vast complexity of RNA structure and function, and in many cases, these loops and junctions are on the order of only ten nucleotides in size. Unfortunately, despite their small size, there is no reliable method to determine the ensemble of lowest energy structures of junctions and loops at atomic accuracy. This chapter outlines straightforward protocols using a webserver for Rosetta Fragment Assembly of RNA with Full Atom Refinement (FARFAR) ( http://rosie.rosettacommons.org/rna_denovo/submit ) to model the 3D structure of small noncanonical RNA motifs for use in visualizing motifs and for further refinement or filtering with experimental data such as NMR chemical shifts.

  3. Determination of the Nucleotide Sequences of Heat Shock Operon groESL and the Citrate Synthase Gene (gltA) of Anaplasma (Ehrlichia) platys for Phylogenetic and Diagnostic Studies

    Science.gov (United States)

    Inokuma, Hisashi; Fujii, Kaori; Okuda, Masaru; Onishi, Takafumi; Beaufils, Jean-Pierre; Raoult, Didier; Brouqui, Philippe

    2002-01-01

    The 1,670-bp nucleotide sequence of the heat shock operon groESL and the 1,236-bp sequence of the citrate synthase gene (gltA) of Anaplasma (Ehrlichia) platys were determined. The topology of the groEL- and gltA-based phylogenetic tree was similar to that derived from 16S rRNA gene analyses with distances. Both groESL- and gltA-based PCRs specific to A. platys were also developed based upon the alignment data. PMID:12204973

  4. Immunohistochemical staining patterns of p53 can serve as a surrogate marker for TP53 mutations in ovarian carcinoma: an immunohistochemical and nucleotide sequencing analysis.

    Science.gov (United States)

    Yemelyanova, Anna; Vang, Russell; Kshirsagar, Malti; Lu, Dan; Marks, Morgan A; Shih, Ie Ming; Kurman, Robert J

    2011-09-01

    Immunohistochemical staining for p53 is used as a surrogate for mutational analysis in the diagnostic workup of carcinomas of multiple sites including ovarian cancers. Strong and diffuse immunoexpression of p53 is generally interpreted as likely indicating a TP53 gene mutation. The immunoprofile that correlates with wild-type TP53, however, is not as clear. In particular, the significance of completely negative immunostaining is controversial. The aim of this study was to clarify the relationship of the immunohistochemical expression of p53 with the mutational status of the TP53 gene in ovarian cancer. A total of 57 ovarian carcinomas (43 high-grade serous ovarian/peritoneal carcinomas, 2 malignant mesodermal mixed tumors (carcinosarcomas), 2 low-grade serous carcinomas, 4 clear cell carcinomas, 1 well-differentiated endometrioid carcinoma, and 5 carcinomas with mixed epithelial differentiation) were analyzed for TP53 mutations by nucleotide sequencing (exons 4-9), and subjected to immunohistochemical analysis of p53 expression. Thirty six tumors contained functional mutations and 13 had wild type TP53. Five tumors were found to harbor known TP53 polymorphism and changes in the intron region were detected in three. Tumors with wild-type TP53 displayed a wide range of immunolabeling patterns, with the most common pattern showing ≤10% of positive cells in 6 cases (46%). Mutant TP53 was associated with 60-100% positive cells in 23 cases (64% of cases). This pattern of staining was also seen in three cases with wild-type TP53. Tumors that were completely negative (0% cells staining) had a mutation of TP53 in 65% of cases and wild-type TP53 in 11%. Combining two immunohistochemical labeling patterns associated with TP53 mutations (0% and 60-100% positive cells), correctly identified a mutation in 94% of cases (Povarian carcinomas. In addition to a strong and diffuse pattern of p53 expression (in greater than 60% of cells), complete absence of p53 immunoexpression is

  5. An Affinity Propagation-Based DNA Motif Discovery Algorithm

    Directory of Open Access Journals (Sweden)

    Chunxiao Sun

    2015-01-01

    The planted (l,d) motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.
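
    For readers unfamiliar with the planted (l,d) motif search problem named above, the following Python sketch shows the problem statement in its simplest form: find every length-l string that occurs in each input sequence with at most d mismatches. It is a brute-force enumeration over all 4^l candidates, practical only for small l, and it is not the APMotif algorithm (no Affinity Propagation clustering or EM refinement is attempted). The function names and toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d):
    """Exhaustively list every l-mer found in all sequences with <= d mismatches."""
    candidates = ("".join(c) for c in product("ACGT", repeat=l))
    return [m for m in candidates
            if all(occurs_within(m, s, d) for s in seqs)]


# Toy input: ACGT is planted with at most one substitution per sequence.
toy_seqs = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
motifs = planted_motifs(toy_seqs, l=4, d=1)
```

    On the toy input, ACGT appears among the returned motifs because the sequences contain ACGT, ACCT and AGGT respectively, each within one mismatch of it.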

  6. Mapping the structure of folding cores in TIM barrel proteins by hydrogen exchange mass spectrometry: the roles of motif and sequence for the indole-3-glycerol phosphate synthase from Sulfolobus solfataricus.

    Science.gov (United States)

    Gu, Zhenyu; Zitzewitz, Jill A; Matthews, C Robert

    2007-04-27

    To test the roles of motif and amino acid sequence in the folding mechanisms of TIM barrel proteins, hydrogen-deuterium exchange was used to explore the structure of the stable folding intermediates for the indole-3-glycerol phosphate synthase from Sulfolobus solfataricus (sIGPS). Previous studies of the urea denaturation of sIGPS revealed the presence of an intermediate that is highly populated at approximately 4.5 M urea and contains approximately 50% of the secondary structure of the native (N) state. Kinetic studies showed that this apparent equilibrium intermediate is actually comprised of two thermodynamically distinct species, I(a) and I(b). To probe the location of the secondary structure in this pair of stable on-pathway intermediates, the equilibrium unfolding process of sIGPS was monitored by hydrogen-deuterium exchange mass spectrometry. The intact protein and pepsin-digested fragments were studied at various concentrations of urea by electrospray and matrix-assisted laser desorption ionization time-of-flight mass spectrometry, respectively. Intact sIGPS strongly protects at least 54 amide protons from hydrogen-deuterium exchange in the intermediate states, demonstrating the presence of stable folded cores. When the protection patterns and the exchange mechanisms for the peptides are considered with the proposed folding mechanism, the results can be interpreted to define the structural boundaries of I(a) and I(b). Comparison of these results with previous hydrogen-deuterium exchange studies on another TIM barrel protein of low sequence identity, alpha-tryptophan synthase (alphaTS), indicates that the thermodynamic states corresponding to the folding intermediates are better conserved than their structures. Although the TIM barrel motif appears to define the basic features of the folding free energy surface, the structures of the partially folded states that appear during the folding reaction depend on the amino acid sequence. Markedly, the good

  7. The arabidopsis cyclic nucleotide interactome

    KAUST Repository

    Donaldson, Lara Elizabeth

    2016-05-11

    Background Cyclic nucleotides have been shown to play important signaling roles in many physiological processes in plants including photosynthesis and defence. Despite this, little is known about cyclic nucleotide-dependent signaling mechanisms in plants since the downstream target proteins remain unknown. This is largely due to the fact that bioinformatics searches fail to identify plant homologs of protein kinases and phosphodiesterases that are the main targets of cyclic nucleotides in animals. Methods An affinity purification technique was used to identify cyclic nucleotide binding proteins in Arabidopsis thaliana. The identified proteins were subjected to a computational analysis that included a sequence, transcriptional co-expression and functional annotation analysis in order to assess their potential role in plant cyclic nucleotide signaling. Results A total of twelve cyclic nucleotide binding proteins were identified experimentally including key enzymes in the Calvin cycle and photorespiration pathway. Importantly, eight of the twelve proteins were shown to contain putative cyclic nucleotide binding domains. Moreover, the identified proteins are post-translationally modified by nitric oxide, transcriptionally co-expressed and annotated to function in hydrogen peroxide signaling and the defence response. The activity of one of these proteins, GLYCOLATE OXIDASE 1, a photorespiratory enzyme that produces hydrogen peroxide in response to Pseudomonas, was shown to be repressed by a combination of cGMP and nitric oxide treatment. Conclusions We propose that the identified proteins function together as points of cross-talk between cyclic nucleotide, nitric oxide and reactive oxygen species signaling during the defence response.

  8. Encoded expansion: an efficient algorithm to discover identical string motifs.

    Science.gov (United States)

    Azmi, Aqil M; Al-Ssulami, Abdulrakeeb

    2014-01-01

    A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently (Karci (2009) Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952-7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol throwing out those that occur less than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text] Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms.
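    The expansion step described in this abstract (start from string motifs of size 2, grow each candidate by one symbol per iteration, and discard those occurring fewer than a threshold number of times) can be sketched as follows. This is a minimal illustration that uses a plain dictionary of start positions; it does not reproduce the authors' array layout or data encoding, and the names `repeated_motifs` and `min_count` are hypothetical.

```python
from collections import defaultdict


def repeated_motifs(seq, min_count):
    """Map each substring of length >= 2 that occurs at least min_count
    times (overlaps allowed) to the sorted list of its start positions."""
    # Seed step: index every substring of length 2.
    seeds = defaultdict(list)
    for i in range(len(seq) - 1):
        seeds[seq[i:i + 2]].append(i)
    current = {m: p for m, p in seeds.items() if len(p) >= min_count}

    result = {}
    length = 2
    while current:
        result.update(current)
        # Expansion step: extend every surviving occurrence by one symbol.
        extended = defaultdict(list)
        for motif, starts in current.items():
            for s in starts:
                end = s + length
                if end < len(seq):
                    extended[motif + seq[end]].append(s)
        # Pruning step: drop candidates below the occurrence threshold.
        current = {m: p for m, p in extended.items() if len(p) >= min_count}
        length += 1
    return result


if __name__ == "__main__":
    found = repeated_motifs("ACGTACGTAC", 2)
    longest = max(found, key=len)
    print(longest, found[longest])
```

    Because a motif can only be frequent if its prefix is frequent, pruning at each length keeps the candidate set small; the sketch makes no claim about matching the complexity bounds reported in the abstract.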

  9. Thermal Stability of Modified i-Motif Oligonucleotides with Naphthalimide Intercalating Nucleic Acids

    DEFF Research Database (Denmark)

    El-Sayed, Ahmed Ali; Pedersen, Erik B.; Khaireldin, Nahid Y.

    2016-01-01

    In continuation of our investigation of characteristics and thermodynamic properties of the i-motif 5′-d[(CCCTAA)3CCCT)] upon insertion of intercalating nucleotides into the cytosine-rich oligonucleotide, this article evaluates the stabilities of i-motif oligonucleotides upon insertion of naphthalimide (1H-benzo[de]isoquinoline-1,3(2H)-dione) as the intercalating nucleic acid. The stabilities of i-motif structures with inserted naphthalimide intercalating nucleotides were studied using UV melting temperatures (Tm) and circular dichroism spectra at different pH values and conditions (crowding...

  10. Single Nucleotide Polymorphism

    DEFF Research Database (Denmark)

    Børsting, Claus; Pereira, Vania; Andersen, Jeppe Dyrberg

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are the most frequent DNA sequence variations in the genome. They have been studied extensively in the last decade with various purposes in mind. In this chapter, we will discuss the advantages and disadvantages of using SNPs for human identification and briefly describe the methods that are preferred for SNP typing in forensic genetics. In addition, we will illustrate how SNPs can be used as investigative leads in the police investigation by discussing the use of ancestry informative markers and forensic DNA phenotyping. Modern DNA sequencing technologies (also called next generation sequencing or NGS) have the potential to completely transform forensic genetic investigations as we know them today. Here, we will make a short introduction to NGS and explain how NGS may combine analysis of the traditional forensic genetic markers with analysis...

  11. Probing structural changes of self assembled i-motif DNA

    KAUST Repository

    Lee, Iljoon

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change.

  12. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.

  13. Gene Isolation Using Degenerate Primers Targeting Protein Motif: A Laboratory Exercise

    Science.gov (United States)

    Yeo, Brandon Pei Hui; Foong, Lian Chee; Tam, Sheh May; Lee, Vivian; Hwang, Siaw San

    2018-01-01

    Structures and functions of protein motifs are widely included in many biology-based course syllabi. However, little emphasis is placed to link this knowledge to applications in biotechnology to enhance the learning experience. Here, the conserved motifs of nucleotide binding site-leucine rich repeats (NBS-LRR) proteins, successfully used for the…

  14. Multiple POU-binding motifs, recognized by tissue-specific nuclear factors, are important for Dll1 gene expression in neural stem cells

    International Nuclear Information System (INIS)

    Nakayama, Kohzo; Nagase, Kazuko; Tokutake, Yuriko; Koh, Chang-Sung; Hiratochi, Masahiro; Ohkawara, Takeshi; Nakayama, Noriko

    2004-01-01

    We cloned the 5'-flanking region of the mouse homolog of the Delta gene (Dll1) and demonstrated that the sequence between nucleotide positions -514 and -484 in the 5'-flanking region of Dll1 played a critical role in the regulation of its tissue-specific expression in neural stem cells (NSCs). Further, we showed that multiple POU-binding motifs, located within this short sequence of 30 bp, were essential for transcriptional activation of Dll1 and also that multiple tissue-specific nuclear factors recognized these POU-binding motifs in various combinations through differentiation of NSCs. Thus, POU-binding factors may play an important role in Dll1 expression in developing NSCs.

  15. Discovery of stress responsive DNA regulatory motifs in Arabidopsis.

    Science.gov (United States)

    Ma, Shisong; Bachan, Shawn; Porto, Matthew; Bohnert, Hans J; Snyder, Michael; Dinesh-Kumar, Savithramma P

    2012-01-01

    The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer, a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of the Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.

  16. Motif content comparison between monocot and dicot species

    Directory of Open Access Journals (Sweden)

    Matyas Cserhati

    2015-03-01

    While a number of DNA sequence motifs have been functionally characterized, the full repertoire of motifs in an organism (the motifome) is yet to be characterized. The present study aims to widen the scope of motif content analysis in different monocot and dicot species that include both rice species, Brachypodium, corn, wheat as monocots and Arabidopsis, Lotus japonica, Medicago truncatula, and Populus tremula as dicots. All possible motifs were analyzed in different sets of sequences from these species: the whole genome, core proximal and distal promoters, 5′ and 3′ UTRs, and the 1st introns. Due to the increased number of species involved in this study compared to previous works, species relationships were analyzed based on the similarity of common motif content. Certain secondary structure elements were inferred in the genomes of these species as well as new unknown motifs. Twenty motifs common to the studied species were found to occur significantly more often within the promoters and 3′ UTRs of genes, both being regulatory regions. Motifs common to the promoter regions of japonica rice, Brachypodium, and corn were also found in a number of orthologous and paralogous genes. Some of our motifs were found to be complementary to miRNA elements in Brachypodium distachyon and japonica rice.

  17. Isolation and characterization of an unusual repeated sequence from the ribosomal intergenic spacer of the crucifer Sisymbrium irio.

    Science.gov (United States)

    Grellet, F; Delcasso-Tremousaygue, D; Delseny, M

    1989-06-01

    A recombinant plasmid containing a 433 base pair (bp) Bam HI fragment from Sisymbrium irio genomic DNA was isolated and characterized. This fragment was shown to be a ribosomal intergenic spacer (IGS) sequence which is reiterated up to six times in the IGS and extends close to the 5' end of the 18S rRNA gene. The nucleotide sequence of the cloned element is composed of 10-11 40 bp blocks that are probably derived from a common ancestor. The presence of a similar sequence can be detected in the DNA of another Sisymbrium species and in Matthiola incana. Homology was also found with the last 43 nucleotides of the radish IGS 3' end, suggesting that there is possibly a common ancestral nucleotide motif in cruciferous IGS sequences. The cloned element hybridises to RNA transcripts, indicating that the S. irio IGS repetitive sequence is at least partially transcribed during the pre-rRNA transcription process.

  18. Protein Chaperones Q8ZP25_SALTY from Salmonella Typhimurium and HYAE_ECOLI from Escherichia coli Exhibit Thioredoxin-like Structures Despite Lack of Canonical Thioredoxin Active Site Sequence Motif

    Energy Technology Data Exchange (ETDEWEB)

    Parish, D.; Benach, J; Liu, G; Singarapu, K; Xiao, R; Acton, T; Hunt, J; Montelione, G; Szyperski, T; et. al.

    2008-01-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  19. Protein chaperones Q8ZP25_SALTY from Salmonella typhimurium and HYAE_ECOLI from Escherichia coli exhibit thioredoxin-like structures despite lack of canonical thioredoxin active site sequence motif.

    Science.gov (United States)

    Parish, David; Benach, Jordi; Liu, Goahua; Singarapu, Kiran Kumar; Xiao, Rong; Acton, Thomas; Su, Min; Bansal, Sonal; Prestegard, James H; Hunt, John; Montelione, Gaetano T; Szyperski, Thomas

    2008-12-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  20. The Entire Nucleotide Sequence of Friend-Related and Paralysis-Inducing PVC-441 Murine Leukemia Virus (MuLV) and Its Comparison with Those of PVC-211 MuLV and Friend MuLV

    OpenAIRE

    Tanaka, Atsushi; Oka, Kiyomasa; Tanaka, Keiji; Jinno, Atsushi; Ruscetti, Sandra K.; Kai, Kazushige

    1998-01-01

    PVC-441 murine leukemia virus (MuLV) is a member of the PVC group of Friend MuLV (F-MuLV)-derived neuropathogenic retroviruses. In order to determine the molecular basis for the difference in neuropathogenicity between PVC-441 and the previously characterized PVC-211 MuLVs, the entire nucleotide sequence of PVC-441 MuLV was determined and compared with those of PVC-211 and F-MuLV. The results suggest that PVC-441 and PVC-211 MuLVs were formed as a result of random mutations of F-MuLV and deve...

  1. [Personal motif in art].

    Science.gov (United States)

    Gerevich, József

    2015-01-01

    One of the basic questions of the art psychology is whether a personal motif is to be found behind works of art and if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allow us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  2. Artin t-Motifs

    OpenAIRE

    Taelman, Lenny

    2008-01-01

    We show that analytically trivial t-motifs satisfy a Tannakian duality, without restrictions on the base field, save for that it be of generic characteristic. We show that the group of components of the t-motivic Galois group coincides with the absolute Galois group of the base field.

  3. Direct AUC optimization of regulatory motifs.

    Science.gov (United States)

    Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

    2017-07-15

    The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the large amount of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 . Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
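    The AUC criterion this abstract refers to can be computed directly from motif scores on bound (positive) and unbound (negative) sequences: it is the fraction of positive/negative pairs in which the positive sequence scores higher, with ties counted as one half. The sketch below only evaluates that quantity; it is not the CDAUC optimization procedure, and the function name `auc` and the example scores are made up for illustration.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts 1 for each (positive, negative) pair where the positive score
    is higher, 0.5 for ties, and divides by the number of pairs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical motif scores for bound and unbound sequences.
    print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

    Because this value changes only when the relative order of a positive and a negative score flips, it is piece-wise constant in any single model parameter, which is consistent with the property the abstract describes.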

  4. Proteome-wide analysis of single-nucleotide variations in the N-glycosylation sequon of human genes.

    Directory of Open Access Journals (Sweden)

    Raja Mazumder

    N-linked glycosylation is one of the most frequent post-translational modifications of proteins with a profound impact on their biological function. Besides other functions, N-linked glycosylation assists in protein folding, determines protein orientation at the cell surface, or protects proteins from proteases. The N-linked glycans attach to asparagines in the sequence context Asn-X-Ser/Thr, where X is any amino acid except proline. Any variation (e.g. non-synonymous single nucleotide polymorphism or mutation) that abolishes the N-glycosylation sequence motif will lead to the loss of a glycosylation site. On the other hand, variations causing a substitution that creates a new N-glycosylation sequence motif can result in the gain of glycosylation. Although the general importance of glycosylation is well known and acknowledged, the effect of variation on the actual glycoproteome of an organism is still mostly unknown. In this study, we focus on a comprehensive analysis of non-synonymous single nucleotide variations (nsSNVs) that lead to either loss or gain of the N-glycosylation motif. We find that 1091 proteins have modified N-glycosylation sequons due to nsSNVs in the genome. Based on analysis of proteins that have a solved 3D structure at the site of variation, we find that 48% of the variations that lead to changes in glycosylation sites occur at the loop and bend regions of the proteins. Pathway and function enrichment analysis show that a significant number of proteins that gained or lost the glycosylation motif are involved in kinase activity, immune response, and blood coagulation. A structure-function analysis of a blood coagulation protein, antithrombin III and a protease, cathepsin D, showcases how a comprehensive study followed by structural analysis can help better understand the functional impact of the nsSNVs.
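    The sequon rule stated in this abstract (Asn-X-Ser/Thr, with X any residue except proline) maps onto a short regular expression over one-letter amino acid codes, and comparing matches between a reference and a variant protein shows which sites are lost or gained. The sketch below is a minimal illustration for same-length substitutions only; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein):
    """Return the set of 0-based positions of the Asn in each sequon."""
    return {m.start() for m in SEQUON.finditer(protein)}


def sequon_changes(reference, variant):
    """Compare two equal-length proteins and report lost/gained sequons."""
    if len(reference) != len(variant):
        raise ValueError("only same-length substitutions are supported")
    ref = sequon_positions(reference)
    var = sequon_positions(variant)
    return {"lost": sorted(ref - var), "gained": sorted(var - ref)}


if __name__ == "__main__":
    reference = "MKNGTAANPSA"    # one sequon at position 2 (N-G-T)
    loss = "MKNGAAANPSA"         # T -> A at position 4 removes it
    gain = "MKNGTAANASA"         # P -> A at position 8 creates N-A-S
    print(sequon_changes(reference, loss))
    print(sequon_changes(reference, gain))
```

    Note that a matching sequon is necessary but not sufficient for glycosylation in vivo; the pattern only identifies candidate sites.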

  5. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects...

  6. Multifactor dimensionality reduction analysis identifies specific nucleotide patterns promoting genetic polymorphisms

    Science.gov (United States)

    Arehart, Eric; Gleim, Scott; White, Bill; Hwa, John; Moore, Jason H

    2009-01-01

    Background The fidelity of DNA replication serves as the nidus for both genetic evolution and genomic instability fostering disease. Single nucleotide polymorphisms (SNPs) constitute greater than 80% of the genetic variation between individuals. A new theory regarding DNA replication fidelity has emerged in which selectivity is governed by base-pair geometry through interactions between the selected nucleotide, the complementary strand, and the polymerase active site. We hypothesize that specific nucleotide combinations in the flanking regions of SNP fragments are associated with mutation. Results We modeled the relationship between DNA sequence and observed polymorphisms using the novel multifactor dimensionality reduction (MDR) approach. MDR was originally developed to detect synergistic interactions between multiple SNPs that are predictive of disease susceptibility. We initially assembled data from the Broad Institute as a pilot test for the hypothesis that flanking region patterns associate with mutagenesis (n = 2194). We then confirmed and expanded our inquiry with human SNPs within coding regions and their flanking sequences collected from the National Center for Biotechnology Information (NCBI) database (n = 29967) and a control set of sequences (coding region) not associated with SNP sites randomly selected from the NCBI database (n = 29967). We discovered seven flanking region pattern associations in the Broad dataset which reached a minimum significance level of p ≤ 0.05. Significant models (p << 0.001) were detected for each SNP type examined in the larger NCBI dataset. Importantly, the flanking region models were elongated or truncated depending on the nucleotide change. Additionally, nucleotide distributions differed significantly at motif sites relative to the type of variation observed. The MDR approach effectively discerned specific sites within the flanking regions of observed SNPs and their respective identities, supporting the collective

  7. Multifactor dimensionality reduction analysis identifies specific nucleotide patterns promoting genetic polymorphisms.

    Science.gov (United States)

    Arehart, Eric; Gleim, Scott; White, Bill; Hwa, John; Moore, Jason H

    2009-03-30

    The fidelity of DNA replication serves as the nidus for both genetic evolution and genomic instability fostering disease. Single nucleotide polymorphisms (SNPs) constitute greater than 80% of the genetic variation between individuals. A new theory regarding DNA replication fidelity has emerged in which selectivity is governed by base-pair geometry through interactions between the selected nucleotide, the complementary strand, and the polymerase active site. We hypothesize that specific nucleotide combinations in the flanking regions of SNP fragments are associated with mutation. We modeled the relationship between DNA sequence and observed polymorphisms using the novel multifactor dimensionality reduction (MDR) approach. MDR was originally developed to detect synergistic interactions between multiple SNPs that are predictive of disease susceptibility. We initially assembled data from the Broad Institute as a pilot test for the hypothesis that flanking region patterns associate with mutagenesis (n = 2194). We then confirmed and expanded our inquiry with human SNPs within coding regions and their flanking sequences collected from the National Center for Biotechnology Information (NCBI) database (n = 29967) and a control set of sequences (coding region) not associated with SNP sites randomly selected from the NCBI database (n = 29967). We discovered seven flanking region pattern associations in the Broad dataset which reached a minimum significance level of p ≤ 0.05. Significant models (p << 0.001) were detected for each SNP type examined in the larger NCBI dataset. Importantly, the flanking region models were elongated or truncated depending on the nucleotide change. Additionally, nucleotide distributions differed significantly at motif sites relative to the type of variation observed. The MDR approach effectively discerned specific sites within the flanking regions of observed SNPs and their respective identities, supporting the collective contribution of these

  8. DMINDA: an integrated web server for DNA motif identification and analyses.

    Science.gov (United States)

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-07-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Identification of a Gamma Interferon-Activated Inhibitor of Translation-Like RNA Motif at the 3′ End of the Transmissible Gastroenteritis Coronavirus Genome Modulating Innate Immune Response

    Science.gov (United States)

    Marquez-Jurado, Silvia; Nogales, Aitor; Zuñiga, Sonia; Almazán, Fernando

    2015-01-01

    A 32-nucleotide (nt) RNA motif located at the 3′ end of the transmissible gastroenteritis coronavirus (TGEV) genome was found to specifically interact with the host proteins glutamyl-prolyl-tRNA synthetase (EPRS) and arginyl-tRNA synthetase (RRS). This RNA motif has high homology in sequence and secondary structure with the gamma interferon-activated inhibitor of translation (GAIT) element, which is located at the 3′ end of several mRNAs encoding proinflammatory proteins. The GAIT element is involved in the translation silencing of these mRNAs through its interaction with the GAIT complex (EPRS, heterogeneous nuclear ribonucleoprotein Q, ribosomal protein L13a, and glyceraldehyde 3-phosphate dehydrogenase) to favor the resolution of inflammation. Interestingly, we showed that the viral RNA motif bound the GAIT complex and inhibited the in vitro translation of a chimeric mRNA containing this RNA motif. To our knowledge, this is the first GAIT-like motif described in a positive RNA virus. To test the functional role of the GAIT-like RNA motif during TGEV infection, a recombinant coronavirus harboring mutations in this motif was engineered and characterized. Mutations of the GAIT-like RNA motif did not affect virus growth in cell cultures. However, an exacerbated innate immune response, mediated by the melanoma differentiation-associated gene 5 (MDA5) pathway, was observed in cells infected with the mutant virus compared with the response observed in cells infected with the parental virus. Furthermore, the mutant virus was more sensitive to beta interferon than the parental virus. Altogether, these data strongly suggested that the viral GAIT-like RNA motif modulates the host innate immune response. PMID:25759500

  10. Split tasks of asymmetric nucleotide-binding sites in the heterodimeric ABC exporter EfrCD.

    Science.gov (United States)

    Hürlimann, Lea M; Hohl, Michael; Seeger, Markus A

    2017-06-01

    Many heterodimeric ATP-binding cassette (ABC) exporters evolved asymmetric ATP-binding sites containing a degenerate site incapable of ATP hydrolysis due to noncanonical substitutions in conserved sequence motifs. Recent studies revealed that nucleotide binding to the degenerate site stabilizes contacts between the nucleotide-binding domains (NBDs) of the inward-facing transporter and regulates ATP hydrolysis at the consensus site via allosteric coupling mediated by the D-loops. However, it is unclear whether nucleotide binding to the degenerate site is strictly required for substrate transport. In this study, we examined the functional consequences of a systematic set of mutations introduced at the degenerate and consensus site of the multidrug efflux pump EfrCD of Enterococcus faecalis. Mutating motifs which differ among the two ATP-binding sites (Walker B, switch loop, and ABC signature) or which are involved in interdomain communication (D-loop and Q-loop) led to asymmetric results in the functional assays and were better tolerated at the degenerate site. This highlights the importance of the degenerate site to allosterically regulate the events at the consensus site. Mutating invariant motifs involved in ATP binding and NBD closure (A-loop and Walker A) resulted in equally reduced transport activities, regardless at which ATP-binding site they were introduced. In contrast to previously investigated heterodimeric ABC exporters, mutation of the degenerate site Walker A lysine completely inactivated ATPase activity and substrate transport, indicating that ATP binding to the degenerate site is essential for EfrCD. This study provides novel insights into the split tasks of asymmetric ATP-binding sites of heterodimeric ABC exporters. © 2017 The Authors. The FEBS Journal published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.

  11. A 19-nucleotide insertion in the leader sequence of avian leukosis virus subgroup J contributes to its replication in vitro but is not related to its pathogenicity in vivo.

    Directory of Open Access Journals (Sweden)

    Xiaolin Ji

    Subgroup J avian leukosis virus (ALV-J) was first isolated from meat-type chickens that had developed myeloid leukosis, and since 2008 ALV-J infections in chickens have become widespread in China. A comparison of the sequence of ALV-J epidemic isolates with HPRS-103, the ALV-J prototype virus, revealed several distinct features, one of which is a 19-nucleotide (nt) insertion in the leader sequence. To determine the role of the 19-nt insertion in ALV-J pathogenicity, a pair of viruses were constructed and rescued. The first virus was an ALV-J Chinese isolate (designated rSD1009) containing the 19-nt insertion in its leader sequence. The second virus was a clone in which the 19-nt sequence had been deleted from the leader sequence (designated rSD1009△19). Compared with rSD1009△19, rSD1009 displayed a moderate growth advantage in vitro. However, no differences were demonstrated in either viral replication or oncogenicity between the two rescued viruses in chickens. These results indicated that the 19-nt insertion contributed to ALV-J replication in vitro but was not related to its pathogenicity in vivo.

  12. Evolutionary history of Phakopsora pachyrhizi (the Asian soybean rust) in Brazil based on nucleotide sequences of the internal transcribed spacer region of the nuclear ribosomal DNA

    Directory of Open Access Journals (Sweden)

    Maíra C. M. Freire

    2008-01-01

    Phakopsora pachyrhizi has dispersed globally and brought severe economic losses to soybean growers. The fungus has been established in Brazil since 2002 and is found nationwide. To gather information on the temporal and spatial patterns of genetic variation in P. pachyrhizi, we sequenced the nuclear internal transcribed spacer regions (ITS1 and ITS2). Total genomic DNA was extracted using either lyophilized urediniospores or lesions removed from infected leaves sampled from 26 soybean fields in Brazil and one field in South Africa. Cloning prior to sequencing was necessary because direct sequencing of PCR amplicons gave partially unreadable electrophoretograms with peak displacements suggestive of multiple sequences with length polymorphism. Sequences were determined from four clones per field. ITS sequences from African or Asian isolates available from GenBank were included in the analyses. Independent sequence alignments of the ITS1 and ITS2 datasets identified 27 and 19 ribotypes, respectively. Molecular phylogeographic analyses revealed that ribotypes of widespread distribution in Brazil displayed characteristics of ancestrality and were shared with Africa and Asia, while ribotypes of rare occurrence in Brazil were indigenous. The results suggest that P. pachyrhizi found in Brazil originated from multiple, independent long-distance dispersal events.

  13. The CHH motif in sugar beet satellite DNA: a modulator for cytosine methylation.

    Science.gov (United States)

    Zakrzewski, Falk; Schubert, Veit; Viehoever, Prisca; Minoche, André E; Dohm, Juliane C; Himmelbauer, Heinz; Weisshaar, Bernd; Schmidt, Thomas

    2014-06-01

    Methylation of DNA is important for the epigenetic silencing of repetitive DNA in plant genomes. Knowledge about the cytosine methylation status of satellite DNAs, a major class of repetitive DNA, is scarce. One reason for this is that arrays of tandemly arranged sequences are usually collapsed in next-generation sequencing assemblies. We applied strategies to overcome this limitation and quantified the level of cytosine methylation and its pattern in three satellite families of sugar beet (Beta vulgaris) which differ in their abundance, chromosomal localization and monomer size. We visualized methylation levels along pachytene chromosomes with respect to small satellite loci at maximum resolution using chromosome-wide fluorescent in situ hybridization complemented with immunostaining and super-resolution microscopy. Only reduced methylation of many satellite arrays was obtained. To investigate methylation at the nucleotide level we performed bisulfite sequencing of 1569 satellite sequences. We found that the level of methylation of cytosine strongly depends on the sequence context: cytosines in the CHH motif show lower methylation (44-52%), while CG and CHG motifs are more strongly methylated. This affects the overall methylation of satellite sequences because CHH occurs frequently while CG and CHG are rare or even absent in the satellite arrays investigated. Evidently, CHH is the major target for modulation of the cytosine methylation level of adjacent monomers within individual arrays and contributes to their epigenetic function. This strongly indicates that asymmetric cytosine methylation plays a role in the epigenetic modification of satellite repeats in plant genomes. © 2014 The Authors The Plant Journal © 2014 John Wiley & Sons Ltd.
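    The three sequence contexts named in this abstract (CG, CHG and CHH, where H stands for A, C or T) can be told apart by looking at the two bases that follow a cytosine. The following is a minimal sketch for a single forward-strand sequence; the function names `cytosine_context` and `count_contexts` are illustrative and are not taken from the cited work.

```python
# Bases covered by the IUPAC code H (anything except G).
H_BASES = set("ACT")


def cytosine_context(seq, i):
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not a cytosine or if the downstream
    bases are missing or ambiguous (for example at the sequence end).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    first = seq[i + 1:i + 2]   # empty string at the sequence end
    second = seq[i + 2:i + 3]
    if first == "G":
        return "CG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None


def count_contexts(seq):
    """Count forward-strand cytosines per context in one sequence."""
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i in range(len(seq)):
        context = cytosine_context(seq, i)
        if context is not None:
            counts[context] += 1
    return counts


if __name__ == "__main__":
    print(count_contexts("CGCAGCTTC"))
```

    A satellite monomer in which CHH dominates these counts would, under the abstract's reasoning, have its overall methylation level set mainly by CHH sites.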

  14. DNA authentication of Plantago Herb based on nucleotide sequences of 18S-28S rRNA internal transcribed spacer region.

    Science.gov (United States)

    Sahin, Fatma Pinar; Yamashita, Hiromi; Guo, Yahong; Terasaka, Kazuyoshi; Kondo, Toshiya; Yamamoto, Yutaka; Shimada, Hiroshi; Fujita, Masao; Kawasaki, Takeshi; Sakai, Eiji; Tanaka, Toshihiro; Goda, Yukihiro; Mizukami, Hajime

    2007-07-01

    Internal transcribed spacer (ITS) regions of the nuclear ribosomal RNA gene were amplified from 23 plant and herbarium specimens belonging to eight Plantago species (P. asiatica, P. depressa, P. major, P. erosa, P. hostifolia, P. camtschatica, P. virginica and P. lanceolata). Sequence comparison indicated that these Plantago species could be identified based on the sequence type of the ITS locus. Sequence analysis of the ITS regions amplified from the crude drug Plantago Herb obtained in the markets indicated that all the drugs from Japan were derived from P. asiatica, whereas the samples obtained in China originated from various Plantago species including P. asiatica, P. depressa, P. major and P. erosa.

  15. STEME: a robust, accurate motif finder for large data sets.

    Directory of Open Access Journals (Sweden)

    John E Reid

    Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface.

  16. Nucleotide diversity and phylogenetic relationships among ...

    Indian Academy of Sciences (India)

    The plastid genome regions of two intergenic spacers, psbA–trnH and trnL–trnF, were sequenced to study the nucleotide diversity and phylogenetic relationships among Gladiolus cultivars. Nucleotide diversity of psbA–trnH region was higher than trnL–trnF region of chloroplast. We employed Bayesian, maximum ...

  17. Peptide and nucleotide sequences of rat CD4 (W3/25) antigen: evidence for derivation from a structure with four immunoglobulin-related domains

    International Nuclear Information System (INIS)

    Clark, S.J.; Jefferies, W.A.; Barclay, A.N.; Gagnon, J.; Williams, A.F.

    1987-01-01

    The rat W3/25 antigen was the first marker antigen of helper T lymphocytes to be identified. Subsequently, the human OKT4 antigen (now called CD4) was described, and cell distribution and functional data suggested that W3/25 and OKT4 antigens were homologous. This is now confirmed by the matching of peptide sequences from W3/25 antigen with sequence predicted from rat cDNA clones detected by cross-hybridization with a cDNA probe for human CD4. Analysis of the two sequences suggests an evolutionary origin from a structure with four immunoglobulin-related domains, although only domain 1 at the NH2 terminus meets the standard criteria for an immunoglobulin-related sequence. CD4 domains 2 and 4 contain disulfide bonds but seem like truncated immunoglobulin domains, whereas domain 3 may have a pattern of β-strands like an immunoglobulin variable domain, but without the disulfide bond.

  18. The nucleotide sequence of the Desulfovibrio gigas desulforedoxin gene indicates that the Desulfovibrio vulgaris rbo gene originated from a gene fusion event.

    OpenAIRE

    Brumlik, M J; Leroy, G; Bruschi, M; Voordouw, G

    1990-01-01

    Expression of the rbo gene from Desulfovibrio vulgaris Hildenborough in Escherichia coli minicells and Western blotting (immunoblotting) of Desulfovibrio cell extracts with antibodies raised against a synthetic peptide indicated the presence of a 14-kDa polypeptide product, as expected from the gene sequence. Cloning and sequencing of the gene (dsr) for desulforedoxin, a 4-kDa redox protein from Desulfovibrio gigas, showed that it is formed by expression of an autonomous gene of 111 bp, not b...

  19. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    In vertebrates, the major histocompatibility complex (MHC) presents peptides to the immune system. In humans, MHCs are called human leukocyte antigens (HLAs), and some of the loci encoding them are the most polymorphic in the human genome. Different MHC molecules present different subsets of peptides, and knowledge of their binding specificities is important for understanding the differences in the immune response between individuals. Knowledge of motifs may be used to identify epitopes, to understand the MHC restriction of epitopes, and to compare the specificities of different MHC molecules. Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif...

  20. Conservation of MHC class II DOA sequences among carnivores.

    Science.gov (United States)

    Soll, S J; Stewart, B S; Lehman, N

    2005-03-01

    We obtained the nucleotide sequence for most of the major histocompatibility complex (MHC) class II DOA locus for Weddell, leopard, northern elephant, and southern elephant seals and from the coyote and compared them to all known DOA data available to date. We found generally low levels of interspecific polymorphisms, providing further support for stabilizing selection acting on the DOA locus. This suggests that DO gene products play a substantial functional role in the regulation of antigen presentation. A seven-amino-acid motif of VWRLPEF was found to be conserved across all DOA sequences and may be a DO-specific recognition element.

  1. Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections

    Directory of Open Access Journals (Sweden)

    Saliha Hammoumi

    2016-09-01

    Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus are relatively abundant in the literature, little is known about its genomic diversity and about the molecular mechanisms that lead to such high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using the KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×10^7. The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% of sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3.

  2. Nucleotide sequence of the goat embryonic alpha globin gene (zeta) and linkage and evolutionary analysis of the complete alpha globin cluster.

    Science.gov (United States)

    Wernke, S M; Lingrel, J B

    1986-12-05

    In previous studies we identified and sequenced clones containing two adult alpha globin genes of the goat. Additional studies have revealed the presence of an embryonic alpha globin gene termed zeta. Sequence analysis of the gene shows that it is the largest mammalian or avian globin gene cloned to date. Its unusual size is mainly due to a 14 base-pair tandem repeat sequence in its first intron. A similar sequence is also found in the first intron of the human zeta gene. The goat zeta coding sequence differs greatly from that of the adult alpha, particularly at amino acid position 38, where it codes for the amino acid replacement of Gln for Thr. This change may confer a higher intrinsic O2 affinity on the zeta globin protein, ensuring a sufficient O2 supply for the developing goat embryo. The cloning and sequencing of this gene completes the alpha globin locus of the goat, composed of three genes in the following order 5'-zeta-I alpha-II alpha-3'. Evolutionary comparisons of the goat alpha locus with other amphibian, avian and mammalian loci reveal several interesting features. Statistical analysis confirms the hypothesis that the embryonic alpha gene is much older (400 million years) than the embryonic beta gene (200 million years), and that it is descended from a primordial gene, whose present-day counterpart is the Xenopus larval alpha globin gene. Our results also suggest that after the divergence of the avian line, the alpha A gene converted the alpha D gene during the evolution of the pre-mammalian line. The alpha D globin gene remains unconverted in the avian line, potentially because of insertion/deletion sequences that may prevent any gene conversion event. The divergence rates of specific globin genes have been analyzed and found to form an essentially straight line, in agreement with the neutralist view of evolution.

  3. NUCLEOTIDES IN INFANT FEEDING

    Directory of Open Access Journals (Sweden)

    L.G. Mamonova

    2007-01-01

    The article reviews the use in infant feeding of nucleotides, metabolites that play a key role in many biological processes. The author provides data on the nucleotide content of human milk at different stages of lactation. She also analyzes foreign experience in feeding newborns with nucleotide-containing milk formulas. The article compares the nucleotide content of the adapted formulas available on the domestic market. Key words: children, feeding, nucleotides.

  4. Nucleotide sequence of the hexA gene for DNA mismatch repair in Streptococcus pneumoniae and homology of hexA to mutS of Escherichia coli and Salmonella typhimurium

    International Nuclear Information System (INIS)

    Priebe, S.D.; Hadi, S.M.; Greenberg, B.; Lacks, S.A.

    1988-01-01

    The Hex system of heteroduplex DNA base mismatch repair operates in Streptococcus pneumoniae after transformation and replication to correct donor and nascent DNA strands, respectively. A functionally similar system, called Mut, operates in Escherichia coli and Salmonella typhimurium. The nucleotide sequence of a 3.8-kilobase segment from the S. pneumoniae chromosome that includes the 2.7-kilobase hexA gene was determined. Chromosomal DNA used as donor to measure Hex phenotype was irradiated with UV light. An open reading frame that could encode a 17-kilodalton polypeptide (OrfC) was located just upstream of the gene encoding a polypeptide of 95 kilodaltons corresponding to HexA. Shine-Dalgarno sequences and putative promoters were identified upstream of each protein start site. Insertion mutations showed that only HexA functioned in mismatch repair and that the promoter for hexA transcription was located within the OrfC-coding region. The HexA polypeptide contains a consensus sequence for ATP- or GTP-binding sites in proteins. Comparison of the entire HexA protein sequence to that of MutS of S. typhimurium, showed the proteins to be homologous, inasmuch as 36% of their amino acid residues were identical. This homology indicates that the Hex and Mut systems of mismatch repair evolved from an ancestor common to the gram-positive streptococci and the gram-negative enterobacteria. It is the first direct evidence linking the two systems

  5. Whole-Genome Bisulfite Sequencing for the Analysis of Genome-Wide DNA Methylation and Hydroxymethylation Patterns at Single-Nucleotide Resolution.

    Science.gov (United States)

    Kernaleguen, Magali; Daviaud, Christian; Shen, Yimin; Bonnet, Eric; Renault, Victor; Deleuze, Jean-François; Mauger, Florence; Tost, Jörg

    2018-01-01

    The analysis of genome-wide epigenomic alterations including DNA methylation and hydroxymethylation has become a subject of intensive research for many biological and disease-associated investigations. Whole-genome bisulfite sequencing (WGBS) using next-generation sequencing technologies is currently considered as the gold standard for a comprehensive and quantitative analysis of DNA methylation throughout the genome. However, bisulfite conversion does not allow distinguishing between cytosine methylation and hydroxymethylation requiring an additional chemical or enzymatic step to identify hydroxymethylated cytosines. Here we provide two detailed protocols based on commercial kits for the preparation of sequencing libraries for the comprehensive whole-genome analysis of DNA methylation and/or hydroxymethylation. If only DNA methylation is of interest, sequencing libraries can be constructed from limited amounts of input DNA by ligation of methylated adaptors to the fragmented DNA prior to bisulfite conversion. For samples with significant levels of hydroxymethylation such as stem cells or brain tissue, we describe the protocol of oxidative bisulfite sequencing (OxBs-seq), which in its current version uses a post-bisulfite adaptor tagging (PBAT) approach. Two methylomes need to be generated: a classic methylome following bisulfite conversion and analyzing both methylated and hydroxymethylated cytosines and a methylome analyzing only methylated cytosines, respectively. We also provide a step-by-step description of the data analysis using publicly available bioinformatic tools. The described protocols have been successfully applied to different human samples and yield robust and reproducible results.
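    The two-methylome design described above implies simple per-site arithmetic: the classic bisulfite library reports methylated plus hydroxymethylated cytosines, the oxidative bisulfite library reports methylated cytosines only, so their difference estimates hydroxymethylation. The sketch below illustrates that subtraction under the assumption that both libraries cover the same cytosine; the function names and read counts are hypothetical, and clipping negative differences to zero is one simple way to handle sampling noise, not necessarily what the cited protocol does.

```python
def methylation_level(converted_protected_reads, total_reads):
    """Fraction of reads reporting an unconverted cytosine at one site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return converted_protected_reads / total_reads


def estimate_5hmc(bs_level, oxbs_level):
    """Estimate hydroxymethylation at one site.

    bs_level   -- level from the classic bisulfite library (5mC + 5hmC)
    oxbs_level -- level from the oxidative bisulfite library (5mC only)
    Negative differences, which can arise from sampling noise, are
    clipped to zero (an assumption of this sketch).
    """
    return max(0.0, bs_level - oxbs_level)


if __name__ == "__main__":
    # Hypothetical read counts for a single cytosine.
    bs = methylation_level(30, 40)     # 0.75
    oxbs = methylation_level(20, 40)   # 0.5
    print(estimate_5hmc(bs, oxbs))
```

    In practice the subtraction is only meaningful at sites with adequate read depth in both libraries, which is one reason the two methylomes are sequenced deeply.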

  6. The soybean-Phytophthora resistance locus Rps1-k encompasses coiled coil-nucleotide binding-leucine rich repeat-like genes and repetitive sequences

    Directory of Open Access Journals (Sweden)

    Bhattacharyya Madan K

    2008-03-01

    Background: A series of Rps (resistance to Phytophthora sojae) genes have been protecting soybean from the root and stem rot disease caused by the oomycete pathogen Phytophthora sojae. Five Rps genes were mapped to the Rps1 locus located near the 28 cM map position on molecular linkage group N of the composite genetic soybean map. Among these five genes, Rps1-k was introgressed from the cultivar Kingwa. Rps1-k has been providing stable and broad-spectrum Phytophthora resistance in the major soybean-producing regions of the United States. Rps1-k has been mapped and isolated. More than one functional Rps1-k gene was identified from the Rps1-k locus. The clustering feature at the Rps1-k locus might have facilitated the expansion of Rps1-k gene numbers and the generation of new recognition specificities. The Rps1-k region was sequenced to understand the possible evolutionary steps that shaped the generation of Phytophthora resistance genes in soybean. Results: Here the analyses of sequences of three overlapping BAC clones containing the 184,111 bp Rps1-k region are reported. A shotgun sequencing strategy was applied in sequencing the BAC contig. Sequence analysis predicted a few full-length genes including two Rps1-k genes, Rps1-k-1 and Rps1-k-2. Previously reported Rps1-k-3 from this genomic region [1] evolved through intramolecular recombination between Rps1-k-1 and Rps1-k-2 in Escherichia coli. The majority of the predicted genes are truncated and therefore most likely nonfunctional. A member of a highly abundant retroelement, SIRE1, was identified from the Rps1-k region. The Rps1-k region is primarily composed of repetitive sequences. Sixteen simple repeat and 63 tandem repeat sequences were identified from the locus. Conclusion: These data indicate that the Rps1 locus is located in a gene-poor region. The abundance of repetitive sequences in the Rps1-k region suggested that the location of this locus is in or near a

  7. Nucleotide sequence and phylogeny of the tet (L) tetracycline resistance determinant encoded by the plasmid pSTE1 from Staphylococcus hyicus

    DEFF Research Database (Denmark)

    Schwarz, S.; Cardoso, M.; Wegener, Henrik Caspar

    1992-01-01

    O from Streptococcus mutans were performed. An alignment of Tet amino acid sequence revealed the presence of 30 conserved amino acids among these Tet variants. On the basis of the alignment, a phylogenetic tree was constructed. It demonstrated large evolutionary distances between the Tet M and Tet O...

  8. Population genetic structure in farm and feral American mink (Neovison vison) inferred from RAD sequencing-generated single nucleotide polymorphisms

    DEFF Research Database (Denmark)

    Thirstrup, Janne Pia; Ruiz-Gonzalez, Aritz; Pujolar, José Martin

    2015-01-01

    Feral American mink populations (Neovison vison), derived from mink farms, are widespread in Europe. In this study we investigated genetic diversity and genetic differentiation between feral and farm mink using a panel of genetic markers (194 SNP) generated from RAD sequencing data. Sampling incl...

  9. NUCLEOTIDE SEQUENCING AND TRANSCRIPTIONAL MAPPING OF THE GENES ENCODING BIPHENYL DIOXYGENASE, A MULTICOMPONENT POLYCHLORINATED-BIPHENYL-DEGRADING ENZYME IN PSEUDOMONAS STRAIN LB400

    Science.gov (United States)

    The DNA region encoding biphenyl dioxygenase, the first enzyme in the biphenyl-polychlorinated biphenyl degradation pathway of Pseudomonas species strain LB400, was sequenced. Six open reading frames were identified, four of which are homologous to the components of toluene dioxy...

  10. CHARACTERIZATION AND NUCLEOTIDE SEQUENCE DETERMINATION OF A REPEAT ELEMENT ISOLATED FROM A 2,4,5-T DEGRADING STRAIN OF PSEUDOMONAS CEPACIA

    Science.gov (United States)

    Pseudomonas cepacia strain AC1100, capable of growth on 2,4,5-trichlorophenoxyacetic acid (2,4,5-T), was mutated to the 2,4,5-T− strain PT88 by a ColE1 :: Tn5 chromosomal insertion. Using cloned DNA from the region flanking the insertion, a 1477-bp sequence (designated RS1100) wa...

  11. Both Maintenance and Avoidance of RNA-Binding Protein Interactions Constrain Coding Sequence Evolution.

    Science.gov (United States)

    Savisaar, Rosina; Hurst, Laurence D

    2017-05-01

    While the principal force directing coding sequence (CDS) evolution is selection on protein function, to ensure correct gene expression CDSs must also maintain interactions with RNA-binding proteins (RBPs). Understanding how our genes are shaped by these RNA-level pressures is necessary for diagnostics and for improving transgenes. However, the evolutionary impact of the need to maintain RBP interactions remains unresolved. Are coding sequences constrained by the need to specify RBP binding motifs? If so, what proportion of mutations are affected? Might sequence evolution also be constrained by the need not to specify motifs that might attract unwanted binding, for instance because it would interfere with exon definition? Here, we have scanned human CDSs for motifs that have been experimentally determined to be recognized by RBPs. We observe two sets of motifs: those that are enriched over nucleotide-controlled null and those that are depleted. Importantly, the depleted set is enriched for motifs recognized by non-CDS binding RBPs. Supporting the functional relevance of our observations, we find that motifs that are more enriched are also slower-evolving. The net effect of this selection to preserve is a reduction in the overall rate of synonymous evolution of 2-3% in both primates and rodents. Stronger motif depletion, on the other hand, is associated with stronger selection against motif gain in evolution. The challenge faced by our CDSs is therefore not only one of attracting the right RBPs but also of avoiding the wrong ones, all while also evolving under selection pressures related to protein structure. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Full Text Available Abstract Background The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS, (l, d-motif search (or Planted Motif Search (PMS, and Edit-distance-based Motif Search (EMS. In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm identifies the motifs always and an approximate algorithm may fail to identify some or all of the motifs. The exact version of PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speedup PMS algorithms. Conclusions We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. 
These experimental results show that our speedup technique is indeed very
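    For readers unfamiliar with the (l, d)-motif search problem named in the abstract above, the following is a minimal brute-force sketch, not the authors' speedup technique. It assumes the usual formulation: a string M of length l is a motif if every input sequence contains some l-mer within Hamming distance d of M. The function names and the toy sequences are illustrative assumptions, and the exhaustive enumeration is only practical for very small l.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motif_search(seqs, l, d, alphabet="ACGT"):
    """Exhaustively test all |alphabet|**l candidates (exponential in l)."""
    candidates = ("".join(chars) for chars in product(alphabet, repeat=l))
    return [m for m in candidates
            if all(occurs_within(m, s, d) for s in seqs)]


# Hypothetical toy input: each sequence holds a variant of ACGT.
toy_seqs = ["TTACGTAA", "GGACCTCC", "ACGAGGGG"]
motifs = planted_motif_search(toy_seqs, l=4, d=1)
```

    The running time of this sketch grows as 4^l for DNA, which illustrates why exact PMS algorithms are described as exponential in some of the underlying parameters.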

  13. Highly scalable Ab initio genomic motif identification

    KAUST Repository

    Marchand, Benoit

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
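    The efficiency figure quoted above can be turned into a relative speedup with a short calculation. This is a sketch under an assumption the abstract does not state explicitly: that "parallel efficiency relative to 256 cores" follows the common fixed-problem-size definition, efficiency = (T_base x p_base) / (T_target x p_target). The function name is hypothetical.

```python
def relative_speedup(efficiency, base_cores, target_cores):
    """Speedup of the target run over the base run, T_base / T_target,
    implied by a relative parallel efficiency (assumed definition:
    efficiency = (T_base * base_cores) / (T_target * target_cores))."""
    return efficiency * target_cores / base_cores


# Figures quoted in the abstract: 94% efficiency, 256 -> 65,536 cores.
speedup_vs_256 = relative_speedup(0.94, 256, 65_536)
```

    Under that assumption, the 65,536-core run would be roughly 240 times faster than the 256-core run, against an ideal factor of 256.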

  14. Analysis of complete nucleotide sequences of Angolan hepatitis B virus isolates reveals the existence of a separate lineage within genotype E.

    Directory of Open Access Journals (Sweden)

    Barbara V Lago

    Hepatitis B virus genotype E (HBV/E) is highly prevalent in Western Africa. In this work, 30 HBV/E isolates from HBsAg positive Angolans (staff and visitors of a private hospital in Luanda) were genetically characterized: 16 of them were completely sequenced and the pre-S/S sequences of the remaining 14 were determined. A high proportion (12/30, 40%) of subjects tested positive for both HBsAg and anti-HBs markers. Deduced amino acid sequences revealed the existence of specific substitutions and deletions in the B- and T-cell epitopes of the surface antigen (pre-S1 and pre-S2 regions) of the virus isolates derived from 8/12 individuals with concurrent HBsAg/anti-HBs. Phylogenetic analysis performed with 231 HBV/E full-length sequences, including 16 from this study, showed that all isolates from Angola, Namibia and the Democratic Republic of Congo (n = 28) clustered in a separate lineage, divergent from the HBV/E isolates from nine other African countries, namely Cameroon, Central African Republic, Côte d'Ivoire, Ghana, Guinea, Madagascar, Niger, Nigeria and Sudan, with a Bayesian posterior probability of 1. Five specific mutations, namely small S protein T57I, polymerase Q177H, G245W and M612L, and X protein V30L, were observed in 79-96% of the isolates of the separate lineage, compared to a frequency of 0-12% among the other HBV/E African isolates.

  15. The complete nucleotide sequence and environmental distribution of the cryptic, conjugative, broad-host-range plasmid pIPO2 isolated from bacteria of the wheat rhizosphere

    OpenAIRE

    Tauch, A.; Schneiker, S.; Selbitschka, W.; Pühler, A.; Overbeek, van, L.S.; Smalla, K.; Thomas, C.M.; Bailey, M.J.; Forney, L.J.; Weightman, A.; Ceglowski, P.; Pembroke, T.; Tietze, E.; Schröder, G.; Lanka, E.

    2002-01-01

    The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approac...

  16. Functional diversity of CTCFs is encoded in their binding motifs.

    Science.gov (United States)

    Fang, Rongxin; Wang, Chengqi; Skogerbo, Geir; Zhang, Zhihua

    2015-08-28

    The CCCTC-binding factor (CTCF) has diverse regulatory functions. However, the definitive characteristics of the CTCF binding motif required for its functional diversity remain elusive. Here, we describe a new motif discovery workflow by which we have identified three CTCF binding motif variations with highly divergent functionalities. Supported by transcriptomic, epigenomic and chromatin-interactomic data, we show that the functional diversity of the CTCF binding motifs is strongly associated with their GC content, CpG dinucleotide coverage and relative DNA methylation level at the 12th position of the motifs. Further analysis suggested that the co-localization of cohesin, the key factor in cohesion of sister chromatids, is negatively correlated with the CpG coverage and the relative DNA methylation level at the 12th position. Finally, we present evidence for a hypothetical model in which chromatin interactions between promoters and distal regulatory regions are likely mediated by CTCFs binding to sequences with high CpG. These results demonstrate the existence of definitive CTCF binding motifs corresponding to CTCF's diverse functions, and that the functional diversity of the motifs is strongly associated with genetic and epigenetic features at the 12th position of the motifs.

  17. Motif Participation by Genes in E. coli Transcriptional Networks

    Directory of Open Access Journals (Sweden)

    Michael Mayo

    2012-09-01

    Motifs are patterns of recurring connections among the genes of genetic networks that occur more frequently than would be expected from randomized networks with the same degree sequence. Although the abundance of certain three-node motifs, such as the feed-forward loop, is positively correlated with a network's ability to tolerate moderate disruptions to gene expression, little is known regarding the connectivity of individual genes participating in multiple motifs. Using the transcriptional network of the bacterium Escherichia coli, we investigate this feature by reconstructing the distribution of genes participating in feed-forward loop motifs from its largest connected network component. We contrast these motif participation distributions with those obtained from model networks built using the preferential attachment mechanism employed by many biological and man-made networks. We report that, although some of these model networks support a motif participation distribution that appears qualitatively similar to that obtained from the bacterium Escherichia coli, the probability for a node to support a feed-forward loop motif may instead be strongly influenced by only a few master transcriptional regulators within the network. From these analyses we conclude that such master regulators may be a crucial ingredient to describe coupling among feed-forward loop motifs in transcriptional regulatory networks.
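    The notion of motif participation used in this abstract can be illustrated with a small counting routine. This is a sketch only, assuming the common definition of a feed-forward loop as three distinct nodes A, B, C with directed edges A->B, B->C and A->C; the function name and the toy edge list are hypothetical and are not taken from the E. coli network analysed in the paper.

```python
from collections import Counter


def ffl_participation(edges):
    """Count, for each node, how many feed-forward loops it takes part in."""
    successors = {}
    for source, target in edges:
        if source != target:  # ignore self-regulation
            successors.setdefault(source, set()).add(target)

    counts = Counter()
    for a, a_targets in successors.items():
        for b in a_targets:
            for c in successors.get(b, ()):
                # A regulates B, B regulates C, and A also regulates C.
                if c != a and c in a_targets:
                    counts.update((a, b, c))
    return counts


# Hypothetical toy network with two loops sharing regulator X and target Z.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("X", "W"), ("W", "Z")]
participation = ffl_participation(toy_edges)
```

    In the toy network, X and Z each take part in two loops while Y and W take part in one, which mirrors the idea that a few regulators can dominate the participation distribution.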

  18. SNARE motif: A common motif used by pathogens to manipulate membrane fusion

    Science.gov (United States)

    Wesolowski, Jordan

    2010-01-01

    To penetrate host cells through their membranes, pathogens use a variety of molecular components in which the presence of heptad repeat motifs seems to be a prevailing element. Heptad repeats are characterized by a pattern of seven, generally hydrophobic, residues. In order to initiate membrane fusion, viruses use glycoproteins containing heptad repeats. These proteins are structurally and functionally similar to the SNARE proteins known to be involved in eukaryotic membrane fusion. SNAREs also display a heptad repeat motif called the “SNARE motif”. As bacterial genomes are being sequenced, microorganisms also appear to be carrying membrane proteins resembling eukaryotic SNAREs. This category of SNARE-like proteins might share similar functions and could be used by microorganisms to either promote or block membrane fusion. Such a recurrence across pathogenic organisms suggests that this architectural motif was evolutionarily selected because it most effectively ensures the survival of pathogens within the eukaryotic environment. PMID:21178463

  19. An Inhibitory Motif on the 5'UTR of Several Rotavirus Genome Segments Affects Protein Expression and Reverse Genetics Strategies.

    Directory of Open Access Journals (Sweden)

    Giuditta De Lorenzo

    The rotavirus genome consists of eleven segments of dsRNA, each encoding one single protein. Viral mRNAs contain an open reading frame (ORF) flanked by relatively short untranslated regions (UTRs), whose role in the viral cycle remains elusive. Here we investigated the role of 5'UTRs in T7 polymerase-driven cDNA expression in uninfected cells. The 5'UTRs of eight genome segments (gs3, gs5-6, gs7-11) of the simian SA11 strain showed a strong inhibitory effect on the expression of viral proteins. Decreased protein expression was due to both compromised transcription and translation and was independent of the ORF and the 3'UTR sequences. Analysis of several mutants of the 21-nucleotide long 5'UTR of gs 11 defined an inhibitory motif (IM) represented by its primary sequence rather than its secondary structure. IM was mapped to the 5' terminal 6-nucleotide long pyrimidine-rich tract 5'-GGY(U/A)UY-3'. The 5' terminal position within the mRNA was shown to be essentially required, as inhibitory activity was lost when IM was moved to an internal position. We identified two mutations (insertion of a G upstream of the 5'UTR and the U to A mutation of the fifth nucleotide of IM) that render IM non-functional and increase the transcription and translation rate to levels that could considerably improve the efficiency of virus helper-free reverse genetics strategies.

  20. Nucleotide polymorphism in the 5.8S nrDNA gene and internal transcribed spacers in Phakopsora pachyrhizi viewed from structural models.

    Science.gov (United States)

    Freire, Maíra Cristina Menezes; da Silva, Maria Roméria; Zhang, Xuecheng; Almeida, Álvaro Manuel Rodrigues; Stacey, Gary; de Oliveira, Luiz Orlando

    2012-02-01

    The assessment of nucleotide polymorphisms in environmental samples of obligate pathogens requires DNA amplification through the polymerase chain reaction (PCR) and bacterial cloning of PCR products prior to sequencing. The drawback of this strategy is that it can give rise to false polymorphisms owing to DNA polymerase misincorporation during PCR or bacterial cloning. We investigated patterns of nucleotide polymorphism in the internal transcribed spacer (ITS) region for Phakopsora pachyrhizi, an obligate biotrophic fungus that causes the Asian soybean rust. Field-collected samples of P. pachyrhizi were obtained from all major soybean production areas worldwide, including Brazil and the United States. Bacterially cloned PCR products were obtained using a high fidelity DNA polymerase. A total of 370 ITS sequences were subjected to an array of complementary sequence analyses, which included analyses of secondary structure stability, the pattern of nucleotide polymorphisms, GC content, and the presence of conserved motifs. The sequences exhibited features of functional rRNAs. Overall, polymorphisms took place within less conserved motifs, such as loops and bulges; alternatively, they gave rise to non-canonical G-U pairs within conserved regions of double stranded helices. We discuss the usefulness of structural analyses to filter out putative 'suspicious' bacterially cloned ITS sequences, thus keeping artificially-induced sequence variation to a minimum. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. The Transmembrane Morphogenesis Protein gp1 of Filamentous Phages Contains Walker A and Walker B Motifs Essential for Phage Assembly

    Directory of Open Access Journals (Sweden)

    Belinda Loh

    2017-04-01

    In contrast to lytic phages, filamentous phages are assembled in the inner membrane and secreted across the bacterial envelope without killing the host. For assembly and extrusion of the phage across the host cell wall, filamentous phages code for membrane-embedded morphogenesis proteins. In the outer membrane of Escherichia coli, the protein gp4 forms a pore-like structure, while gp1 and gp11 form a complex in the inner membrane of the host. By comparing sequences with other filamentous phages, we identified putative Walker A and B motifs in gp1 with a conserved lysine in the Walker A motif (K14), and a glutamic and aspartic acid in the Walker B motif (D88, E89). In this work we demonstrate that both Walker A and Walker B are essential for phage production. The crucial role of these key residues suggests that gp1 might be a molecular motor driving phage assembly. We further identified essential residues for the function of the assembly complex. Mutations in three out of six cysteine residues abolish phage production. Similarly, two out of six conserved glycine residues are crucial for gp1 function. We hypothesise that the residues represent molecular hinges allowing domain movement for nucleotide binding and phage assembly.

  2. The Transmembrane Morphogenesis Protein gp1 of Filamentous Phages Contains Walker A and Walker B Motifs Essential for Phage Assembly.

    Science.gov (United States)

    Loh, Belinda; Haase, Maximilian; Mueller, Lukas; Kuhn, Andreas; Leptihn, Sebastian

    2017-04-09

    In contrast to lytic phages, filamentous phages are assembled in the inner membrane and secreted across the bacterial envelope without killing the host. For assembly and extrusion of the phage across the host cell wall, filamentous phages code for membrane-embedded morphogenesis proteins. In the outer membrane of Escherichia coli, the protein gp4 forms a pore-like structure, while gp1 and gp11 form a complex in the inner membrane of the host. By comparing sequences with other filamentous phages, we identified putative Walker A and B motifs in gp1 with a conserved lysine in the Walker A motif (K14), and a glutamic and aspartic acid in the Walker B motif (D88, E89). In this work we demonstrate that both Walker A and Walker B are essential for phage production. The crucial role of these key residues suggests that gp1 might be a molecular motor driving phage assembly. We further identified essential residues for the function of the assembly complex. Mutations in three out of six cysteine residues abolish phage production. Similarly, two out of six conserved glycine residues are crucial for gp1 function. We hypothesise that the residues represent molecular hinges allowing domain movement for nucleotide binding and phage assembly.

  3. Molecular relationships in Encephalartos (Zamiaceae, Cycadales) based on nucleotide sequences of nuclear ITS 1&2, rbcL, and genomic ISSR fingerprinting.

    Science.gov (United States)

    Treutlein, J; Vorster, P; Wink, M

    2005-01-01

    The cycad genus Encephalartos is restricted to Africa and is threatened with extinction in most of its range. Total DNA was extracted from 51 (i.e., 78%) of the described species of Encephalartos. The accessions were sampled from the furthest western occurrence of the genus in Nigeria, via Sudan and Uganda, to southern South Africa. The sequences of nuclear ribosomal internal transcribed spacer regions 1 and 2 (ITS 1&2), the chloroplast encoded rbcL gene, and ISSR genomic fingerprinting were employed to resolve the molecular history and the relationships within the genus. Sequence alignment and ISSR fingerprinting data show low genetic variation among all analysed accessions, indicating diversification within the Pliocene/Pleistocene. ITS 1&2 data agree well with morphological and geographical characters and resolved three major genetic clusters with overlapping distribution ranges in eastern South Africa. This area, which contains the largest diversity of genotypes of Encephalartos, may have served as a Pliocene/Pleistocene refugium.

  4. Nucleotide sequence of pOLA52: a conjugative IncX1 plasmid from Escherichia coli which enables biofilm formation and multidrug efflux

    DEFF Research Database (Denmark)

    Norman, Anders; Hansen, Lars H.; She, Qunxin

    2008-01-01

    The large conjugative multidrug resistance (MDR) plasmid pOLA52 was sequenced and annotated. The plasmid encodes two phenotypes normally associated with the chromosomes of opportunistic pathogens, namely MDR via a resistance-nodulation-division (RND)-type efflux-pump (oqxAB), and the formation of type 3 fimbriae (mrkABCDF). The plasmid was found to be 51,602 bp long with 68 putative genes. About half of the plasmid constituted a conserved IncX1-type backbone with predicted regions for conjugation, replication and partitioning, as well as a toxin/antitoxin (TA) plasmid addiction system... and Tn6011) that seemed to originate from Klebsiella pneumoniae, thus demonstrating the capability of IncX1 plasmids of facilitating lateral transfer of gene cassettes between different Enterobacteriaceae.

  5. The AplI restriction-modification system in an edible cyanobacterium, Arthrospira (Spirulina) platensis NIES-39, recognizes the nucleotide sequence 5'-CTGCAG-3'.

    Science.gov (United States)

    Shiraishi, Hideaki; Tabuse, Yosuke

    2013-01-01

    The degradation of foreign DNAs by restriction enzymes in an edible cyanobacterium, Arthrospira platensis, is a potential barrier for gene-transfer experiments in this economically valuable organism. We overproduced in Escherichia coli the proteins involved in a putative restriction-modification system of A. platensis NIES-39. The protein produced from the putative type II restriction enzyme gene NIES39_K04640 exhibited an endonuclease activity that cleaved DNA within the sequence 5'-CTGCAG-3' between the A at the fifth position and the G at the sixth position. We designated this enzyme AplI. The protein from the adjacent gene NIES39_K04650, which encodes a putative DNA (cytosine-5-)-methyltransferase, rendered DNA molecules resistant to AplI by modifying the C at the fourth position (but not the C at the first position) in the recognition sequence. This modification enzyme, M.AplI, should be useful for converting DNA molecules into AplI-resistant forms for use in gene-transfer experiments. A summary of restriction enzymes in various Arthrospira strains is also presented in this paper.
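    The cleavage pattern reported above (5'-CTGCA^G-3', cut between the fifth and sixth positions) can be sketched as a simple in-silico digest. This is an illustration only: it considers a single linear strand as written, ignores the M.AplI methylation described in the abstract, and the function names and test sequence are hypothetical.

```python
def find_cut_positions(dna, site="CTGCAG", cut_offset=5):
    """Return 0-based indices of the first base after each cut,
    assuming cleavage cut_offset bases into every occurrence of site."""
    dna = dna.upper()
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    return cuts


def digest(dna, site="CTGCAG", cut_offset=5):
    """Split a linear sequence into fragments at every cut position."""
    dna = dna.upper()
    fragments, previous = [], 0
    for cut in find_cut_positions(dna, site, cut_offset):
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


# Hypothetical sequence with one recognition site.
fragments = digest("AACTGCAGTT")
```

    Because CTGCAG is its own reverse complement, scanning one strand finds every recognition site of a double-stranded molecule, although the sketch does not model the cut on the opposite strand.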

  6. Maximum likelihood and Bayesian analyses of a combined nucleotide sequence dataset for genetic characterization of a novel pestivirus, SVA/cont-08.

    Science.gov (United States)

    Liu, Lihong; Xia, Hongyan; Baule, Claudia; Belák, Sándor

    2009-01-01

    Bovine viral diarrhoea virus 1 (BVDV-1) and Bovine viral diarrhoea virus 2 (BVDV-2) are two recognised bovine pestivirus species of the genus Pestivirus. Recently, a pestivirus, termed SVA/cont-08, was detected in a batch of contaminated foetal calf serum originating from South America. Comparative sequence analysis showed that the SVA/cont-08 virus shares 15-28% higher sequence identity to pestivirus D32/00_'HoBi' than to members of BVDV-1 and BVDV-2. In order to reveal the phylogenetic relationship of SVA/cont-08 with other pestiviruses, a molecular dataset of 30 pestiviruses and 1,896 characters, comprising the 5'UTR, N(pro) and E2 gene regions, was analysed by two methods: maximum likelihood and Bayesian approach. An identical, well-supported tree topology was observed, where four pestiviruses (SVA/cont-08, D32/00_'HoBi', CH-KaHo/cont, and Th/04_KhonKaen) formed a monophyletic clade that is closely related to the BVDV-1 and BVDV-2 clades. The strategy applied in this study is useful for classifying novel pestiviruses in the future.

  7. Validation of skeletal muscle cis-regulatory module predictions reveals nucleotide composition bias in functional enhancers.

    Directory of Open Access Journals (Sweden)

    Andrew T Kwon

    2011-12-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions.

  8. NestedMICA as an ab initio protein motif discovery tool

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2008-01-01

    Background: Discovering overrepresented patterns in amino acid sequences is an important step in protein functional element identification. We adapted and extended NestedMICA, an ab initio motif finder originally developed for finding transcription binding site motifs, to find short protein signals, and compared its performance with another popular protein motif finder, MEME. NestedMICA, an open source protein motif discovery tool written in Java, is driven by a Monte Carlo technique called Nested Sampling. It uses multi-class sequence background models to represent different "uninteresting" parts of sequences that do not contain motifs of interest. In order to assess NestedMICA as a protein motif finder, we have tested it on synthetic datasets produced by spiking instances of known motifs into a randomly selected set of protein sequences. NestedMICA was also tested using a biologically-authentic test set, where we evaluated its performance with respect to varying sequence length. Results: Generally NestedMICA recovered most of the short (3–9 amino acids long) test protein motifs spiked into a test set of sequences at different frequencies. We showed that it can also be used to find multiple motifs at the same time. In all the assessment experiments we carried out, its overall motif discovery performance was better than that of MEME. Conclusion: NestedMICA proved itself to be a robust and sensitive ab initio protein motif finder, even for relatively short motifs that exist in only a small fraction of sequences. Availability: NestedMICA is available under the Lesser GPL open-source license from: http://www.sanger.ac.uk/Software/analysis/nmica/

  9. CMD: A Database to Store the Bonding States of Cysteine Motifs with Secondary Structures

    Directory of Open Access Journals (Sweden)

    Hamed Bostan

    2012-01-01

    Computational approaches to the disulphide bonding state and its connectivity pattern prediction are based on various descriptors. One descriptor is the amino acid sequence motifs flanking the cysteine residue motifs. Despite the existence of disulphide bonding information in many databases and applications, there is no complete reference and motif query available at the moment. Cysteine motif database (CMD) is the first online resource that stores all cysteine residues, their flanking motifs with their secondary structure, and propensity values assignment derived from the laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data, annotated with secondary structure assignment, propensity value assignment, and frequency of occurrence and coefficiency of their bonding status. Removal of redundancies generated 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries are based on the protein ID, FASTA sequence, sequence motif, and secondary structure individually or in batch format using the provided APIs that allow remote users to query our database via third party software and/or high throughput screening/querying. The CMD offers extensive information about the bonded, free cysteine residues, and their motifs that allows in-depth characterization of the sequence motif composition.

  10. Diversity and evolutionary relationship of nucleotide binding site ...

    Indian Academy of Sciences (India)

    PRAKASH KUMAR

    Most plant disease-resistance genes (R-genes) isolated so far encode proteins with a nucleotide binding site (NBS) domain and belong to a superfamily. NBS domains related to R-genes show a highly conserved backbone of an amino acid motif, which makes it possible to isolate resistance gene analogues (RGAs) by ...

  11. Insights into the motif preference of APOBEC3 enzymes.

    Directory of Open Access Journals (Sweden)

    Diako Ebrahimi

    We used a multivariate data analysis approach to identify motifs associated with HIV hypermutation by different APOBEC3 enzymes. The analysis showed that APOBEC3G targets G mainly within GG, TG, TGG, GGG, TGGG and also GGGT. The G nucleotides flanked by a C at the 3' end (in +1 and +2 positions) were indicated as disfavoured targets by APOBEC3G. The G nucleotides within GGGG were found to be targeted at a frequency much less than what is expected. We found that the infrequent G-to-A mutation within GGGG is not limited to the inaccessibility, to APOBEC3, of poly Gs in the central and 3' polypurine tracts (PPTs), which remain double stranded during HIV reverse transcription. GGGG motifs outside the PPTs were also disfavoured. The motifs GGAG and GAGG were also found to be disfavoured targets for APOBEC3. The motif-dependent mutation of G within the HIV genome by members of the APOBEC3 family other than APOBEC3G was limited to GA→AA changes. The results did not show evidence of other types of context dependent G-to-A changes in the HIV genome.

  12. Targeted Amplicon Sequencing for Single-Nucleotide-Polymorphism Genotyping of Attaching and Effacing Escherichia coli O26:H11 Cattle Strains via a High-Throughput Library Preparation Technique.

    Science.gov (United States)

    Ison, Sarah A; Delannoy, Sabine; Bugarel, Marie; Nagaraja, Tiruvoor G; Renter, David G; den Bakker, Henk C; Nightingale, Kendra K; Fach, Patrick; Loneragan, Guy H

    2016-01-15

    Enterohemorrhagic Escherichia coli (EHEC) O26:H11, a serotype within Shiga toxin-producing E. coli (STEC) that causes severe human disease, has been considered to have evolved from attaching and effacing E. coli (AEEC) O26:H11 through the acquisition of a Shiga toxin-encoding gene. Targeted amplicon sequencing using next-generation sequencing technology of 48 phylogenetically informative single-nucleotide polymorphisms (SNPs) and three SNPs differentiating Shiga toxin-positive (stx-positive) strains from Shiga toxin-negative (stx-negative) strains was used to infer the phylogenetic relationships of 178 E. coli O26:H11 strains (6 stx-positive strains and 172 stx-negative AEEC strains) from cattle feces to 7 publicly available genomes of human clinical strains. The AEEC cattle strains displayed synonymous SNP genotypes with stx2-positive sequence type 29 (ST29) human O26:H11 strains, while stx1 ST21 human and cattle strains clustered separately, demonstrating the close phylogenetic relatedness of these Shiga toxin-negative AEEC cattle strains and human clinical strains. With the exception of seven stx-negative strains, five of which contained espK, three stx-related SNPs differentiated the STEC strains from non-STEC strains, supporting the hypothesis that these AEEC cattle strains could serve as a potential reservoir for new or existing pathogenic human strains. Our results support the idea that targeted amplicon sequencing for SNP genotyping expedites strain identification and genetic characterization of E. coli O26:H11, which is important for food safety and public health. Copyright © 2016 Ison et al.

  13. Nucleotide sequence of a cDNA coding for the barley seed protein CMa: an inhibitor of insect α-amylase

    DEFF Research Database (Denmark)

    Rasmussen, Søren Kjærsgård; Johansson, A.

    1992-01-01

    The primary structure of the insect alpha-amylase inhibitor CMa of barley seeds was deduced from a full-length cDNA clone pc43F6. Analysis of RNA from barley endosperm shows high levels 15 and 20 days after flowering. The cDNA predicts an amino acid sequence of 119 residues preceded by a signal peptide of 25 amino acids. Ala and Leu account for 55% of the signal peptide. CMa is 60-85% identical with alpha-amylase inhibitors of wheat, but shows less than 50% identity to trypsin inhibitors of barley and wheat. The 10 Cys residues are located in identical positions compared to the cereal inhibitor...

  14. Identification of Single-Nucleotide Polymorphic Loci Associated with Biomass Yield under Water Deficit in Alfalfa (Medicago sativa L.) Using Genome-Wide Sequencing and Association Mapping

    Directory of Open Access Journals (Sweden)

    Long-Xi Yu

    2017-06-01

    Alfalfa is a forage crop grown worldwide and is important due to its high biomass production and nutritional value. However, the production of alfalfa is challenged by adverse environmental factors such as drought and other stresses. Developing drought-resistant alfalfa is an important breeding target for enhancing alfalfa productivity in arid and semi-arid regions. In the present study, we used genotyping-by-sequencing and genome-wide association to identify marker loci associated with biomass yield under drought in the field in a panel of diverse germplasm of alfalfa. A total of 28 markers at 22 genetic loci were associated with yield under water deficit, whereas only four markers were associated with the same trait under well-watered conditions. Comparisons of marker-trait associations between water deficit and well-watered conditions showed no similarity except for one marker. Most of the markers were identical across harvest periods within the treatment, although different levels of significance were found among the three harvests. The loci associated with biomass yield under water deficit were located throughout all chromosomes in the alfalfa genome, in agreement with previous reports. Our results suggest that biomass yield under drought is a complex quantitative trait with polygenic inheritance and may involve a different mechanism compared to that of non-stress. BLAST searches of the flanking sequences of the associated loci against DNA databases revealed several stress-responsive genes linked to the drought resistance loci, including leucine-rich repeat receptor-like kinase, B3 DNA-binding domain protein, translation initiation factor IF2, and phospholipase-like protein. With further investigation, those markers closely linked to drought resistance can be used for marker-assisted selection (MAS) to accelerate the development of new alfalfa cultivars with improved resistance to drought and other abiotic stresses.

  15. Motivated Proteins: A web application for studying small three-dimensional protein motifs

    Directory of Open Access Journals (Sweden)

    Milner-White E James

    2009-02-01

    Background: Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description: The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion: Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema.

  16. Motivated proteins: a web application for studying small three-dimensional protein motifs.

    Science.gov (United States)

    Leader, David P; Milner-White, E James

    2009-02-11

    Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are alphabeta-motifs, asx-motifs, asx-turns, beta-bulges, beta-bulge loops, beta-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema.

  17. Sequence requirement of the ade6-4095 meiotic recombination hotspot in Schizosaccharomyces pombe.

    Science.gov (United States)

    Foulis, Steven J; Fowler, Kyle R; Steiner, Walter W

    2018-02-01

    Homologous recombination occurs at a greatly elevated frequency in meiosis compared to mitosis and is initiated by programmed double-strand DNA breaks (DSBs). DSBs do not occur at uniform frequency throughout the genome in most organisms, but occur preferentially at a limited number of sites referred to as hotspots. The locations of hotspots have been determined at nucleotide-level resolution in both the budding and fission yeasts, and while several patterns have emerged regarding preferred locations for DSB hotspots, it remains unclear why particular sites experience DSBs at much higher frequency than other sites with seemingly similar properties. Short sequence motifs, which are often sites for binding of transcription factors, are known to be responsible for a number of hotspots. In this study we identified the minimum sequence required for activity of one such motif, identified in a screen of random sequences capable of producing recombination hotspots. The experimentally determined sequence, GGTCTRGACC, closely matches the previously inferred sequence. Full hotspot activity requires an effective sequence length of 9.5 bp, whereas moderate activity requires an effective sequence length of approximately 8.2 bp and shows significant association with DSB hotspots. In combination with our previous work, this result is consistent with a large number of different sequence motifs capable of producing recombination hotspots, and supports a model in which hotspots can be rapidly regenerated by mutation as they are lost through recombination.
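
The motif GGTCTRGACC above uses the IUPAC ambiguity code R (A or G). As a minimal sketch of how such a degenerate motif can be located in a sequence, the Python below translates IUPAC codes into a regular expression and reports match positions. The function names `motif_to_regex` and `find_motif` are hypothetical and are not taken from the study; only the given strand is scanned, and this is an illustration of degenerate-motif matching, not the authors' analysis method.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile a degenerate motif into a regex (hypothetical helper).

    The lookahead wrapper lets overlapping occurrences be reported.
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")

def find_motif(sequence, motif="GGTCTRGACC"):
    """Return 0-based start positions of every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]

if __name__ == "__main__":
    # Toy sequence made up for illustration; not data from the study.
    print(find_motif("AAGGTCTAGACCTT"))
```

Scanning the reverse strand as well would require reverse-complementing the input first; that step is omitted here to keep the sketch short.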

  18. Rapid, High-Throughput Identification of Anthrax-Causing and Emetic Bacillus cereus Group Genome Assemblies via BTyper, a Computational Tool for Virulence-Based Classification of Bacillus cereus Group Isolates by Using Nucleotide Sequencing Data

    Science.gov (United States)

    Carroll, Laura M.; Miller, Rachel A.; Wiedmann, Martin

    2017-01-01

    ABSTRACT The Bacillus cereus group comprises nine species, several of which are pathogenic. Differentiating between isolates that may cause disease and those that do not is a matter of public health and economic importance, but it can be particularly challenging due to the high genomic similarity within the group. To this end, we have developed BTyper, a computational tool that employs a combination of (i) virulence gene-based typing, (ii) multilocus sequence typing (MLST), (iii) panC clade typing, and (iv) rpoB allelic typing to rapidly classify B. cereus group isolates using nucleotide sequencing data. BTyper was applied to a set of 662 B. cereus group genome assemblies to (i) identify anthrax-associated genes in non-B. anthracis members of the B. cereus group, and (ii) identify assemblies from B. cereus group strains with emetic potential. With BTyper, the anthrax toxin genes cya, lef, and pagA were detected in 8 genomes classified by the NCBI as B. cereus that clustered into two distinct groups using k-medoids clustering, while either the B. anthracis poly-γ-d-glutamate capsule biosynthesis genes capABCDE or the hyaluronic acid capsule hasA gene was detected in an additional 16 assemblies classified as either B. cereus or Bacillus thuringiensis isolated from clinical, environmental, and food sources. The emetic toxin genes cesABCD were detected in 24 assemblies belonging to panC clades III and VI that had been isolated from food, clinical, and environmental settings. The command line version of BTyper is available at https://github.com/lmc297/BTyper. In addition, BMiner, a companion application for analyzing multiple BTyper output files in aggregate, can be found at https://github.com/lmc297/BMiner. IMPORTANCE Bacillus cereus is a foodborne pathogen that is estimated to cause tens of thousands of illnesses each year in the United States alone. Even with molecular methods, it can be difficult to distinguish nonpathogenic B. cereus group isolates from their

  19. Phylogeny reconstruction and hybrid analysis of populus (Salicaceae) based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments.

    Science.gov (United States)

    Wang, Zhaoshan; Du, Shuhui; Dayanandan, Selvadurai; Wang, Dongsheng; Zeng, Yanfei; Zhang, Jianguo

    2014-01-01

    Populus (Salicaceae) is one of the most economically and ecologically important genera of forest trees. The complex reticulate evolution and lack of highly variable orthologous single-copy DNA markers have posed difficulties in resolving the phylogeny of this genus. Based on a large data set of nuclear and plastid DNA sequences, we reconstructed a robust phylogeny of Populus using parsimony, maximum likelihood and Bayesian inference methods. The resulting phylogenetic trees showed better resolution at both inter- and intra-sectional level than previous studies. The results revealed that (1) the plastid-based phylogenetic tree resulted in two main clades, suggesting an early divergence of the maternal progenitors of Populus; (2) three advanced sections (Populus, Aigeiros and Tacamahaca) are of hybrid origin; (3) species of the section Tacamahaca could be divided into two major groups based on plastid and nuclear DNA data, suggesting a polyphyletic nature of the section; and (4) many species proved to be of hybrid origin based on the incongruence between plastid and nuclear DNA trees. Reticulate evolution may have played a significant role in the evolutionary history of Populus by facilitating rapid adaptive radiations into different environments.

  20. Phylogeny reconstruction and hybrid analysis of populus (Salicaceae) based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments.

    Directory of Open Access Journals (Sweden)

    Zhaoshan Wang

    Full Text Available Populus (Salicaceae) is one of the most economically and ecologically important genera of forest trees. The complex reticulate evolution and lack of highly variable orthologous single-copy DNA markers have posed difficulties in resolving the phylogeny of this genus. Based on a large data set of nuclear and plastid DNA sequences, we reconstructed a robust phylogeny of Populus using parsimony, maximum likelihood and Bayesian inference methods. The resulting phylogenetic trees showed better resolution at both inter- and intra-sectional level than previous studies. The results revealed that (1) the plastid-based phylogenetic tree resulted in two main clades, suggesting an early divergence of the maternal progenitors of Populus; (2) three advanced sections (Populus, Aigeiros and Tacamahaca) are of hybrid origin; (3) species of the section Tacamahaca could be divided into two major groups based on plastid and nuclear DNA data, suggesting a polyphyletic nature of the section; and (4) many species proved to be of hybrid origin based on the incongruence between plastid and nuclear DNA trees. Reticulate evolution may have played a significant role in the evolutionary history of Populus by facilitating rapid adaptive radiations into different environments.

  1. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in humans. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.
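
The definition of motif pairing multiplicity given above (the number of motifs paired with a given motif) can be illustrated with a short Python sketch. It assumes the coupling pairs are already available as a list of two-element tuples of motif identifiers; the function name `pairing_multiplicity`, the input layout, and the choice to count each distinct partner once are assumptions made for illustration, not details confirmed by the abstract.

```python
from collections import defaultdict

def pairing_multiplicity(pairs):
    """Count distinct partners per motif (hypothetical helper).

    `pairs` is an iterable of (motif_a, motif_b) tuples. Pairs are treated
    as unordered, and repeated pairs contribute only one partner.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(found) for motif, found in partners.items()}

if __name__ == "__main__":
    # Made-up motif identifiers; not data from the study.
    toy_pairs = [("M1", "M2"), ("M1", "M3"), ("M1", "M2"), ("M4", "M1")]
    print(pairing_multiplicity(toy_pairs))
```

Using a set per motif is a design choice that makes the count insensitive to how many chromatin interactions support the same pair; a per-interaction count would need a different structure.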

  2. Characterization of simple sequence repeats (SSRs) from Phlebotomus papatasi (Diptera: Psychodidae) expressed sequence tags (ESTs)

    Directory of Open Access Journals (Sweden)

    Hamarsheh Omar

    2011-09-01

    Full Text Available Abstract Background: Phlebotomus papatasi is a natural vector of Leishmania major, which causes cutaneous leishmaniasis in many countries. Simple sequence repeats (SSRs), or microsatellites, are common in eukaryotic genomes and are short, repeated nucleotide sequence elements arrayed in tandem and flanked by non-repetitive regions. The enrichment methods used previously for finding new microsatellite loci in sand flies remain laborious and time consuming; in silico mining, which includes retrieval and screening of microsatellites from large amounts of sequence data from sequence databases using microsatellite search tools, can yield many new candidate markers. Results: Simple sequence repeats (SSRs) were characterized in P. papatasi expressed sequence tags (ESTs) derived from a public database, the National Center for Biotechnology Information (NCBI). A total of 42,784 sequences were mined, and 1,499 SSRs were identified with a frequency of 3.5% and an average density of 15.55 kb per SSR. Dinucleotide motifs were the most common SSRs, accounting for 67%, followed by tri-, tetra-, and penta-nucleotide repeats, accounting for 31.1%, 1.5%, and 0.1%, respectively. The length of microsatellites varied from 5 to 16 repeats. The dinucleotide types AG and CT have the highest frequency. Dinucleotide SSR-ESTs are relatively biased toward an excess of (AX)n repeats and a low GC base content. Forty primer pairs were designed based on motif lengths for further experimental validation. Conclusion: The first large-scale survey of SSRs derived from P. papatasi is presented; dinucleotide SSRs identified are more frequent than other types. EST data mining is an effective strategy to identify functional microsatellites in P. papatasi.
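
The abstract does not name the microsatellite search tool that was used, so the following Python is only a rough sketch of in silico SSR screening with regular expressions. The function names (`is_primitive`, `find_ssrs`) and the thresholds (repeat units of 2 to 5 nucleotides, at least 5 tandem copies) are assumptions chosen to echo the repeat classes and the 5-repeat lower bound reported above, not the study's actual parameters.

```python
import re

def is_primitive(unit):
    """True if `unit` is not itself a repetition of a shorter unit."""
    k = len(unit)
    return not any(
        k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)
    )

def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=5):
    """Return sorted (start, unit, repeat_count) tuples for perfect SSRs.

    Hypothetical helper: scans one sequence for tandem repeats whose unit
    length lies in [min_unit, max_unit] and that occur at least
    `min_repeats` times in a row.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-mer followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units such as "AA" or "AGAG" that reduce to a shorter one.
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)

if __name__ == "__main__":
    # Toy sequence made up for illustration; not an EST from the study.
    toy = "TTT" + "AG" * 6 + "CCGA" + "CTT" * 5 + "A"
    print(find_ssrs(toy))
```

This sketch reports only perfect repeats on the given strand; dedicated SSR tools typically also handle compound and imperfect repeats, which are outside its scope.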

  3. IQ-motif peptides as novel anti-microbial agents.

    Science.gov (United States)

    McLean, Denise T F; Lundy, Fionnuala T; Timson, David J

    2013-04-01

    The IQ-motif is an amphipathic, often positively charged, α-helical, calmodulin binding sequence found in a number of eukaryote signalling, transport and cytoskeletal proteins. They share common biophysical characteristics with established, cationic α-helical antimicrobial peptides, such as the human cathelicidin LL-37. Therefore, we tested eight peptides encoding the sequences of IQ-motifs derived from the human cytoskeletal scaffolding proteins IQGAP2 and IQGAP3. Some of these peptides were able to inhibit the growth of Escherichia coli and Staphylococcus aureus with minimal inhibitory concentrations (MIC) comparable to LL-37. In addition some IQ-motifs had activity against the fungus Candida albicans. This antimicrobial activity is combined with low haemolytic activity (comparable to, or lower than, that of LL-37). Those IQ-motifs with anti-microbial activity tended to be able to bind to lipopolysaccharide. Some of these were also able to permeabilise the cell membranes of both Gram positive and Gram negative bacteria. These results demonstrate that IQ-motifs are viable lead sequences for the identification and optimisation of novel anti-microbial p