WorldWideScience

Sample records for nucleotide sequence motifs

  1. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. 
Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing

  2. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...

  3. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  4. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  5. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    Motif analysis has long been an important method to characterize biological functionality and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs... advantage of the regular expression feature, including enrichments for combinations of different microRNA seed sites. The method is implemented and made publicly available as an R package and supports high parallelization on multi-core machinery.

  6. CompariMotif: quick and easy comparisons of sequence motifs.

    Science.gov (United States)

    Edwards, Richard J; Davey, Norman E; Shields, Denis C

    2008-05-15

    CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/

  7. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.

  8. Nucleotide sequence preservation of human mitochondrial DNA

    International Nuclear Information System (INIS)

    Monnat, R.J. Jr.; Loeb, L.A.

    1985-01-01

    Recombinant DNA techniques have been used to quantitate the amount of nucleotide sequence divergence in the mitochondrial DNA population of individual normal humans. Mitochondrial DNA was isolated from the peripheral blood lymphocytes of five normal humans and cloned in M13 mp11; 49 kilobases of nucleotide sequence information was obtained from 248 independently isolated clones from the five normal donors. Both between- and within-individual differences were identified. Between-individual differences were identified in approximately 1/200 nucleotides. In contrast, only one within-individual difference was identified in 49 kilobases of nucleotide sequence information. This high degree of mitochondrial nucleotide sequence homogeneity in human somatic cells is in marked contrast to the rapid evolutionary divergence of human mitochondrial DNA and suggests the existence of mechanisms for the concerted preservation of mammalian mitochondrial DNA sequences in single organisms

  9. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  10. The International Nucleotide Sequence Database Collaboration.

    Science.gov (United States)

    Cochrane, Guy; Karsch-Mizrachi, Ilene; Nakamura, Yasukazu

    2011-01-01

    Under the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), globally comprehensive public domain nucleotide sequence is captured, preserved and presented. The partners of this long-standing collaboration work closely together to provide data formats and conventions that enable consistent data submission to their databases and support regular data exchange around the globe. Clearly defined policy and governance in relation to free access to data and relationships with journal publishers have positioned INSDC databases as a key provider of the scientific record and a core foundation for the global bioinformatics data infrastructure. While growth in sequence data volumes comes no longer as a surprise to INSDC partners, the uptake of next-generation sequencing technology by mainstream science that we have witnessed in recent years brings a step-change to growth, necessarily making a clear mark on INSDC strategy. In this article, we introduce the INSDC, outline data growth patterns and comment on the challenges of increased growth.

  11. A set of tetra-nucleotide core motif SSR markers for efficient identification of potato (Solanum tuberosum) cultivars.

    Science.gov (United States)

    Kishine, Masahiro; Tsutsumi, Katsuji; Kitta, Kazumi

    2017-12-01

    Simple sequence repeat (SSR) is a popular tool for individual fingerprinting. The long-core motif (e.g. tetra-, penta-, and hexa-nucleotide) simple sequence repeats (SSRs) are preferred because they make it easier to separate and distinguish neighbor alleles. In the present study, a new set of 8 tetra-nucleotide SSRs in potato (Solanum tuberosum) is reported. By using these 8 markers, 72 out of 76 cultivars obtained from Japan and the United States were clearly discriminated, while two pairs, both of which arose from natural variation, showed identical profiles. The combined probability of identity between two random cultivars for the set of 8 SSR markers was estimated to be 1.10 × 10^-8, confirming the usefulness of the proposed SSR markers for fingerprinting analyses of potato.

  12. Identification of sequence motifs significantly associated with antisense activity

    Directory of Open Access Journals (Sweden)

    Peek Andrew S

    2007-06-01

    Background: Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. Results: We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion: The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic

  13. A Novel Protein Interaction between Nucleotide Binding Domain of Hsp70 and p53 Motif

    Directory of Open Access Journals (Sweden)

    Asita Elengoe

    2015-01-01

    Currently, protein interaction of Homo sapiens nucleotide binding domain (NBD) of heat shock 70 kDa protein (PDB: 1HJO) with p53 motif remains to be elucidated. The NBD-p53 motif complex enhances the p53 stabilization, thereby increasing the tumor suppression activity in cancer treatment. Therefore, we identified the interaction between NBD and p53 using STRING version 9.1 program. Then, we modeled the three-dimensional structure of p53 motif through homology modeling and determined the binding affinity and stability of NBD-p53 motif complex structure via molecular docking and dynamics (MD) simulation. Human DNA binding domain of p53 motif (SCMGGMNR) retrieved from UniProt (UniProtKB: P04637) was docked with the NBD protein, using the Autodock version 4.2 program. The binding energy and intermolecular energy for the NBD-p53 motif complex were −0.44 kcal/mol and −9.90 kcal/mol, respectively. Moreover, RMSD, RMSF, hydrogen bonds, salt bridge, and secondary structure analyses revealed that the NBD protein had a strong bond with p53 motif and the protein-ligand complex was stable. Thus, the current data would be highly encouraging for designing Hsp70 structure based drug in cancer therapy.

  14. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of a RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs--RMfam--and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Perception Enhancement using Visual Attributes in Sequence Motif Visualization

    OpenAIRE

    Oon, Yin; Lee, Nung; Kok, Wei

    2016-01-01

    Sequence logo is a well-accepted scientific method to visualize the conservation characteristics of biological sequence motifs. Previous studies found that using sequence logo graphical representation for scientific evidence reports or arguments could cause serious biases and misinterpretation by users. This study investigates the performance of the visual attributes of a sequence logo in helping users to perceive and interpret the information based on preattentive theories and Gestalt principl...

  16. Identification of a Baeyer-Villiger monooxygenase sequence motif

    NARCIS (Netherlands)

    Fraaije, MW; Kamerbeek, NM; van Berkel, WJH; Janssen, DB; Kamerbeek, Nanne M.; Berkel, Willem J.H. van

    2002-01-01

    Baeyer-Villiger monooxygenases (BVMOs) form a distinct class of flavoproteins that catalyze the insertion of an oxygen atom in a C-C bond using dioxygen and NAD(P)H. Using newly characterized BVMO sequences, we have uncovered a BVMO-identifying sequence motif: FXGXXXRXXXW(P/D). Studies with
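
    As an illustration of how a pattern such as FXGXXXRXXXW(P/D) can be scanned for, the following minimal Python sketch translates it into a regular expression. It assumes that X stands for any single residue and that (P/D) means either P or D; the function name and the toy sequence are hypothetical and are not taken from the record.

```python
import re

# FXGXXXRXXXW(P/D) read as: F, any residue, G, three residues, R,
# three residues, W, then P or D (assumed interpretation of X and P/D).
BVMO_MOTIF = re.compile(r"F.G.{3}R.{3}W[PD]")


def find_bvmo_motif(protein):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in BVMO_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MKTAFAGTWYRNLYWPGAA"  # hypothetical sequence, not a real BVMO
    print(find_bvmo_motif(toy))
```

    A regular expression only reports candidate sites; whether a hit is a functional motif still has to be judged from the protein context.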

  17. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
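
    The (l, d) motif search problem stated above can be made concrete with a brute-force Python sketch. This is not the automata-based algorithm of the paper: it simply enumerates every candidate of length l over the alphabet, so it is only usable for very small l. All function names and the toy sequences are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq differs from motif in at most d substitutions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def ld_motif_search(seqs, l, d, q=None, alphabet="ACGT"):
    """Exhaustively list motifs of length l found (within d) in at least q sequences."""
    q = len(seqs) if q is None else q
    hits = []
    for cand in product(alphabet, repeat=l):  # 4**l candidates for DNA
        motif = "".join(cand)
        support = sum(occurs_within(motif, s, d) for s in seqs)
        if support >= q:
            hits.append(motif)
    return hits


if __name__ == "__main__":
    toy = ["TTACGTAA", "GGACCTCC", "CAACGAGT"]  # hypothetical input
    print("ACGT" in ld_motif_search(toy, l=4, d=1))
```

    The exponential candidate space is what makes large instances hard and motivates specialised hardware or pruning strategies.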

  18. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .
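
    The idea of locating a short motif, counting its occurrences, and drawing a bar-code-like view can be sketched in a few lines of Python. This is not PISMA itself (which the record describes as a Java program); the function names and the text rendering are assumptions chosen for illustration.

```python
def motif_positions(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) occurrence."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # step by 1 to allow overlaps
    return positions


def barcode(seq, motif):
    """Text rendering: '|' where an occurrence starts, '-' elsewhere."""
    marks = ["-"] * len(seq)
    for p in motif_positions(seq, motif):
        marks[p] = "|"
    return "".join(marks)


if __name__ == "__main__":
    dna = "ACGCGTTCG"  # hypothetical sequence
    print(motif_positions(dna, "CG"))  # CpG sites as start positions
    print(barcode(dna, "CG"))
```

    Searching for "CG" in this way gives a simple picture of CpG site distribution along one strand.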

  19. Nucleotide sequence of the coat protein gene of the Skierniewice isolate of plum pox virus (PPV)

    International Nuclear Information System (INIS)

    Wypijewski, K.; Musial, W.; Augustyniak, J.; Malinowski, T.

    1994-01-01

    The coat protein (CP) gene of the Skierniewice isolate of plum pox virus (PPV-S) has been amplified using the reverse transcription - polymerase chain reaction (RT-PCR), cloned and sequenced. The nucleotide sequence of the gene and the deduced amino-acid sequences of PPV-S CP were compared with those of other PPV strains. The nucleotide sequence showed very high homology to most of the published sequences. The motif: Asp-Ala-Gly (DAG), important for the aphid transmissibility, was present in the amino-acid sequence. Our isolate did not react in ELISA with monoclonal antibodies MAb06 supposed to be specific for PPV-D. (author). 32 refs, 1 fig., 2 tabs

  20. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Background: Automatic extraction of motifs from biological sequences is an important research problem in study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results: WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. 
The protein-based mining mode

  1. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

    Science.gov (United States)

    Zhao, Xiaoyan; Sze, Sing-Hoi

    2011-05-01

    One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms perform very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
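
    To show what a background Markov chain contributes to motif scoring, here is a simplified Python sketch of an ordinary first-order chain estimated from a background sequence. It does not implement the position-skipping chains described in the abstract; the pseudocount, the function names, and the toy background are all assumptions.

```python
from collections import Counter
from math import log


def train_first_order(background, alphabet="ACGT", pseudo=1.0):
    """Estimate initial and transition probabilities with additive smoothing.

    Assumes the background contains only symbols from the alphabet.
    """
    single = Counter(background)
    pairs = Counter(zip(background, background[1:]))
    k = len(alphabet)
    init = {a: (single[a] + pseudo) / (len(background) + pseudo * k) for a in alphabet}
    trans = {}
    for a in alphabet:
        row_total = sum(pairs[(a, b)] for b in alphabet) + pseudo * k
        trans[a] = {b: (pairs[(a, b)] + pseudo) / row_total for b in alphabet}
    return init, trans


def log_prob(word, init, trans):
    """Log-probability of a word under the first-order background model."""
    lp = log(init[word[0]])
    for a, b in zip(word, word[1:]):
        lp += log(trans[a][b])
    return lp


if __name__ == "__main__":
    init, trans = train_first_order("ATATATAT")  # hypothetical background
    print(log_prob("AT", init, trans) > log_prob("AA", init, trans))
```

    A candidate motif whose occurrences are much more frequent than such a background model predicts is a more plausible binding site than one the background already explains.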

  2. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV specific targeting using small-molecule drugs.

  3. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo; Jankovic, Boris R.; Bajic, Vladimir B.; Song, Le; Gao, Xin

    2013-01-01

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. 
Meanwhile, our method makes ~30% fewer error predictions relative to the other

  4. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. 
Meanwhile, our method makes ~30% fewer error predictions relative to the other

  5. The nucleotide sequences of two leghemoglobin genes from soybean

    DEFF Research Database (Denmark)

    Wiborg, O; Hyldig-Nielsen, J J; Jensen, E O

    1982-01-01

    We present the complete nucleotide sequences of two leghemoglobin genes isolated from soybean DNA. Both genes contain three intervening sequences in identical positions. Comparison of the coding sequences with known amino-acid sequences of soybean leghemoglobins suggests that the two genes...

  6. Evolutionary relationships in the ilarviruses: nucleotide sequence of prunus necrotic ringspot virus RNA 3.

    Science.gov (United States)

    Sánchez-Navarro, J A; Pallás, V

    1997-01-01

    The complete nucleotide sequence of an isolate of prunus necrotic ringspot virus (PNRSV) RNA 3 has been determined. Elucidation of the amino acid sequence of the proteins encoded by the two large open reading frames (ORFs) allowed us to carry out comparative and phylogenetic studies on the movement (MP) and coat (CP) proteins in the ilarvirus group. Amino acid sequence comparison of the MP revealed a highly conserved basic sequence motif with an amphipathic alpha-helical structure preceding the conserved motif of the '30K superfamily' proposed by Mushegian and Koonin [26] for MPs. Within this '30K' motif a strictly conserved transmembrane domain is present in all ilarviruses sequenced so far. At the amino-terminal end, prune dwarf virus (PDV) has an extension not present in other ilarviruses but which is observed in all bromo- and cucumoviruses, suggesting a common ancestor or a recombinational event in the Bromoviridae family. Examination of the N-terminus of the CPs of all ilarviruses revealed a highly basic region, part of which resembles the Arg-rich motif that has been characterized in the RNA-binding protein family. This motif has also been found in the other members of the Bromoviridae family, suggesting its involvement in a structural function. Furthermore, this region is required for infectivity in ilarviruses. The similarities found in this Arg-rich motif are discussed in terms of this process known as genome activation. Finally, phylogenetic analysis of both the MP and CP proteins revealed a closer relationship of AlMV to PNRSV, apple mosaic virus (ApMV) and PDV than to any other member of the ilarvirus group. In that sense, AlMV should be considered as a true ilarvirus instead of forming a distinct group of viruses.

  7. Statistical properties and fractals of nucleotide clusters in DNA sequences

    International Nuclear Information System (INIS)

    Sun Tingting; Zhang Linxi; Chen Jin; Jiang Zhouting

    2004-01-01

    Statistical properties of nucleotide clusters in DNA sequences and their fractals are investigated in this paper. The average size of nucleotide clusters in non-coding sequence is larger than that in coding sequence. We investigate the cluster-size distribution P(S) for human chromosomes 21 and 22, and the results are different from previous works. The cluster-size distribution P(S_1 + S_2), where S_1 + S_2 is the total size of a sequential Pu-cluster and Py-cluster, is studied. We observe that P(S_1 + S_2) follows an exponential decay both in coding and non-coding sequences. However, we get different results for human chromosomes 21 and 22. The probability distribution P(S_1, S_2) of nucleotide clusters, with S_1 and S_2 the sizes of the sequential Pu-cluster and Py-cluster respectively, is also examined. In the meantime, some of the linear correlations are obtained in the double logarithmic plots of the fluctuation F(l) versus nucleotide cluster distance l along the DNA chain. The power spectrums of nucleotide clusters are also discussed, and it is concluded that the curves are flat and hardly changed and the 1/3 frequency is observed neither in coding sequence nor in non-coding sequence. These investigations can provide some insights into the nucleotide clusters of DNA sequences.
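
    As a rough illustration of the cluster-size distribution P(S) mentioned in this record, the sketch below assumes that a "nucleotide cluster" is a maximal run of consecutive purines (A, G) or consecutive pyrimidines (C, T). The abstract does not define the term, so that reading, the function names and the toy sequence are all assumptions rather than the authors' procedure.

```python
from collections import Counter
from itertools import groupby

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def cluster_sizes(seq):
    """Split a DNA string into maximal purine ('Pu') / pyrimidine ('Py') runs.

    Returns a list of (kind, size) tuples in sequence order.
    Assumption: a cluster is a maximal run of one base class.
    """
    seq = seq.upper()
    unknown = set(seq) - PURINES - PYRIMIDINES
    if unknown:
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    runs = []
    for is_purine, group in groupby(seq, key=lambda base: base in PURINES):
        runs.append(("Pu" if is_purine else "Py", len(list(group))))
    return runs


def size_distribution(runs):
    """Empirical P(S): fraction of clusters that have size S."""
    counts = Counter(size for _, size in runs)
    total = sum(counts.values())
    return {size: count / total for size, count in sorted(counts.items())}


# Toy sequence, not real chromosome data.
runs = cluster_sizes("AAGCTTGGAC")
print(runs)
print(size_distribution(runs))
```

    Pairing each Pu run with the Py run that follows it would give the joint sizes (S_1, S_2) that the abstract also discusses.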

  8. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
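
    The three-level partitioning described in this record can be sketched roughly as follows. This is not the authors' pipeline: the split fractions, the function names and the motif-count encoding are hypothetical, and the motif finder itself is left out (the motif list is simply assumed to come from whatever finder is run on the discovery partition).

```python
import random


def three_way_split(records, fractions=(0.2, 0.6, 0.2), seed=0):
    """Shuffle records and split them into discovery, training and validation sets.

    The default fractions are illustrative assumptions, not values from the paper.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_discovery = round(len(shuffled) * fractions[0])
    n_training = round(len(shuffled) * fractions[1])
    discovery = shuffled[:n_discovery]
    training = shuffled[n_discovery:n_discovery + n_training]
    validation = shuffled[n_discovery + n_training:]
    return discovery, training, validation


def motif_features(seq, motifs):
    """Encode one sequence as overlapping occurrence counts of each motif."""
    return [
        sum(1 for i in range(len(seq) - len(m) + 1) if seq[i:i + len(m)] == m)
        for m in motifs
    ]


# Hypothetical labelled sequences: (sequence, class label).
data = [("ACGACG", 1), ("TTTTAA", 0), ("ACGTTT", 1), ("GGGTTA", 0), ("CCACGA", 1),
        ("TTAATT", 0), ("ACGCGC", 1), ("ATATAT", 0), ("GACGAC", 1), ("TATATA", 0)]
discovery, training, validation = three_way_split(data)

# A discriminatory motif finder would be run on `discovery` only; its output
# is assumed here to be the list below.
motifs = ["ACG", "CG"]
train_X = [motif_features(seq, motifs) for seq, _ in training]
```

    Keeping motif discovery on its own partition means the features handed to the classifier were not chosen using the training or validation labels, which is the point of the third partition.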

  9. LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

    Science.gov (United States)

    Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

    2014-02-17

    As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of

  10. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified...

  11. Expressed sequence tags (ESTs) and single nucleotide ...

    African Journals Online (AJOL)

    SERVER

    2008-02-19

    Feb 19, 2008 ... the discovery of the DNA, a new area of modern plant biotechnology begun. In plant ... Marker Assisted Breeding and Sequence Tagged Sites. (STS) are all in use in modern ...... and behaviour in the honey bee. Genome Res.

  12. DNA Nucleotide Sequence Restricted by the RI Endonuclease

    Science.gov (United States)

    Hedgpeth, Joe; Goodman, Howard M.; Boyer, Herbert W.

    1972-01-01

    The sequence of DNA base pairs adjacent to the phosphodiester bonds cleaved by the RI restriction endonuclease in unmodified DNA from coliphage λ has been determined. The 5′-terminal nucleotide labeled with 32P and oligonucleotides up to the heptamer were analyzed from a pancreatic DNase digest. The following sequence of nucleotides adjacent to the RI break made in λ DNA was deduced from these data and from the 3′-dinucleotide sequence and nearest-neighbor analysis obtained from repair synthesis with the DNA polymerase of Rous sarcoma virus [Formula: see text] The RI endonuclease cleavage of the phosphodiester bonds (indicated by arrows) generates 5′-phosphoryls and short cohesive termini of four nucleotides, pApApTpT. The most striking feature of the sequence is its symmetry. PMID:4343974

  13. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......-sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...

  14. Retrieval and Representation of Nucleotide Sequence of ...

    African Journals Online (AJOL)

    Nigerian Journal of Basic and Applied Science (March, 2013), 21(1): 27-32 ... Full Length Research Article ... The present study highlights data retrieval and representation. .... the end of information and the start of the sequence on the next ...

  15. Nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus RNA1.

    OpenAIRE

    Le Gall, O; Candresse, T; Brault, V; Dunez, J

    1989-01-01

    The nucleotide sequence of the RNA1 of Hungarian grapevine chrome mosaic virus, a nepovirus very closely related to tomato black ring virus, has been determined from cDNA clones. It is 7212 nucleotides in length excluding the 3' terminal poly(A) tail and contains a large open reading frame extending from nucleotides 216 to 6971. The presumably encoded polyprotein is 2252 amino acids in length with a molecular weight of 250 kDa. The primary structure of the polyprotein was compared with that o...

  16. Nucleotide sequence composition and method for detection of neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Lo, A.; Yang, H.L.

    1990-01-01

    This patent describes a composition of matter that is specific for Neisseria gonorrhoeae. It comprises at least one nucleotide sequence for which the ratio of the amount of the sequence which hybridizes to chromosomal DNA of Neisseria gonorrhoeae to the amount of the sequence which hybridizes to chromosomal DNA of Neisseria meningitidis is greater than about five. The ratio is obtained by a method described.

  17. Nucleotide sequence composition and method for detection of neisseria gonorrhoeae

    Energy Technology Data Exchange (ETDEWEB)

    Lo, A.; Yang, H.L.

    1990-02-13

    This patent describes a composition of matter that is specific for Neisseria gonorrhoeae. It comprises at least one nucleotide sequence for which the ratio of the amount of the sequence which hybridizes to chromosomal DNA of Neisseria gonorrhoeae to the amount of the sequence which hybridizes to chromosomal DNA of Neisseria meningitidis is greater than about five. The ratio is obtained by a method described.

  18. A 6-Nucleotide Regulatory Motif within the AbcR Small RNAs of Brucella abortus Mediates Host-Pathogen Interactions.

    Science.gov (United States)

    Sheehan, Lauren M; Caswell, Clayton C

    2017-06-06

    In Brucella abortus, two small RNAs (sRNAs), AbcR1 and AbcR2, are responsible for regulating transcripts encoding ABC-type transport systems. AbcR1 and AbcR2 are required for Brucella virulence, as a double chromosomal deletion of both sRNAs results in attenuation in mice. Although these sRNAs are responsible for targeting transcripts for degradation, the mechanism utilized by the AbcR sRNAs to regulate mRNA in Brucella has not been described. Here, two motifs (M1 and M2) were identified in AbcR1 and AbcR2, and complementary motif sequences were defined in AbcR-regulated transcripts. Site-directed mutagenesis of M1 or M2 or of both M1 and M2 in the sRNAs revealed transcripts to be targeted by one or both motifs. Electrophoretic mobility shift assays revealed direct, concentration-dependent binding of both AbcR sRNAs to a target mRNA sequence. These experiments genetically and biochemically characterized two indispensable motifs within the AbcR sRNAs that bind to and regulate transcripts. Additionally, cellular and animal models of infection demonstrated that only M2 in the AbcR sRNAs is required for Brucella virulence. Furthermore, one of the M2-regulated targets, BAB2_0612, was found to be critical for the virulence of B. abortus in a mouse model of infection. Although these sRNAs are highly conserved among Alphaproteobacteria, the present report displays how gene regulation mediated by the AbcR sRNAs has diverged to meet the intricate regulatory requirements of each particular organism and its unique biological niche. IMPORTANCE Small RNAs (sRNAs) are important components of bacterial regulation, allowing organisms to quickly adapt to changes in their environments. The AbcR sRNAs are highly conserved throughout the Alphaproteobacteria and negatively regulate myriad transcripts, many encoding ABC-type transport systems. In Brucella abortus, AbcR1 and AbcR2 are functionally redundant, as only a double abcR1 abcR2 (abcR1/2) deletion results in attenuation in

  19. Nucleotide sequence of the triosephosphate isomerase gene from Macaca mulatta

    Energy Technology Data Exchange (ETDEWEB)

    Old, S.E.; Mohrenweiser, H.W. (Univ. of Michigan, Ann Arbor (USA))

    1988-09-26

    The triosephosphate isomerase gene from a rhesus monkey, Macaca mulatta, charon 34 library was sequenced. The human and chimpanzee enzymes differ from the rhesus enzyme at ASN 20 and GLU 198. The nucleotide sequence identity between rhesus and human is 97% in the coding region and >94% in the flanking regions. Comparison of the rhesus and chimp genes, including the intron and flanking sequences, does not suggest a mechanism for generating the two TPI peptides of proliferating cells from hominoids and a single peptide from the rhesus gene.

  20. Nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus RNA1.

    Science.gov (United States)

    Le Gall, O; Candresse, T; Brault, V; Dunez, J

    1989-10-11

    The nucleotide sequence of the RNA1 of Hungarian grapevine chrome mosaic virus, a nepovirus very closely related to tomato black ring virus, has been determined from cDNA clones. It is 7212 nucleotides in length excluding the 3' terminal poly(A) tail and contains a large open reading frame extending from nucleotides 216 to 6971. The presumably encoded polyprotein is 2252 amino acids in length with a molecular weight of 250 kDa. The primary structure of the polyprotein was compared with that of other viral polyproteins, revealing the same general genetic organization as that of other picorna-like viruses (comoviruses, potyviruses and picornaviruses), except that an additional protein is suspected to occupy the N-terminus of the polyprotein.

  1. The complete nucleotide sequence of RNA 3 of a peach isolate of Prunus necrotic ringspot virus.

    Science.gov (United States)

    Hammond, R W; Crosslin, J M

    1995-04-01

    The complete nucleotide sequence of RNA 3 of the PE-5 peach isolate of Prunus necrotic ringspot ilarvirus (PNRSV) was obtained from cloned cDNA. The RNA sequence is 1941 nucleotides and contains two open reading frames (ORFs). ORF 1 consisted of 284 amino acids with a calculated molecular weight of 31,729 Da and ORF 2 contained 224 amino acids with a calculated molecular weight of 25,018 Da. ORF 2 corresponds to the coat protein gene. Expression of ORF 2 engineered into a pTrcHis vector in Escherichia coli results in a fusion polypeptide of approximately 28 kDa which cross-reacts with PNRSV polyclonal antiserum. Analysis of the coat protein amino acid sequence reveals a putative "zinc-finger" domain at the amino-terminal portion of the protein. Two tetranucleotide AUGC motifs occur in the 3'-UTR of the RNA and may function in coat protein binding and genome activation. ORF 1 homologies to other ilarviruses and alfalfa mosaic virus are confined to limited regions of conserved amino acids. The translated amino acid sequence of the coat protein gene shows 92% similarity to one isolate of apple mosaic virus, a closely related member of the ilarvirus group of plant viruses, but only 66% similarity to the amino acid sequence of the coat protein gene of a second isolate. These relationships are also reflected at the nucleotide sequence level. These results in one instance confirm the close similarities observed at the biophysical and serological levels between these two viruses, but on the other hand call into question the nomenclature used to describe these viruses.

  2. Nucleotide sequence of tomato ringspot virus RNA-2.

    Science.gov (United States)

    Rott, M E; Tremaine, J H; Rochon, D M

    1991-07-01

    The sequence of tomato ringspot virus (TomRSV) RNA-2 has been determined. It is 7273 nucleotides in length excluding the 3' poly(A) tail and contains a single long open reading frame (ORF) of 5646 nucleotides in the positive sense beginning at position 78 and terminating at position 5723. A second in-frame AUG at position 441 is in a more favourable context for initiation of translation and may act as a site for initiation of translation. The TomRSV RNA-2 3' noncoding region is 1550 nucleotides in length. The coat protein is located in the C-terminal region of the large polypeptide and shows significant but limited amino acid sequence similarity to the putative coat proteins of the nepoviruses tomato black ring (TBRV), Hungarian grapevine chrome mosaic (GCMV) and grapevine fanleaf (GFLV). Comparisons of the coding and non-coding regions of TomRSV RNA-2 and the RNA components of TBRV, GCMV, GFLV and the comovirus cowpea mosaic virus revealed significant similarity for over 300 amino acids between the coding region immediately to the N-terminal side of the putative coat proteins of TomRSV and GFLV; very little similarity could be detected among the non-coding regions of TomRSV and any of these viruses.

  3. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX]; Mathura, Venkatarajan S [Sarasota, FL]; Schein, Catherine H [Friendswood, TX]

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  4. Sequencing genes in silico using single nucleotide polymorphisms

    Directory of Open Access Journals (Sweden)

    Zhang Xinyi

    2012-01-01

    Background: The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive. Results: To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (range 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes. Conclusions: Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease associated SNPs and regions, and facilitating the prioritization of candidate

  5. Determination of 5 '-leader sequences from radically disparate strains of porcine reproductive and respiratory syndrome virus reveals the presence of highly conserved sequence motifs

    DEFF Research Database (Denmark)

    Oleksiewicz, M.B.; Bøtner, Anette; Nielsen, Jens

    1999-01-01

    We determined the untranslated 5'-leader sequence for three different isolates of porcine reproductive and respiratory syndrome virus (PRRSV): pathogenic European- and American-types, as well as an American-type vaccine strain. 5'-leader from European- and American-type PRRSV differed in length...... (220 and 190 nt, respectively), and exhibited only approximately 50% nucleotide homology. Nevertheless, highly conserved areas were identified in the leader of all 3 PRRSV isolates, which constitute candidate motifs for binding of protein(s) involved in viral replication. These comparative data provide...

  6. Complete nucleotide sequences of avian metapneumovirus subtype B genome.

    Science.gov (United States)

    Sugiyama, Miki; Ito, Hiroshi; Hata, Yusuke; Ono, Eriko; Ito, Toshihiro

    2010-12-01

    Complete nucleotide sequences were determined for subtype B avian metapneumovirus (aMPV), the attenuated vaccine strain VCO3/50 and its parental pathogenic strain VCO3/60616. The genomes of both strains comprised 13,508 nucleotides (nt), with a 42-nt leader at the 3'-end and a 46-nt trailer at the 5'-end. The genome contains eight genes in the order 3'-N-P-M-F-M2-SH-G-L-5', which is the same order shown in the other metapneumoviruses. The genes are flanked on either side by conserved transcriptional start and stop signals and have intergenic sequences varying in length from 1 to 88 nt. Comparison of nt and predicted amino acid (aa) sequences of VCO3/60616 with those of other metapneumoviruses revealed higher homology with aMPV subtype A virus than with other metapneumoviruses. A total of 18 nt and 10 deduced aa differences were seen between the strains, and one or a combination of several differences could be associated with attenuation of VCO3/50.

  7. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    Science.gov (United States)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  8. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  9. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
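
    The recognition sequence GRCGYC quoted in this record uses IUPAC degenerate codes, where R stands for a purine (A or G) and Y for a pyrimidine (C or T). A minimal sketch of how such a site could be located in a DNA string is given below; the helper names and the test sequence are illustrative only and are not taken from the paper.

```python
import re

# IUPAC nucleotide codes needed here (a subset of the full table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",   # purine
    "Y": "[CT]",   # pyrimidine
    "N": "[ACGT]", # any base
}


def site_to_regex(site):
    """Translate a degenerate recognition site into a regular expression string."""
    return "".join(IUPAC[code] for code in site.upper())


def find_sites(seq, site="GRCGYC"):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # The lookahead keeps overlapping matches from being skipped.
    pattern = re.compile(f"(?=({site_to_regex(site)}))")
    return [match.start() for match in pattern.finditer(seq.upper())]


# Made-up sequence containing GACGTC at index 2 and GGCGCC at index 10.
print(find_sites("TTGACGTCAAGGCGCCTT"))
```

    Because GRCGYC reads the same as its own reverse complement, scanning one strand is enough to find every site in double-stranded DNA.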

  10. Viroids: from genotype to phenotype just relying on RNA sequence and structural motifs

    Directory of Open Access Journals (Sweden)

    Ricardo Flores

    2012-06-01

    As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson-Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures (either global or local) determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs.

  11. Analysis of alkaptonuria (AKU) mutations and polymorphisms reveals that the CCC sequence motif is a mutational hot spot in the homogentisate 1,2 dioxygenase gene (HGO).

    Science.gov (United States)

    Beltrán-Valero de Bernabé, D; Jimenez, F J; Aquaron, R; Rodríguez de Córdoba, S

    1999-01-01

    We recently showed that alkaptonuria (AKU) is caused by loss-of-function mutations in the homogentisate 1,2 dioxygenase gene (HGO). Herein we describe haplotype and mutational analyses of HGO in seven new AKU pedigrees. These analyses identified two novel single-nucleotide polymorphisms (INV4+31A-->G and INV11+18A-->G) and six novel AKU mutations (INV1-1G-->A, W60G, Y62C, A122D, P230T, and D291E), which further illustrates the remarkable allelic heterogeneity found in AKU. Reexamination of all 29 mutations and polymorphisms thus far described in HGO shows that these nucleotide changes are not randomly distributed; the CCC sequence motif and its inverted complement, GGG, are preferentially mutated. These analyses also demonstrated that the nucleotide substitutions in HGO do not involve CpG dinucleotides, which illustrates important differences between HGO and other genes for the occurrence of mutation at specific short-sequence motifs. Because the CCC sequence motifs comprise a significant proportion (34.5%) of all mutated bases that have been observed in HGO, we conclude that the CCC triplet is a mutational hot spot in HGO. PMID:10205262
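
    The kind of tally behind a statement such as "CCC motifs comprise 34.5% of all mutated bases" can be sketched as below: mark every base covered by the motif or its inverted (reverse) complement, then ask what fraction of mutated positions falls inside that set. The function names, the toy sequence and the mutation positions are made up for illustration; this is not the HGO data or the authors' exact procedure.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def motif_positions(seq, motif):
    """0-based indices covered by any occurrence of motif or its reverse complement.

    Overlapping occurrences are all counted.
    """
    seq = seq.upper()
    covered = set()
    for m in {motif.upper(), reverse_complement(motif)}:
        for i in range(len(seq) - len(m) + 1):
            if seq[i:i + len(m)] == m:
                covered.update(range(i, i + len(m)))
    return covered


def fraction_in_motif(seq, mutated_positions, motif="CCC"):
    """Fraction of mutated positions that lie inside the motif (either strand)."""
    if not mutated_positions:
        raise ValueError("no mutated positions given")
    covered = motif_positions(seq, motif)
    hits = sum(1 for pos in mutated_positions if pos in covered)
    return hits / len(mutated_positions)


# Toy example: positions 2 and 6 fall inside CCC/GGG runs, 0 and 8 do not.
print(fraction_in_motif("ACCCCGGGT", [0, 2, 6, 8]))
```

    A meaningful hot-spot claim would also need the expected fraction under random placement, for example the share of all bases in the gene that the motif covers, which `motif_positions` can supply.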

  12. Memetic algorithms for de novo motif-finding in biomedical sequences.

    Science.gov (United States)

    Bi, Chengpeng

    2012-09-01

    The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies are employed in the algorithm design, not only to explore the multiple sequence local alignment space efficiently, but also to uncover the molecular signals effectively. As a result, the implementation of the memetic motif-finding algorithm (MaMotif) has a number of key features, including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, MaMotif is the most time-efficient of the algorithms compared: it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm compares favorably with, or is superior to, other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary micro

  13. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Background: In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results: By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions: Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  14. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    Science.gov (United States)

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  15. Base Sequence Context Effects on Nucleotide Excision Repair

    Directory of Open Access Journals (Sweden)

    Yuqin Cai

    2010-01-01

    Nucleotide excision repair (NER) plays a critical role in maintaining the integrity of the genome when damaged by bulky DNA lesions, since inefficient repair can cause mutations and human diseases, notably cancer. The structural properties of DNA lesions that determine their relative susceptibilities to NER are therefore of great interest. As a model system, we have investigated the major mutagenic lesion derived from the environmental carcinogen benzo[a]pyrene (B[a]P), 10S (+)-trans-anti-B[a]P-N2-dG, in six different sequence contexts that differ in how the lesion is positioned in relation to nearby guanine amino groups. We have obtained molecular structural data by NMR and MD simulations, bending properties from gel electrophoresis studies, and NER data obtained from human HeLa cell extracts for our six investigated sequence contexts. This model system suggests that disturbed Watson-Crick base pairing is a better recognition signal than a flexible bend, and that these can act in concert to provide an enhanced signal. Steric hindrance between the minor groove-aligned lesion and nearby guanine amino groups determines the exact nature of the disturbances. Both nearest neighbor and more distant neighbor sequence contexts have an impact. Regardless of the exact distortions, we hypothesize that they provide a local thermodynamic destabilization signal for repair.

  16. Novel Nucleotide Variations, Haplotypes Structure and Associations with Growth Related Traits of Goat AT Motif-Binding Factor (ATBF1) Gene

    Directory of Open Access Journals (Sweden)

    Xiaoyan Zhang

    2015-10-01

    The AT motif-binding factor (ATBF1) not only interacts with protein inhibitor of activated signal transducer and activator of transcription 3 (STAT3) (PIAS3) to suppress STAT3 signaling regulating early embryo development and cell differentiation, but is also required for early activation of the pituitary specific transcription factor 1 (Pit1) gene (also known as POU1F1), critically affecting mammalian growth and development. The goal of this study was to detect novel nucleotide variations and haplotype structure of the ATBF1 gene, as well as to test their associations with growth-related traits in goats. Herein, a total of seven novel single nucleotide polymorphisms (SNPs) (SNP 1-7) within this gene were found in two well-known Chinese native goat breeds. Haplotype structure analysis demonstrated that there were four haplotypes in Hainan black goats and seventeen in Xinong Saanen dairy goats, and that the two breeds shared only one haplotype (hap1). Association testing revealed that the SNP2, SNP5, SNP6, and SNP7 loci were significantly associated with growth-related traits in goats. Moreover, one diplotype in Xinong Saanen dairy goats was significantly linked to growth-related traits. These preliminary findings would not only extend the spectrum of genetic variations of the goat ATBF1 gene, but also contribute to implementing marker-assisted selection in goat genetics and breeding.

  17. Nucleotide sequence of the human N-myc gene

    International Nuclear Information System (INIS)

    Stanton, L.W.; Schwab, M.; Bishop, J.M.

    1986-01-01

    Human neuroblastomas frequently display amplification and augmented expression of a gene known as N-myc because of its similarity to the protooncogene c-myc. It has therefore been proposed that N-myc is itself a protooncogene, and subsequent tests have shown that N-myc and c-myc have similar biological activities in cell culture. The authors have now detailed the kinship between N-myc and c-myc by determining the nucleotide sequence of human N-myc and deducing the amino acid sequence of the protein encoded by the gene. The topography of N-myc is strikingly similar to that of c-myc: both genes contain three exons of similar lengths; the coding elements of both genes are located in the second and third exons; and both genes have unusually long 5' untranslated regions in their mRNAs, with features that raise the possibility that expression of the genes may be subject to similar controls of translation. The resemblance between the proteins encoded by N-myc and c-myc sustains previous suspicions that the genes encode related functions.

  18. Distance-dependent duplex DNA destabilization proximal to G-quadruplex/i-motif sequences

    Science.gov (United States)

    König, Sebastian L. B.; Huppert, Julian L.; Sigel, Roland K. O.; Evans, Amanda C.

    2013-01-01

    G-quadruplexes and i-motifs are complementary examples of non-canonical nucleic acid substructure conformations. G-quadruplex thermodynamic stability has been extensively studied for a variety of base sequences, but the degree of duplex destabilization that adjacent quadruplex structure formation can cause has yet to be fully addressed. Stable in vivo formation of these alternative nucleic acid structures is likely to be highly dependent on whether sufficient spacing exists between neighbouring duplex- and quadruplex-/i-motif-forming regions to accommodate quadruplexes or i-motifs without disrupting duplex stability. Prediction of putative G-quadruplex-forming regions is likely to be assisted by further understanding of what distance (number of base pairs) is required for duplexes to remain stable as quadruplexes or i-motifs form. Using oligonucleotide constructs derived from the precedented G-quadruplex- and i-motif-forming bcl-2 P1 promoter region, initial biophysical stability studies indicate that the formation of G-quadruplex and i-motif conformations does destabilize proximal duplex regions. The undermining effect that quadruplex formation can have on duplex stability is mitigated with increased distance from the duplex region: a spacing of five base pairs or more is sufficient to maintain duplex stability proximal to predicted quadruplex/i-motif-forming regions. PMID:23771141
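
    Prediction of putative quadruplex- and i-motif-forming regions, as mentioned above, is often approximated with a simple pattern search. The sketch below assumes the widely used folding rule of four runs of at least three G (or C) separated by loops of 1 to 7 bases; that rule, the pattern names and the function name are assumptions made for illustration and are not taken from this abstract.

```python
import re

# Assumed folding rule: four G-runs (>= 3) separated by loops of 1-7 bases.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# The same rule on the C-rich strand, as a stand-in for i-motif candidates.
IMOTIF_PATTERN = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def putative_sites(seq):
    """Return (kind, start, end) tuples for non-overlapping pattern matches."""
    seq = seq.upper()
    sites = [("G4", m.start(), m.end()) for m in G4_PATTERN.finditer(seq)]
    sites += [("i-motif", m.start(), m.end()) for m in IMOTIF_PATTERN.finditer(seq)]
    return sorted(sites, key=lambda site: site[1])


# Telomere-like repeat flanked by four bases on each side (illustrative input).
print(putative_sites("ATAT" + "GGGTTAGGGTTAGGGTTAGGG" + "ATAT"))
```

    The reported coordinates could then be compared with neighbouring duplex regions, for example to check the five-base-pair spacing described in the abstract.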

  19. A novel beta-xylosidase, nucleotide sequence encoding it and use thereof.

    NARCIS (Netherlands)

    Graaff, de L.H.; Peij, van N.N.M.E.; Broeck, van den H.C.; Visser, J.

    1996-01-01

    A nucleotide sequence is provided which encodes a peptide having beta-xylosidase activity and exhibits at least 30% amino acid identity with the amino acid sequence shown in SEQ ID NO. 1 or hybridises under stringent conditions with a nucleotide sequence shown in SEQ ID NO. 1, or a part thereof having

  20. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias.

    Science.gov (United States)

    Kjær, Jonas; Belsham, Graham J

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long), which induces a nonproteolytic, cotranslational "cleavage" at its own C terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19, where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15, and N16 within the 2A sequence of infectious FMDVs, but no variants at residues P17, G18, or P19 have been identified. In this study, using highly degenerate primers, we analyzed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after two, three, or four passages. However, surprisingly, a clear codon preference for the wt nucleotide sequence encoding the NPGP motif within these viruses was observed. Indeed, the codons selected to code for P17 and P19 within this motif were distinct; thus the synonymous codons are not equivalent. © 2018 Kjær and Belsham; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
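
    To give a sense of the synonymous sequence space behind the codon preference reported above, the following sketch enumerates every RNA sequence that encodes the NPGP motif under the standard genetic code. The table covers only the three amino acids involved, and the function name is hypothetical; it does not reproduce the degenerate-primer design used in the study.

```python
from itertools import product

# Standard genetic code, restricted to the residues of the NPGP motif.
CODONS = {
    "N": ["AAU", "AAC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
}


def synonymous_encodings(peptide):
    """List every RNA sequence that translates to the given peptide."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


encodings = synonymous_encodings("NPGP")
print(len(encodings))  # 2 * 4 * 4 * 4 = 128
```

    A codon preference means that rescued viruses use only a small part of these 128 possibilities.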

  1. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Science.gov (United States)

    Shi, Jieming; Li, Xi; Dong, Min; Graham, Mitchell; Yadav, Nehul; Liang, Chun

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.
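
    The correspondence between nucleotides and dot-bracket data mentioned above comes down to matching each opening bracket with its closing partner. A minimal sketch of that step follows, assuming plain dot-bracket notation without pseudoknot symbols; it is written in Python for illustration and is not JNSViewer's JavaScript code.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) base pairs (0-based) from a dot-bracket string."""
    stack = []
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with the latest open bracket
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# A hairpin: three stacked pairs closing a three-base loop.
print(parse_dot_bracket("(((...)))"))
```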

  2. JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures

    Science.gov (United States)

    Dong, Min; Graham, Mitchell; Yadav, Nehul

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416

  3. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Directory of Open Access Journals (Sweden)

    Jieming Shi

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.

  4. PDL1 Signals through Conserved Sequence Motifs to Overcome Interferon-Mediated Cytotoxicity

    Directory of Open Access Journals (Sweden)

    Maria Gato-Cañas

    2017-08-01

    PDL1 blockade produces remarkable clinical responses, thought to occur by T cell reactivation through prevention of PDL1-PD1 T cell inhibitory interactions. Here, we find that PDL1 cell-intrinsic signaling protects cancer cells from interferon (IFN) cytotoxicity and accelerates tumor progression. PDL1 inhibited IFN signal transduction through a conserved class of sequence motifs that mediate crosstalk with IFN signaling. Abrogation of PDL1 expression or antibody-mediated PDL1 blockade strongly sensitized cancer cells to IFN cytotoxicity through a STAT3/caspase-7-dependent pathway. Moreover, somatic mutations found in human carcinomas within these PDL1 sequence motifs disrupted motif regulation, resulting in PDL1 molecules with enhanced protective activities from type I and type II IFN cytotoxicity. Overall, our results reveal a mode of action of PDL1 in cancer cells as a first line of defense against IFN cytotoxicity.

  5. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  6. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Background: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. For this purpose, all subsequences of the given DNA sequence are scored with a scoring function and the prediction is made by selecting the best score. Most of the available tools for known binding site prediction are designed on the assumption of no dependency between binding site base positions. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions. Results: We propose a new scoring function based on the dependency between all positions in the binding site. This scoring function uses joint information content and mutual information as a measure of dependency between positions in a transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the obtained results with two other well known scoring functions. Conclusion: The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple
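
    The two quantities named in this abstract, per-position information content and mutual information between positions, can be computed from a set of aligned binding sites as sketched below. This shows only the standard textbook definitions, assuming a uniform background over four bases and no pseudocounts; it is not the authors' scoring function, and the function names and toy sites are hypothetical.

```python
import math
from collections import Counter


def column_information(sites, pos):
    """Information content (bits) of one column: 2 - H, uniform background."""
    n = len(sites)
    counts = Counter(site[pos] for site in sites)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy


def mutual_information(sites, i, j):
    """Mutual information (bits) between columns i and j of aligned sites."""
    n = len(sites)
    p_i = Counter(site[i] for site in sites)
    p_j = Counter(site[j] for site in sites)
    p_ij = Counter((site[i], site[j]) for site in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )


dependent = ["AC", "AC", "TG", "TG"]    # column 1 is fixed by column 0
independent = ["AC", "AG", "TC", "TG"]  # columns vary independently
print(mutual_information(dependent, 0, 1), mutual_information(independent, 0, 1))
```

    A position-independent score uses only the per-column terms; a dependency-aware score would add pairwise terms such as the mutual information.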

  7. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy across the 12 poly(A) motif variants. The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  8. A tandem sequence motif acts as a distance-dependent enhancer in a set of genes involved in translation by binding the proteins NonO and SFPQ

    Directory of Open Access Journals (Sweden)

    Roepcke Stefan

    2011-12-01

    Background: Bioinformatic analyses of expression control sequences in promoters of co-expressed or functionally related genes enable the discovery of common regulatory sequence motifs that might be involved in co-ordinated gene expression. By studying promoter sequences of the human ribosomal protein genes we recently identified a novel highly specific Localized Tandem Sequence Motif (LTSM). In this work we sought to identify additional genes and LTSM-binding proteins to elucidate potential regulatory mechanisms. Results: Genome-wide analyses identified a considerable number of additional LTSM-positive genes, the products of which are involved in translation, among them translation initiation and elongation factors, and 5S rRNA. Electromobility shift assays then showed specific signals demonstrating the binding of protein complexes to LTSM in ribosomal protein gene promoters. Pull-down assays with LTSM-containing oligonucleotides and subsequent mass spectrometric analysis identified the related multifunctional nucleotide binding proteins NonO and SFPQ in the binding complex. Functional characterization then revealed that LTSM enhances the transcriptional activity of the promoters depending on the distance from the transcription start site. Conclusions: Our data demonstrate the power of bioinformatic analyses for the identification of biologically relevant sequence motifs. LTSM and the LTSM-binding proteins NonO and SFPQ found here were discovered through a synergistic combination of bioinformatic and biochemical methods and are regulators of the expression of a set of genes of the translational apparatus in a distance-dependent manner.

  9. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
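
    The abstract defines non-AUG start codons as codons that deviate from AUG by a single base. A short sketch that enumerates them follows; the function name is hypothetical and the code only lists the codons, without modelling the flanking-sequence effects measured in the study.

```python
def near_cognate_codons(start="AUG", alphabet="ACGU"):
    """List every codon that differs from `start` at exactly one position."""
    return sorted(
        start[:i] + base + start[i + 1:]
        for i in range(len(start))
        for base in alphabet
        if base != start[i]
    )


codons = near_cognate_codons()
print(len(codons), codons)  # three positions x three alternative bases = 9
```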

  10. Rasp21 sequences opposite the nucleotide binding pocket are required for GRF-mediated nucleotide release

    DEFF Research Database (Denmark)

    Leonardsen, L; DeClue, J E; Lybaek, H

    1996-01-01

    The substrate requirements for the catalytic activity of the mouse Cdc25 homolog Guanine nucleotide Release Factor, GRF, were determined using the catalytic domain of GRF expressed in insect cells and E. coli expressed H-Ras mutants. We found a requirement for the loop 7 residues in Ras (amino ac...... and the human Ras like proteins RhoA, Rap1A, Rac1 and G25K revealed a strict Ras specificity; of these only S. pombe Ras was GRF sensitive....

  11. The nucleotide sequence of satellite RNA in grapevine fanleaf virus, strain F13.

    Science.gov (United States)

    Fuchs, M; Pinck, M; Serghini, M A; Ravelonandro, M; Walter, B; Pinck, L

    1989-04-01

    The nucleotide sequence of cDNA copies of grapevine fanleaf virus (strain F13) satellite RNA has been determined. The primary structure obtained was 1114 nucleotides in length, excluding the poly(A) tail, and contained only one long open reading frame encoding a 341 residue, highly hydrophilic polypeptide of Mr37275. The coding sequence was bordered by a leader of 14 nucleotides and a 3'-terminal non-coding region of 74 nucleotides. No homology has been found with small satellite RNAs associated with other nepoviruses. Two limited homologies of eight nucleotides have been detected between the satellite RNA in grapevine fanleaf virus and those in tomato black ring virus, and a consensus sequence U.G/UGAAAAU/AU/AU/A at the 5' end of nepovirus RNAs is reported. A less extended consensus exists in this region in comovirus and picornavirus RNA.
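
    The lengths reported above can be cross-checked with simple arithmetic. The sketch below assumes that the open reading frame spans everything between the 14-nucleotide leader and the 74-nucleotide 3' non-coding region and ends with one stop codon; that layout is an assumption made for the check, and the function name is hypothetical.

```python
def encoded_residues(total_nt, leader_nt, trailer_nt):
    """Residues encoded by an ORF that fills the region between the UTRs."""
    orf_nt = total_nt - leader_nt - trailer_nt
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nt // 3 - 1  # subtract the terminal stop codon


# 1114 - 14 - 74 = 1026 nt = 342 codons, i.e. 341 residues plus a stop codon.
print(encoded_residues(1114, 14, 74))
```

    Under that assumption the result agrees with the 341-residue polypeptide stated in the abstract.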

  12. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction.

    Directory of Open Access Journals (Sweden)

    Aalt D J van Dijk

    Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and

  13. Peptomics, identification of novel cationic Arabidopsis peptides with conserved sequence motifs

    DEFF Research Database (Denmark)

    Olsen, Addie Nina; Mundy, John; Skriver, Karen

    2002-01-01

    Arabidopsis family of 34 genes. The predicted peptides are characterized by a conserved C-terminal sequence motif and additional primary structure conservation in a core region. The majority of these genes had not previously been annotated. A subset of the predicted peptides show high overall sequence...... similarity to Rapid Alkalinization Factor (RALF), a peptide isolated from tobacco. We therefore refer to this peptide family as RALFL for RALF-Like. RT-PCR analysis confirmed that several of the Arabidopsis genes are expressed and that their expression patterns vary. The identification of a large gene family...

  14. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    Science.gov (United States)

    E Frenkel, F.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server is based on a new mathematical method of searching for multiple alignments, which relies on optimization of position weight matrices and on two-dimensional dynamic programming. This approach allows the construction of multiple alignments of weakly similar amino acid and nucleotide sequences that have accumulated more than 1.5 substitutions per amino acid or nucleotide, without performing pairwise sequence comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as the information that can be obtained using the web server.

  15. FASH: A web application for nucleotides sequence search

    Directory of Open Access Journals (Sweden)

    Chew Paul

    2008-05-01

    Full Text Available FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text-sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed-sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. Availability: FASH can be accessed at https://fash.bgu.ac.il:8443/fash/default.jsp (secured website).

  16. Nucleotide sequence and genetic organization of Hungarian grapevine chrome mosaic nepovirus RNA2.

    Science.gov (United States)

    Brault, V; Hibrand, L; Candresse, T; Le Gall, O; Dunez, J

    1989-10-11

    The complete nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus (GCMV) RNA2 has been determined. The RNA sequence is 4441 nucleotides in length, excluding the poly(A) tail. A polyprotein of 1324 amino acids with a calculated molecular weight of 146 kDa is encoded in a single long open reading frame extending from nucleotides 218 to 4190. This polyprotein is homologous with the protein encoded by the S strain of tomato black ring virus (TBRV) RNA2, the only other nepovirus sequenced so far. Direct sequencing of the viral coat protein and in vitro translation of transcripts derived from cDNA sequences demonstrate that, as for comoviruses, the coat protein is located at the carboxy terminus of the polyprotein. A model for the expression of GCMV RNA2 is presented.

  17. Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification.

    Science.gov (United States)

    Afsari, Sepideh; Korshoj, Lee E; Abel, Gary R; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant

    2017-11-28

    Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as "electronic fingerprints" with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal-molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.

  18. Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

    Science.gov (United States)

    Shan, Gao; Zheng, Wei-Mou

    2009-02-01

    By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate the first and second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from the A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.

  19. Prunus necrotic ringspot ilarvirus: nucleotide sequence of RNA3 and the relationship to other ilarviruses based on coat protein comparison.

    Science.gov (United States)

    Guo, D; Maiss, E; Adam, G; Casper, R

    1995-05-01

    The RNA3 of prunus necrotic ringspot ilarvirus (PNRSV) has been cloned and its entire sequence determined. The RNA3 consists of 1943 nucleotides (nt) and possesses two large open reading frames (ORFs) separated by an intergenic region of 74 nt. The 5' proximal ORF is 855 nt in length and codes for a protein of molecular mass 31.4 kDa which has homologies with the putative movement protein of other members of the Bromoviridae. The 3' proximal ORF of 675 nt is the cistron for the coat protein (CP) and has a predicted molecular mass of 24.9 kDa. The sequence of the 3' non-coding region (NCR) of PNRSV RNA3 showed a high degree of similarity with those of tobacco streak virus (TSV), prune dwarf virus (PDV), apple mosaic virus (ApMV) and also alfalfa mosaic virus (AlMV). In addition it contained potential stem-loop structures with interspersed AUGC motifs characteristic of ilar- and alfamoviruses. This conserved primary and secondary structure in all 3' NCRs may be responsible for the interaction with homologous and heterologous CPs and subsequent activation of genome replication. The CP gene of an ApMV isolate (ApMV-G) of 657 nt has also been cloned and sequenced. Although ApMV and PNRSV have a distant serological relationship, the deduced amino acid sequences of their CPs have an identity of only 51.8%. The N termini of PNRSV and ApMV CPs have in common a zinc-finger motif and the potential to form an amphipathic helix.

  20. Nature and distribution of feline sarcoma virus nucleotide sequences.

    Science.gov (United States)

    Frankel, A E; Gilbert, J H; Porzig, K J; Scolnick, E M; Aaronson, S A

    1979-01-01

    The genomes of three independent isolates of feline sarcoma virus (FeSV) were compared by molecular hybridization techniques. Using complementary DNAs prepared from two strains, SM- and ST-FeSV, common complementary DNAs were selected by sequential hybridization to FeSV and feline leukemia virus RNAs. These DNAs were shown to be highly related among the three independent sarcoma virus isolates. FeSV-specific complementary DNAs were prepared by selection for hybridization by the homologous FeSV RNA and against hybridization by feline leukemia virus RNA. Sarcoma virus-specific sequences of SM-FeSV were shown to differ from those of either ST- or GA-FeSV strains, whereas ST-FeSV-specific DNA shared extensive sequence homology with GA-FeSV. By molecular hybridization, each set of FeSV-specific sequences was demonstrated to be present in normal cat cellular DNA in approximately one copy per haploid genome and was conserved throughout Felidae. In contrast, FeSV-common sequences were present in multiple DNA copies and were found only in Mediterranean cats. The present results are consistent with the concept that each FeSV strain has arisen by a mechanism involving recombination between feline leukemia virus and cat cellular DNA sequences, the latter represented within the cat genome in a manner analogous to that of a cellular gene. PMID:225544

  1. The nucleotide sequence of 5S ribosomal RNA from Micrococcus lysodeikticus.

    Science.gov (United States)

    Hori, H; Osawa, S; Murao, K; Ishikura, H

    1980-01-01

    The nucleotide sequence of ribosomal 5S RNA from Micrococcus lysodeikticus is pGUUACGGCGGCUAUAGCGUGGGGGAAACGCCCGGCCGUAUAUCGAACCCGGAAGCUAAGCCCCAUAGCGCCGAUGGUUACUGUAACCGGGAGGUUGUGGGAGAGUAGGUCGCCGCCGUGAOH. When compared to other 5S RNAs, the sequence homology is greatest with Thermus aquaticus, and these two 5S RNAs reveal several features intermediate between those of typical gram-positive bacteria and gram-negative bacteria. PMID:6780979

  2. Applications of High-Throughput Nucleotide Sequencing (PhD)

    DEFF Research Database (Denmark)

    Waage, Johannes

    equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......-sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...

  3. Nucleotide sequence of a human tRNA gene heterocluster

    International Nuclear Information System (INIS)

    Chang, Y.N.; Pirtle, I.L.; Pirtle, R.M.

    1986-01-01

    Leucine tRNA from bovine liver was used as a hybridization probe to screen a human gene library harbored in Charon-4A of bacteriophage lambda. The human DNA inserts from plaque-pure clones were characterized by restriction endonuclease mapping and Southern hybridization techniques, using both [3'-³²P]-labeled bovine liver leucine tRNA and total tRNA as hybridization probes. An 8-kb Hind III fragment of one of these λ-clones was subcloned into the Hind III site of pBR322. Subsequent fine restriction mapping and DNA sequence analysis of this plasmid DNA indicated the presence of four tRNA genes within the 8-kb DNA fragment. A leucine tRNA gene with an anticodon of AAG and a proline tRNA gene with an anticodon of AGG are in a 1.6-kb subfragment. A threonine tRNA gene with an anticodon of UGU and an as yet unidentified tRNA gene are located in a 1.1-kb subfragment. These two different subfragments are separated by 2.8 kb. The coding regions of the three sequenced genes contain characteristic internal split promoter sequences and do not have intervening sequences. The 3'-flanking regions of these three genes have typical RNA polymerase III termination sites of at least four consecutive T residues.

  4. Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview.

    Science.gov (United States)

    Karvelis, Tautvydas; Gasiunas, Giedrius; Siksnys, Virginijus

    2017-05-15

    Recently, Cas9, an RNA-guided DNA endonuclease, emerged as a powerful tool for targeted genome manipulations. The Cas9 protein can be reprogrammed to cleave, bind or nick any DNA target by simply changing the crRNA sequence; however, a short nucleotide sequence, termed the PAM, is required to initiate crRNA hybridization to the DNA target. The PAM sequence is recognized by the Cas9 protein and must be determined experimentally for each Cas9 variant. Exploration of Cas9 orthologs could offer a diversity of PAM sequences and novel biochemical properties that may be beneficial for genome editing applications. Here we briefly review and compare Cas9 PAM identification assays that can be adopted for other PAM-dependent CRISPR-Cas systems. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Resampling nucleotide sequences with closest-neighbor trimming and its comparison to other methods.

    Directory of Open Access Journals (Sweden)

    Kouki Yonezawa

    Full Text Available A large number of nucleotide sequences of various pathogens are available in public databases. The growth of the datasets has resulted in an enormous increase in computational costs. Moreover, due to differences in surveillance activities, the number of sequences found in databases varies from one country to another and from year to year. Therefore, it is important to study resampling methods to reduce the sampling bias. A novel algorithm, called the closest-neighbor trimming method, that resamples a given number of sequences from a large nucleotide sequence dataset was proposed. The performance of the proposed algorithm was compared with other algorithms by using the nucleotide sequences of human H3N2 influenza viruses. We compared the closest-neighbor trimming method with the naive hierarchical clustering algorithm and the k-medoids clustering algorithm. Genetic information accumulated in public databases contains sampling bias. The closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials for life sciences, we anticipate that applying our algorithm to various datasets will result in reducing sampling bias.

  6. Typing of canine parvovirus isolates using mini-sequencing based single nucleotide polymorphism analysis.

    Science.gov (United States)

    Naidu, Hariprasad; Subramanian, B Mohana; Chinchkar, Shankar Ramchandra; Sriraman, Rajan; Rana, Samir Kumar; Srinivasan, V A

    2012-05-01

    The antigenic types of canine parvovirus (CPV) are defined based on differences in the amino acids of the major capsid protein VP2. Type specificity is conferred by a limited number of amino acid changes and in particular by a few nucleotide substitutions. PCR-based methods are not particularly suitable for typing circulating variants which differ in a few specific nucleotide substitutions. Assays for determining SNPs can efficiently detect nucleotide substitutions and can thus be adapted to identify CPV types. In the present study, CPV typing was performed by single nucleotide extension using the mini-sequencing technique. A mini-sequencing signature was established for all four CPV types (CPV2, 2a, 2b and 2c) and feline panleukopenia virus. CPV typing using the mini-sequencing reaction was performed for 13 CPV field isolates and the two vaccine strains available in our repository. All the isolates had been typed earlier by full-length sequencing of the VP2 gene. The typing results obtained from mini-sequencing matched completely with those of sequencing. Typing could be achieved with less than 100 copies of standard plasmid DNA constructs or ≤10¹ FAID₅₀ of virus by the mini-sequencing technique. The technique was also efficient for detecting multiple types in mixed infections. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. Complete nucleotide sequence of Alfalfa mosaic virus isolated from alfalfa (Medicago sativa L.) in Argentina.

    Science.gov (United States)

    Trucco, Verónica; de Breuil, Soledad; Bejerman, Nicolás; Lenardon, Sergio; Giolitti, Fabián

    2014-06-01

    The complete nucleotide sequence of an Alfalfa mosaic virus (AMV) isolate infecting alfalfa (Medicago sativa L.) in Argentina, AMV-Arg, was determined. The virus genome has the typical organization described for AMV, and comprises 3,643, 2,593, and 2,038 nucleotides for RNA1, 2 and 3, respectively. The whole genome sequence and each encoding region were compared with those of other four isolates that have been completely sequenced from China, Italy, Spain and USA. The nucleotide identity percentages ranged from 95.9 to 99.1 % for the three RNAs and from 93.7 to 99 % for the protein 1 (P1), protein 2 (P2), movement protein and coat protein (CP) encoding regions, whereas the amino acid identity percentages of these proteins ranged from 93.4 to 99.5 %, the lowest value corresponding to P2. CP sequences of AMV-Arg were compared with those of other 25 available isolates, and the phylogenetic analysis based on the CP gene was carried out. The highest percentage of nucleotide sequence identity of the CP gene was 98.3 % with a Chinese isolate and 98.6 % at the amino acid level with four isolates, two from Italy, one from Brazil and the remaining one from China. The phylogenetic analysis showed that AMV-Arg is closely related to subgroup I of AMV isolates. To our knowledge, this is the first report of a complete nucleotide sequence of AMV from South America and the first worldwide report of complete nucleotide sequence of AMV isolated from alfalfa as natural host.

  8. qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.

    Directory of Open Access Journals (Sweden)

    Hieu Dinh

    Full Text Available Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the (ℓ, d)-motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worst-case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.
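To make the (ℓ, d)-motif definition in this abstract concrete, the following is a minimal brute-force sketch in Python. It is not qPMS7 and makes no attempt at its speed-ups; it simply enumerates every candidate string of length ℓ. The function names (`hamming`, `occurs_within`, `planted_motifs`) and the quorum parameter `q` are illustrative assumptions, not names from the paper. A candidate counts as a motif here if at least a fraction `q` of the input sequences contains an ℓ-mer within Hamming distance d of it; `q = 1.0` corresponds to the plain PMS problem.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, seq, d):
    """True if some window of seq is within Hamming distance d of candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def planted_motifs(seqs, l, d, q=1.0, alphabet="ACGT"):
    """Naive quorum planted motif search (illustrative, hypothetical helper).

    Tries all len(alphabet)**l candidates, so the running time grows
    exponentially with l; only usable for very small l.
    """
    needed = q * len(seqs)  # minimum number of sequences that must contain a hit
    motifs = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        hits = sum(occurs_within(candidate, s, d) for s in seqs)
        if hits >= needed:
            motifs.append(candidate)
    return motifs
```

For example, `planted_motifs(["ACGTAC", "TTACGA", "GGACGT"], 3, 0)` looks for 3-mers that occur exactly in all three sequences. The exponential candidate enumeration is the reason practical tools rely on pruning strategies rather than this direct approach.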

  9. Nucleotide sequence and genetic organization of barley stripe mosaic virus RNA gamma.

    Science.gov (United States)

    Gustafson, G; Hunter, B; Hanau, R; Armour, S L; Jackson, A O

    1987-06-01

    The complete nucleotide sequences of RNA gamma from the Type and ND18 strains of barley stripe mosaic virus (BSMV) have been determined. The sequences are 3164 (Type) and 2791 (ND18) nucleotides in length. Both sequences contain a 5'-noncoding region (87 or 88 nucleotides) which is followed by a long open reading frame (ORF1). A 42-nucleotide intercistronic region separates ORF1 from a second, shorter open reading frame (ORF2) located near the 3'-end of the RNA. There is a high degree of homology between the Type and ND18 strains in the nucleotide sequence of ORF1. However, the Type strain contains a 366 nucleotide direct tandem repeat within ORF1 which is absent in the ND18 strain. Consequently, the predicted translation product of Type RNA gamma ORF1 (mol wt 87,312) is significantly larger than that of ND18 RNA gamma ORF1 (mol wt 74,011). The amino acid sequence of the ORF1 polypeptide contains homologies with putative RNA polymerases from other RNA viruses, suggesting that this protein may function in replication of the BSMV genome. The nucleotide sequence of RNA gamma ORF2 is nearly identical in the Type and ND18 strains. ORF2 codes for a polypeptide with a predicted molecular weight of 17,209 (Type) or 17,074 (ND18) which is known to be translated from a subgenomic (sg) RNA. The initiation point of this sgRNA has been mapped to a location 27 nucleotides upstream of the ORF2 initiation codon in the intercistronic region between ORF1 and ORF2. The sgRNA is not coterminal with the 3'-end of the genomic RNA, but instead contains heterogeneous poly(A) termini up to 150 nucleotides long (J. Stanley, R. Hanau, and A. O. Jackson, 1984, Virology 139, 375-383). In the genomic RNA gamma, ORF2 is followed by a short poly(A) tract and a 238-nucleotide tRNA-like structure.

  10. Nucleotide Sequence Diversity and Linkage Disequilibrium of Four Nuclear Loci in Foxtail Millet (Setaria italica).

    Directory of Open Access Journals (Sweden)

    Shui-Lian He

    Full Text Available Foxtail millet (Setaria italica (L.) Beauv) is one of the earliest domesticated grains, which has been cultivated in northern China by 8,700 years before present (YBP) and across Eurasia by 4,000 YBP. Owing to a small genome and diploid nature, foxtail millet is a tractable model crop for studying functional genomics of millets and bioenergy grasses. In this study, we examined nucleotide sequence diversity, geographic structure, and levels of linkage disequilibrium at four nuclear loci (ADH1, G3PDH, IGS1 and TPI1) in representative samples of 311 landrace accessions across its cultivated range. Higher levels of nucleotide sequence and haplotype diversity were observed in samples from China relative to other sampled regions. Genetic assignment analysis classified the accessions into seven clusters based on nucleotide sequence polymorphisms. Intralocus LD decayed rapidly to half the initial value within ~1.2 kb or less.

  11. Nucleotide Sequence Diversity and Linkage Disequilibrium of Four Nuclear Loci in Foxtail Millet (Setaria italica).

    Science.gov (United States)

    He, Shui-Lian; Yang, Yang; Morrell, Peter L; Yi, Ting-Shuang

    2015-01-01

    Foxtail millet (Setaria italica (L.) Beauv) is one of the earliest domesticated grains, which has been cultivated in northern China by 8,700 years before present (YBP) and across Eurasia by 4,000 YBP. Owing to a small genome and diploid nature, foxtail millet is a tractable model crop for studying functional genomics of millets and bioenergy grasses. In this study, we examined nucleotide sequence diversity, geographic structure, and levels of linkage disequilibrium at four nuclear loci (ADH1, G3PDH, IGS1 and TPI1) in representative samples of 311 landrace accessions across its cultivated range. Higher levels of nucleotide sequence and haplotype diversity were observed in samples from China relative to other sampled regions. Genetic assignment analysis classified the accessions into seven clusters based on nucleotide sequence polymorphisms. Intralocus LD decayed rapidly to half the initial value within ~1.2 kb or less.

  12. MicroRNA categorization using sequence motifs and k-mers.

    Science.gov (United States)

    Yousef, Malik; Khalifa, Waleed; Acar, İlhan Erkin; Allmer, Jens

    2017-03-14

    Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed describing pre-miRNAs, and we have previously introduced the use of sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next generation sequencing. However, they may be contaminations and to aid that important decision-making process, we aimed to establish a means to differentiate pre-miRNAs from different species. To achieve distinction into species, we used one species' pre-miRNAs as the positive and another species' pre-miRNAs as the negative training and test data for the establishment of machine learned models based on sequence motifs and k-mers as features. This approach resulted in higher accuracy values between distantly related species while species with closer relation produced lower accuracy values. We were able to differentiate among species with increasing success when the evolutionary distance increases. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs since even in relatively closely related species a fairly good discrimination was possible.
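As a rough illustration of what "k-mers as features" can mean in this setting, the sketch below turns an RNA sequence into a fixed-length vector of k-mer frequencies that could be fed to any classifier. This is an assumption-laden example, not the authors' pipeline: the function name `kmer_features`, the normalization by window count, and the `ACGU` alphabet are all choices made here for illustration.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k, alphabet="ACGU"):
    """Return k-mer frequencies in a fixed order (hypothetical helper).

    The vector has len(alphabet)**k entries, one per possible k-mer, in
    lexicographic order of the alphabet. Windows containing characters
    outside the alphabet (e.g. N) are counted in the total but match no entry.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for short sequences
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]
```

With one species' pre-miRNAs labeled positive and another's labeled negative, as the abstract describes, vectors like these would form the rows of the training matrix.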

  13. [Replication of Streptomyces plasmids: the DNA nucleotide sequence of plasmid pSB 24.2].

    Science.gov (United States)

    Bolotin, A P; Sorokin, A V; Aleksandrov, N N; Danilenko, V N; Kozlov, Iu I

    1985-11-01

    The nucleotide sequence of DNA in plasmid pSB 24.2, a natural deletion derivative of plasmid pSB 24.1 isolated from S. cyanogenus, was studied. The plasmid is 3706 nucleotide pairs in size. The G-C composition was equal to 73 per cent. The analysis of the DNA structure in plasmid pSB 24.2 revealed the protein-encoding sequence of DNA, the continuity of which was significant for replication of the plasmid containing more than 1300 nucleotide pairs. The analysis also revealed two A-T-rich areas of DNA, the G-C composition of which was less than 55 per cent, and a DNA area with a branched pin structure. The results may be of value in investigation of plasmid replication in actinomycetes and experimental cloning of DNA with this plasmid as a vector.
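The G-C composition figures quoted in this abstract (73 per cent overall, with A-T-rich areas below 55 per cent) can be computed with a simple sliding-window scan. The sketch below is illustrative only: the function names and the window size are assumptions, since the abstract does not say how the A-T-rich areas were delimited; only the 55 per cent threshold comes from the text.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def low_gc_windows(seq, window, threshold=55.0):
    """Start positions (0-based) of windows whose G-C content is below threshold.

    `window` is a hypothetical parameter; the abstract does not give one.
    """
    return [
        i
        for i in range(len(seq) - window + 1)
        if gc_percent(seq[i:i + window]) < threshold
    ]
```

Overlapping positions returned by `low_gc_windows` would then be merged into contiguous A-T-rich regions.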

  14. Nucleotide sequence of the Agrobacterium tumefaciens octopine Ti plasmid-encoded tmr gene

    NARCIS (Netherlands)

    Heidekamp, F.; Dirkse, W.G.; Hille, J.; Ormondt, H. van

    1983-01-01

    The nucleotide sequence of the tmr gene, encoded by the octopine Ti plasmid from Agrobacterium tumefaciens (pTiAch5), was determined. The T-DNA, which encompasses this gene, is involved in tumor formation and maintenance, and probably mediates the cytokinin-independent growth of transformed plant

  15. Nucleotide Sequence and Characterization of the Broad-Host-Range Lactococcal Plasmid pWVO1

    NARCIS (Netherlands)

    Leenhouts, Cornelis; Tolner, Berend; Bron, Sierd; Kok, Jan; Venema, Gerhardus; Seegers, Jozef

    The nucleotide sequence of the Lactococcus lactis broad-host-range plasmid pWVO1, replicating in both gram-positive and gram-negative bacteria, was determined. This analysis revealed four open reading frames (ORFs). ORF A appeared to encode a trans-acting 26.8-kDa protein (RepA), necessary for

  16. SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

    Science.gov (United States)

    Ramu, Chenna

    2003-07-01

    SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.

  17. Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins.

    Directory of Open Access Journals (Sweden)

    David Karlin

    Full Text Available Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

  18. Prediction of host - pathogen protein interactions between Mycobacterium tuberculosis and Homo sapiens using sequence motifs.

    Science.gov (United States)

    Huo, Tong; Liu, Wei; Guo, Yu; Yang, Cheng; Lin, Jianping; Rao, Zihe

    2015-03-26

    Emergence of multiple drug resistant strains of M. tuberculosis (MDR-TB) threatens to derail global efforts aimed at reining in the pathogen. Co-infections of M. tuberculosis with HIV are difficult to treat. To counter these new challenges, it is essential to study the interactions between M. tuberculosis and the host to learn how these bacteria cause disease. We report a systematic workflow to predict the host-pathogen interactions (HPIs) between M. tuberculosis and Homo sapiens based on sequence motifs. First, protein sequences were used as initial input for identifying the HPIs by the 'interolog' method. HPIs were further filtered by prediction of domain-domain interactions (DDIs). Functional annotations of proteins and publicly available experimental results were applied to filter the remaining HPIs. Using such a strategy, 118 pairs of HPIs were identified, which involve 43 proteins from M. tuberculosis and 48 proteins from Homo sapiens. A biological interaction network between M. tuberculosis and Homo sapiens was then constructed using the predicted inter- and intra-species interactions based on the 118 pairs of HPIs. Finally, a web accessible database named PATH (Protein interactions of M. tuberculosis and Human) was constructed to store these predicted interactions and proteins. This interaction network will facilitate the research on host-pathogen protein-protein interactions, and may throw light on how M. tuberculosis interacts with its host.

  19. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Science.gov (United States)

    Grimm, Guido W.; Renner, Susanne S.; Stamatakis, Alexandros; Hemleben, Vera

    2007-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly. PMID:19455198

  20. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Directory of Open Access Journals (Sweden)

    Guido W. Grimm

    2006-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

  1. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs

    DEFF Research Database (Denmark)

    van Beest, M; Dooijes, D; van De Wetering, M

    2000-01-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment...

  2. Molecular cloning and complete nucleotide sequence of a human ventricular myosin light chain 1

    Energy Technology Data Exchange (ETDEWEB)

    Hoffmann, E; Shi, Q W; Floroff, M; Mickle, D A.G.; Wu, T W; Olley, P M; Jackowski, G

    1988-03-25

    A human ventricular plasmid library was constructed. The library was screened with an oligonucleotide probe (17-mer) corresponding to a conserved region of myosin light chain 1 near the carboxy terminal. A full-length cDNA recombinant plasmid containing a 1100 bp insert was isolated. RNA blot hybridization with this insert detected a message of approximately 1500 bp corresponding to the size of VLCl and mRNA. The complete nucleotide sequence of the coding region was determined in M13 subclones using the dideoxy chain termination method. With the isolation of this clone (pCD HLVCl), the publication of the complete nucleotide sequence of HVLCl and the predicted secondary structure of this protein will aid in understanding the biochemistry of myosin and its function in contraction, the evolution of myosin light genes and the genetic, developmental and physiological regulation of myosin genes.

  3. An algorithm and program for finding sequence specific oligo-nucleotide probes for species identification

    Directory of Open Access Journals (Sweden)

    Tautz Diethard

    2002-03-01

    Background: The identification of species or species groups with specific oligo-nucleotides as molecular signatures is becoming increasingly popular for bacterial samples. However, it also shows great promise for other small organisms that are taxonomically difficult to track. Results: We have devised here an algorithm that aims to find the optimal probes for any given set of sequences. The program requires only a crude alignment of these sequences as input and is optimized for performance to deal also with very large datasets. The algorithm is designed such that the position of mismatches in the probes influences the selection and makes provision for single nucleotide outloops. Program implementations are available for Linux and Windows.

  4. Markovian Model in High Order Sequence Prediction From Log-Motif Patterns in Agbada Paralic Section, Niger Delta, Nigeria

    International Nuclear Information System (INIS)

    Olabode, S. O.; Adekoya, J. A.

    2002-01-01

    A Markovian model for the elucidation of high order sequences was applied to repetitive events of regressive and transgressive phases in the Agbada paralic section, Niger Delta. The repetitive events are made up of delta front, delta topset and fluvio-deltaic sediments. The sediments consist of sands, sandstones, siltstones and shales in various proportions. Five wells (MN1, AA1, NP2, NP6 and NP8) were studied. A summary of the biostratigraphic report and well log-motif patterns was used to delineate the third order depositional sequences in the wells. Various Markovian properties (observed transition frequency matrix, observed transition probability matrix, fixed probability vector, expected random matrix (randomised transition matrix) and difference matrix) were determined for stacked high order sequences (high frequency cyclic events) nested within the third-order sequences using the log-motif patterns for the various sand bodies and shales. Flow diagrams were constructed for each of the depositional sequences to know the likely occurrence of number of cycles. The upward transition matrix between the log-motif patterns and the flow diagrams used to elucidate cyclicity show that the overall regressive sequence of the Niger Delta has been modified by deltaic depositional elements and fluctuations in sea level. The predictions of higher order sequences within third order sequences from Markovian properties provide a good basis for correlation within the depositional sequences. The model has also been used to decipher the dominant depositional processes during the formation of the sequences. Discrete reservoir intervals and seal potentials within the sequences were also predicted from the flow diagrams constructed.
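
The Markovian properties named in this abstract can be illustrated with a small sketch. It assumes one common convention, in which the expected random matrix is built from the overall frequency of each destination state (independent trials); the abstract does not say which convention the authors used, and the log-motif labels below are hypothetical.

```python
from collections import Counter


def transition_matrices(sequence):
    """Build frequency, probability, expected-random and difference matrices.

    `sequence` is an ordered list of state labels (for example log-motif
    types read upward through a well). All matrices are nested dicts
    indexed as matrix[from_state][to_state].
    """
    states = sorted(set(sequence))
    # Count each observed upward transition (state i -> state i+1).
    counts = Counter(zip(sequence, sequence[1:]))
    freq = {a: {b: counts[(a, b)] for b in states} for a in states}

    # Observed transition probabilities: each row divided by its row total.
    prob = {}
    for a in states:
        row_total = sum(freq[a].values())
        prob[a] = {b: (freq[a][b] / row_total if row_total else 0.0)
                   for b in states}

    # Expected random matrix (assumed convention): the chance of moving
    # to state b depends only on how often b occurs as a destination.
    total = sum(counts.values())
    col_totals = {b: sum(freq[a][b] for a in states) for b in states}
    expected = {a: {b: (col_totals[b] / total if total else 0.0)
                    for b in states} for a in states}

    # Difference matrix: positive entries mark transitions that occur
    # more often than expected at random.
    diff = {a: {b: prob[a][b] - expected[a][b] for b in states}
            for a in states}
    return states, freq, prob, expected, diff


# Hypothetical upward succession of log-motif types in one well.
motifs = ["funnel", "cylinder", "bell", "funnel", "cylinder", "funnel", "bell"]
states, freq, prob, expected, diff = transition_matrices(motifs)
```

Positive entries in `diff` point to the transitions that would be drawn as arrows in a flow diagram of preferred upward successions.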

  5. Nucleotide sequence analysis of regions of adenovirus 5 DNA containing the origins of DNA replication

    International Nuclear Information System (INIS)

    Steenbergh, P.H.

    1979-01-01

    The purpose of the investigations described is the determination of nucleotide sequences at the molecular ends of the linear adenovirus type 5 DNA. Knowledge of the primary structure at the termini of this DNA molecule is of particular interest in the study of the mechanism of replication of adenovirus DNA. The initiation- and termination sites of adenovirus DNA replication are located at the ends of the DNA molecule. (Auth.)

  6. Inferring epidemiological dynamics of infectious diseases using Tajima's D statistic on nucleotide sequences of pathogens.

    Science.gov (United States)

    Kim, Kiyeon; Omori, Ryosuke; Ito, Kimihito

    2017-12-01

    The estimation of the basic reproduction number is essential to understand epidemic dynamics, and time series data of infected individuals are usually used for the estimation. However, such data are not always available. Methods to estimate the basic reproduction number using genealogy constructed from nucleotide sequences of pathogens have been proposed so far. Here, we propose a new method to estimate epidemiological parameters of outbreaks using the time series change of Tajima's D statistic on the nucleotide sequences of pathogens. To relate the time evolution of Tajima's D to the number of infected individuals, we constructed a parsimonious mathematical model describing both the transmission process of pathogens among hosts and the evolutionary process of the pathogens. As a case study we applied this method to the field data of nucleotide sequences of pandemic influenza A (H1N1) 2009 viruses collected in Argentina. The Tajima's D-based method estimated basic reproduction number to be 1.55 with 95% highest posterior density (HPD) between 1.31 and 2.05, and the date of epidemic peak to be 10th July with 95% HPD between 22nd June and 9th August. The estimated basic reproduction number was consistent with estimation by birth-death skyline plot and estimation using the time series of the number of infected individuals. These results suggested that Tajima's D statistic on nucleotide sequences of pathogens could be useful to estimate epidemiological parameters of outbreaks. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
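
For readers unfamiliar with the statistic, the following is a minimal sketch of the standard Tajima's D calculation for one sample of aligned sequences. It is not the authors' method, which tracks how D changes over time and fits an epidemiological model to it; the function name and the toy sequences are illustrative assumptions, and gaps or ambiguous bases are not handled.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return Tajima's D for aligned sequences, or None if no site varies."""
    n = len(seqs)
    if n < 4:
        # With fewer than 4 sequences the variance term can be zero.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # D compares pi with Watterson's estimate S / a1, scaled by its
    # estimated standard deviation.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments: one rare variant versus one balanced variant.
rare_variant = ["AAAA", "AAAA", "AAAA", "AAAT"]
balanced_variant = ["AAAA", "AAAA", "AAAT", "AAAT"]
d_rare = tajimas_d(rare_variant)
d_balanced = tajimas_d(balanced_variant)
```

An excess of rare variants, as expected in a growing pathogen population, gives a negative D, while intermediate-frequency variants give a positive D.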

  7. A structural study for the optimisation of functional motifs encoded in protein sequences

    Directory of Open Access Journals (Sweden)

    Helmer-Citterich Manuela

    2004-04-01

    Background: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: 1. residues structurally conserved in different proteins, that are true positives for a pattern, are identified by means of a computational technique and by visual inspection. 2. the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns. 3. the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of

  8. cDNA cloning and nucleotide sequence comparison of Chinese hamster metallothionein I and II mRNAs

    Energy Technology Data Exchange (ETDEWEB)

    Griffith, B B; Walters, R A; Enger, M D; Hildebrand, C E; Griffith, J K

    1983-01-01

    Polyadenylated RNA was extracted from a cadmium resistant Chinese hamster (CHO) cell line, enriched for metal-induced, abundant RNA sequences and cloned as double-stranded cDNA in the plasmid pBR322. Two cDNA clones, pCHMT1 and pCHMT2, encoding two Chinese hamster isometallothioneins were identified, and the nucleotide sequence of each insert was determined. The two Chinese hamster metallothioneins show nucleotide sequence homologies of 80% in the protein coding region and approximately 35% in both the 5' and 3' untranslated regions. Interestingly, an 8 nucleotide sequence (TGTAAATA) has been conserved in sequence and position in the 3' untranslated regions of each metallothionein mRNA sequenced thus far. Estimated nucleotide substitution rates derived from interspecies comparisons were used to calculate a metallothionein gene duplication time of 45 to 120 million years ago. 39 references, 1 figure, 1 table.

  9. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  10. Complete nucleotide sequence of a novel Hibiscus-infecting Cilevirus from Florida and its relationship with closely associated Cileviruses

    Science.gov (United States)

    The complete nucleotide sequence of a recently discovered Florida (FL) isolate of Hibiscus infecting Cilevirus (HiCV) was determined by Sanger sequencing. The movement- and coat- protein gene sequences of the HiCV-FL isolate are more divergent than other genes of the previously sequenced HiCV-HA (Ha...

  11. Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

    Directory of Open Access Journals (Sweden)

    William R. Gallaher

    2015-01-01

    Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site, sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis.

  12. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    Science.gov (United States)

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily

  13. The nucleotide sequence of human transition protein 1 cDNA

    Energy Technology Data Exchange (ETDEWEB)

    Luerssen, H; Hoyer-Fender, S; Engel, W [Universitaet Goettingen (West Germany)]

    1988-08-11

    The authors have screened a human testis cDNA library with an oligonucleotide of 81 mer prepared according to a part of the published nucleotide sequence of the rat transition protein TP 1. They have isolated a cDNA clone with the length of 441 bp containing the coding region of 162 bp for human transition protein 1. There is about 84% homology in the coding region of the sequence compared to rat. The human cDNA-clone encodes a polypeptide of 54 amino acids of which 7 are different to that of rat.

  14. Flow Cytometry-Assisted Cloning of Specific Sequence Motifs from Complex 16S rRNA Gene Libraries

    DEFF Research Database (Denmark)

    Nielsen, Jeppe Lund; Schramm, Andreas; Bernhard, Anne E.

    2004-01-01

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Authors and affiliations: Jeppe L. Nielsen (1), Andreas Schramm (1, 2), Anne E. Bernhard (1), Gerrit J. van den Engh (3), and David A. Stahl (1); (1) Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington; (2) Department of Ecological Microbiology, University of Bayreuth, Bayreuth, Germany; (3) Institute for Systems Biology, Seattle, Washington.

  15. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    Energy Technology Data Exchange (ETDEWEB)

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.

    1987-02-01

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (Cη1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and δ-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 x 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 +/- 2.6 million years and 17.3 +/- 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
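
The arithmetic behind a silent molecular clock can be sketched as follows. The sketch assumes the usual relation T = K / (2r), where K is the silent-site divergence between two species and r is the per-lineage rate. The abstract does not state whether its rate is per lineage, so the factor of two (`lineages=2`) is an assumption, as are the function names.

```python
# Silent substitution rate reported in the abstract
# (substitutions per site per year).
SILENT_RATE = 1.56e-9


def divergence_time_years(k_silent, rate=SILENT_RATE, lineages=2):
    """Estimate divergence time from silent-site divergence K.

    Assumes both lineages accumulate substitutions independently at the
    same rate, so K = lineages * rate * T.
    """
    return k_silent / (lineages * rate)


def implied_divergence(t_years, rate=SILENT_RATE, lineages=2):
    """Inverse relation: silent-site divergence implied by a given date."""
    return lineages * rate * t_years


# Divergence per silent site implied by the reported mean dates,
# under the per-lineage assumption stated above.
k_chimpanzee = implied_divergence(6.4e6)
k_orangutan = implied_divergence(17.3e6)
```

Under these assumptions the reported dates correspond to roughly 2% and 5.4% silent-site divergence from the human sequence for chimpanzee and orangutan, respectively.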

  16. Nucleotide sequence, transcript mapping, and regulation of the RAD2 gene of Saccharomyces cerevisiae

    International Nuclear Information System (INIS)

    Madura, K.; Prakash, S.

    1986-01-01

    The authors determined the nucleotide sequence, mapped the 5' and 3' mRNA termini, and examined the regulation of the RAD2 gene of Saccharomyces cerevisiae. A long open reading frame within the RAD2 transcribed region encodes a protein of 1031 amino acids with a calculated molecular weight of 117,847. A disruption of the RAD2 gene that deletes the 78 carboxyl terminal codons results in loss of RAD2 function. The 5' ends of RAD2 mRNA show considerable heterogeneity, mapping 5 to 62 nucleotides upstream of the first ATG codon of the long RAD2 open reading frame. The longest RAD2 transcripts also contain a short open reading frame of 37 codons that precedes and overlaps the 5' end of the long RAD2 open reading frame. The RAD2 3' mRNA end maps 171 nucleotides downstream of the TAA termination codon and 20 nucleotides downstream from a 12-base-pair inverted repeat that might function in transcript termination. Northern blot analysis showed a ninefold increase in steady-state levels of RAD2 mRNA after treatment of yeast cells with UV light. The 5' flanking region of the RAD2 gene contains several direct and inverted repeats and a 44-nucleotide-long purine-rich tract. The sequence T G G A G G C A T T A A found at position -167 to -156 in the RAD2 gene is similar to a sequence present in the 5' flanking regions of the RAD7 and RAD10 genes.

  17. Nucleotide sequence determination of the region in adenovirus 5 DNA involved in cell transformation

    International Nuclear Information System (INIS)

    Maat, J.

    1978-01-01

    A description is given of investigations into the primary structure of the transforming region of adenovirus type 5 DNA. The phenomenon of cell transformation is discussed in general terms, and the principles of a number of fairly recent techniques, which have been in use for DNA sequence determination since 1975, are dealt with. A few of the author's own techniques are described which deal both with nucleotide sequence analysis and with the determination of DNA cleavage sites of restriction endonucleases. The results are given of the mapping of cleavage sites in the HpaI-E fragment of adenovirus DNA of HpaII, HaeIII, AluI, HinfI and TaqI and of the determination of the nucleotide sequence in the transforming region of adenovirus type 5 DNA. The results of the sequence determination of the Ad5 HindIII-G fragment are discussed in relation to the investigation on the transforming proteins isolated from in vitro and in vivo synthesizing systems. Labelling procedures of DNA are described including the exonuclease III/DNA polymerase I method and T4 polynucleotide kinase labelling of DNA fragments. (Auth.)

  18. Nucleotide sequence and taxonomy of Cycas necrotic stunt virus. Brief report.

    Science.gov (United States)

    Han, S S; Karasev, A V; Ieki, H; Iwanami, T

    2002-11-01

    Cycas necrotic stunt virus (CNSV) is the only well-characterized virus from a gymnosperm. cDNA segments corresponding to the bipartite genome RNAs (RNA1, RNA2) were synthesized and sequenced. Each RNA encoded a single polyprotein, flanked by the 5' and 3' non-coding regions (NCR) and followed by a poly (A) tail. The putative polyproteins encoded by RNA1 and RNA2 had sets of motifs, which were characteristic of viruses in the genus Nepovirus. The polyproteins showed higher sequence identities to Artichoke Italian latent virus, Grapevine chrome mosaic virus and Tomato black ring virus, all of which belong to subgroup b of the genus Nepovirus, than to other nepoviruses. Phylogenetic analysis of the RNA-dependent RNA polymerase and coat protein also showed closer relationships with these viruses than with other viruses. The data obtained supported the taxonomical status of CNSV as a definitive member of the genus Nepovirus, subgroup b.

  19. A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences

    Directory of Open Access Journals (Sweden)

    Boulund Fredrik

    2012-12-01

    Background: Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered qnr genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, qnr genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of qnr genes in more detail. Results: In this paper we describe a new method to identify qnr genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power of correctly classifying sequences from novel classes of qnr genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated qnr genes. In addition, several fragments from novel putative qnr genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants of which 11 have previously not been reported in literature. Conclusions: The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of qnr genes in nucleotide sequence data. The predicted novel putative qnr genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at http://bioinformatics.math.chalmers.se/qnr/.

  20. Analysis of the genome sequence of the pathogenic Muscovy duck parvovirus strain YY reveals a 14-nucleotide-pair deletion in the inverted terminal repeats.

    Science.gov (United States)

    Wang, Jianye; Huang, Yu; Zhou, Mingxu; Zhu, Guoqiang

    2016-09-01

    Genomic information about Muscovy duck parvovirus (MDPV) is still limited. In this study, the genome of the pathogenic MDPV strain YY was sequenced. The full-length genome of YY is 5075 nucleotides (nt) long, 57 nt shorter than that of strain FM. Sequence alignment indicates that the 5' and 3' inverted terminal repeats (ITR) of strain YY contain a 14-nucleotide-pair deletion in the stem of the palindromic hairpin structure in comparison to strains FM and FZ91-30. The deleted region contains one "E-box" site and one repeated motif with the sequence "TTCCGGT" or "ACCGGAA". Phylogenetic trees constructed based on the protein-coding genes concordantly showed that YY, together with nine other MDPV isolates from various places, clustered in a separate branch, distinct from the branch formed by goose parvovirus (GPV) strains. These results demonstrate that, despite the distinctive deletion, the YY strain still belongs to the classical MDPV group. Moreover, the deletion in the ITR may contribute to the genome evolution of MDPV under immunization pressure.
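
The two forms of the repeated motif quoted in this abstract, "TTCCGGT" and "ACCGGAA", are reverse complements of each other, which is what one expects for a motif sitting in the stem of a palindromic hairpin. A minimal sketch of a both-strand motif scan shows this; the helper names and the test sequence are hypothetical, and only the four unambiguous bases are handled.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq, motif):
    """List (0-based position, strand) for each motif occurrence.

    '+' means the motif itself was found; '-' means its reverse
    complement was found, i.e. the motif lies on the opposite strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Hypothetical fragment containing both forms of the motif.
fragment = "GGTTCCGGTAAACCGGAACC"
hits = find_motif_both_strands(fragment, "TTCCGGT")
```

Scanning both strands this way avoids counting the two spellings as two unrelated motifs.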

  1. Conservation of nucleotide sequences for molecular diagnosis of Middle East respiratory syndrome coronavirus, 2015

    Directory of Open Access Journals (Sweden)

    Yuki Furuse

    2015-11-01

    Infection due to the Middle East respiratory syndrome coronavirus (MERS-CoV) is widespread. The present study was performed to assess the protocols used for the molecular diagnosis of MERS-CoV by analyzing the nucleotide sequences of viruses detected between 2012 and 2015, including sequences from the large outbreak in eastern Asia in 2015. Although the diagnostic protocols were established only 2 years ago, mismatches between the sequences of primers/probes and viruses were found for several of the assays. Such mismatches could lead to a lower sensitivity of the assay, thereby leading to false-negative diagnosis. A slight modification in the primer design is suggested. Protocols for the molecular diagnosis of viral infections should be reviewed regularly after they are established, particularly for viruses that pose a great threat to public health such as MERS-CoV.
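
The kind of primer-to-virus comparison described here can be sketched in a few lines. This is an illustration only, not the study's pipeline: the sequences are made up, degenerate (IUPAC) bases are not handled, the primer is assumed to be written 5' to 3' in the same orientation as the target site, and the 5-base window near the 3' end is an assumed threshold.

```python
def mismatch_positions(primer, target_site):
    """Return 0-based positions where primer and target site differ.

    Both strings must have the same length and orientation (5' to 3').
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must have equal length")
    return [i for i, (p, t) in enumerate(zip(primer.upper(), target_site.upper()))
            if p != t]


def has_three_prime_mismatch(primer, target_site, window=5):
    """True if any mismatch falls within the last `window` primer bases.

    Mismatches close to the 3' end generally impair extension more than
    mismatches near the 5' end, so they are flagged separately.
    """
    cutoff = len(primer) - window
    return any(pos >= cutoff for pos in mismatch_positions(primer, target_site))


# Hypothetical primer and the matching region of a newer viral sequence.
primer = "ACGTACGTAC"
viral_site = "ACGAACGTAT"
positions = mismatch_positions(primer, viral_site)
risky = has_three_prime_mismatch(primer, viral_site)
```

Running such a check against each newly deposited genome is one simple way to review an assay regularly, as the abstract recommends.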

  2. Molecular characterisation and nucleotide sequence analysis of canine parvovirus strains in vaccines in India

    Directory of Open Access Journals (Sweden)

    Sukdeb Nandi

    2010-03-01

    Canine parvovirus 2 (CPV‑2) is one of the most important viruses that causes haemorrhagic gastroenteritis and myocarditis of dogs worldwide. The picture has been complicated further due to the emergence of new mutants of CPV, namely: CPV‑2a, CPV‑2b and CPV‑2c. In this study, the molecular characterisation of strains present in the CPV vaccines available on the Indian market was performed using polymerase chain reaction and DNA sequencing. The VP1/VP2 genes of two vaccine strains and a field strain (Bhopal) were sequenced and the nucleotide and the deduced amino acid sequences were compared. The results indicated that the isolate belonged to CPV type 2b and the strains in the vaccines belonged to type CPV‑2. From the study, it is inferred that the CPV strain used in commercially available vaccine preparation differed from the strains present in CPV infection in dogs in India.

  3. Molecular characterisation and nucleotide sequence analysis of canine parvovirus strains in vaccines in India.

    Science.gov (United States)

    Nandi, Sukdeb; Anbazhagan, Rajendra; Kumar, Manoj

    2010-01-01

    Canine parvovirus 2 (CPV-2) is one of the most important viruses that causes haemorrhagic gastroenteritis and myocarditis of dogs worldwide. The picture has been complicated further due to the emergence of new mutants of CPV, namely: CPV-2a, CPV-2b and CPV-2c. In this study, the molecular characterisation of strains present in the CPV vaccines available on the Indian market was performed using polymerase chain reaction and DNA sequencing. The VP1/VP2 genes of two vaccine strains and a field strain (Bhopal) were sequenced and the nucleotide and the deduced amino acid sequences were compared. The results indicated that the isolate belonged to CPV type 2b and the strains in the vaccines belonged to type CPV-2. From the study, it is inferred that the CPV strain used in commercially available vaccine preparation differed from the strains present in CPV infection in dogs in India.

  4. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    Science.gov (United States)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  5. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms.

    Science.gov (United States)

    Taillon-Miller, P; Gu, Z; Li, Q; Hillier, L; Kwok, P Y

    1998-07-01

    An efficient strategy to develop a dense set of single-nucleotide polymorphism (SNP) markers is to take advantage of the human genome sequencing effort currently under way. Our approach is based on the fact that bacterial artificial chromosomes (BACs) and P1-based artificial chromosomes (PACs) used in long-range sequencing projects come from diploid libraries. If the overlapping clones sequenced are from different lineages, one is comparing the sequences from 2 homologous chromosomes in the overlapping region. We have analyzed in detail every SNP identified while sequencing three sets of overlapping clones found on chromosome 5p15.2, 7q21-7q22, and 13q12-13q13. In the 200.6 kb of DNA sequence analyzed in these overlaps, 153 SNPs were identified. Computer analysis for repetitive elements and suitability for STS development yielded 44 STSs containing 68 SNPs for further study. All 68 SNPs were confirmed to be present in at least one of the three (Caucasian, African-American, Hispanic) populations studied. Furthermore, 42 of the SNPs tested (62%) were informative in at least one population, 32 (47%) were informative in two or more populations, and 23 (34%) were informative in all three populations. These results clearly indicate that developing SNP markers from overlapping genomic sequence is highly efficient and cost effective, requiring only the two simple steps of developing STSs around the known SNPs and characterizing them in the appropriate populations.
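    The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, and the numbers are copied from the abstract rather than from the underlying data.

```python
# Figures quoted in the abstract above (not recomputed from raw data).
overlap_kb = 200.6      # total overlapping sequence analysed, in kb
snps_found = 153        # SNPs identified in the overlaps
snps_tested = 68        # SNPs carried forward in STSs
informative = {
    "at least one population": 42,
    "two or more populations": 32,
    "all three populations": 23,
}


def kb_per_snp(total_kb: float, n_snps: int) -> float:
    """Average spacing between SNPs, in kb."""
    return total_kb / n_snps


def percent(part: int, whole: int) -> int:
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


density = kb_per_snp(overlap_kb, snps_found)
shares = {label: percent(n, snps_tested) for label, n in informative.items()}

print(f"about one SNP every {density:.2f} kb")
for label, pct in shares.items():
    print(f"informative in {label}: {pct}%")
```

    Under these assumptions the spacing works out to roughly one SNP per 1.3 kb, and the rounded shares agree with the 62%, 47% and 34% reported in the abstract.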

  6. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Science.gov (United States)

    2010-07-01

    ... may not include material other than part of the sequence listing. A fixed-width font should be used... integer expressing the number of bases or amino acid residues M. Type Whether presented sequence molecule is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type...

  7. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    Science.gov (United States)

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619
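    The transition/transversion split reported above follows from a simple rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. A minimal sketch of that classification is given below; the function names and the toy SNP list are hypothetical and are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_counts(snps):
    """Count transitions and transversions in an iterable of (ref, alt) pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify_snp(ref, alt)] += 1
    return counts


# Toy input chosen to mirror a 60% / 40% split; these are not real SNP calls.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
counts = ts_tv_counts(toy_snps)
print(counts, "Ts/Tv =", counts["transition"] / counts["transversion"])
```

    A 60%/40% split corresponds to a transition/transversion ratio of 1.5.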

  8. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    Science.gov (United States)

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-04-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amino acid polypeptide of 31,372 daltons.

  9. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    OpenAIRE

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-01-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amin...

  10. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    Science.gov (United States)

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-01-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amino acid polypeptide of 31,372 daltons. Images PMID:3514578

  11. Association Mapping and Nucleotide Sequence Variation in Five Drought Tolerance Candidate Genes in Spring Wheat

    Directory of Open Access Journals (Sweden)

    Erena A. Edae

    2013-07-01

    Functional markers are needed for key genes involved in drought tolerance to improve selection for crop yield under moisture stress conditions. The objectives of this study were to (i) characterize five drought tolerance candidate genes, namely dehydration responsive element binding 1A (, enhanced response to abscisic acid ( and , and fructan 1-exohydrolase ( and , in wheat ( L. for nucleotide and haplotype diversity, Tajima’s D value, and linkage disequilibrium (LD) and (ii) associate within-gene single nucleotide polymorphisms (SNPs) with phenotypic traits in a spring wheat association mapping panel ( = 126. Field trials were grown under contrasting moisture regimes in Greeley, CO, and Melkassa, Ethiopia, in 2010 and 2011. Genome-specific amplification and DNA sequence analysis of the genes identified SNPs and revealed differences in nucleotide and haplotype diversity, Tajima’s D, and patterns of LD. showed associations (false discovery rate adjusted probability value = 0.1 with normalized difference vegetation index, heading date, biomass, and spikelet number. Both and were associated with harvest index, flag leaf width, and leaf senescence. was associated with grain yield, and was associated with thousand kernel weight and test weight. If validated in relevant genetic backgrounds, the identified marker–trait associations may be applied to functional marker-assisted selection.

  12. Two sequence motifs from HIF-1α bind to the DNA-binding site of p53

    OpenAIRE

    Hansson, Lars O.; Friedler, Assaf; Freund, Stefan; Rüdiger, Stefan; Fersht, Alan R.

    2002-01-01

    There is evidence that hypoxia-inducible factor-1α (HIF-1α) interacts with the tumor suppressor p53. To characterize the putative interaction, we mapped the binding of the core domain of p53 (p53c) to an array of immobilized HIF-1α-derived peptides and found two peptide-sequence motifs that bound to p53c with micromolar affinity in solution. One sequence was adjacent to and the other coincided with the two proline residues of the oxygen-dependent degradation domain (P402 and P564) that act as...

  13. Nucleotide sequence alignment of hdcA from Gram-positive bacteria.

    Science.gov (United States)

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; Del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A

    2016-03-01

    The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4].

  14. Molecular cloning and nucleotide sequence of CYP6BF1 from the diamondback moth, Plutella xylostella

    Science.gov (United States)

    Li, Hongshan; Dai, Huaguo; Wei, Hui

    2005-01-01

    A novel cDNA clone encoding a cytochrome P450 was screened from the insecticide-susceptible strain of Plutella xylostella (L.) (Lepidoptera: Yponomeutidae). The nucleotide sequence of the clone, designated CYP6BF1, was determined. This is the first full-length sequence of the CYP6 family from Plutella xylostella (L.). The cDNA is 1661 bp in length and contains an open reading frame from base pairs 26 to 1570, encoding a protein of 514 amino acid residues. It is similar to the other insect P450s in gene family 6, including CYP6AE1 from Depressaria pastinacella (46%). The GenBank accession number is AY971374. PMID:17119627
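    The open reading frame coordinates quoted above can be checked against the stated protein length. The sketch below assumes 1-based, inclusive coordinates and assumes that the quoted range includes the stop codon; neither convention is stated in the abstract, and the function name is hypothetical.

```python
def orf_summary(start: int, end: int, includes_stop: bool = True):
    """Return (length in nt, codons, encoded residues) for a 1-based inclusive ORF."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    codons = length // 3
    # Assumption: when the range covers the stop codon, it encodes no residue.
    residues = codons - 1 if includes_stop else codons
    return length, codons, residues


# Coordinates quoted in the abstract for the ORF of this cDNA.
length, codons, residues = orf_summary(26, 1570)
print(f"{length} nt, {codons} codons, {residues} amino acid residues")
```

    Under those assumptions, positions 26 to 1570 span 1545 nt, or 515 codons, which is consistent with a 514-residue protein followed by a stop codon.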

  15. Partial nucleotide sequence analysis of 18S ribosomal RNA gene of the four genotypes of Trypanosoma congolense

    International Nuclear Information System (INIS)

    Osanya, A.; Majiwa, P.A.O.; Kinyanjui, P.W.

    2006-01-01

    Specific oligonucleotide primers based on conserved nucleotide sequences of the 18S ribosomal RNA (18S rRNA) gene of Trypanosoma brucei, Leishmania donovani, Triponema aequale and Lagenidium giganteum were designed and used in the polymerase chain reaction (PCR) to amplify genomic DNA from four different clones, each representing a different genotypic group of T. congolense. PCR products of approximately 1 kb were generated using DNA from each of the trypanosomes as template. The PCR products cross-hybridized with genomic DNA from T. brucei, T. simiae and the four genotypes of T. congolense, implying significant sequence homology of the 18S rRNA gene among trypanosomes. The nucleotide sequence of a segment of each PCR product was determined by direct sequencing to provide a partial nucleotide sequence of the 18S rRNA gene in each T. congolense genotypic group. The sequences obtained, together with those that have been published for T. brucei, reveal that although most regions show inter- and intra-species nucleotide identity, there are several sites where deletions, insertions and base changes have occurred in the nucleotide sequence of T. brucei and the four genotypes of T. congolense. (author)

  16. Complete nucleotide sequence of the RNA-2 of grapevine deformation and Grapevine Anatolian ringspot viruses.

    Science.gov (United States)

    Ghanem-Sabanadzovic, Nina Abou; Sabanadzovic, Sead; Digiaro, Michele; Martelli, Giovanni P

    2005-05-01

    The nucleotide sequence of RNA-2 of Grapevine Anatolian ringspot virus (GARSV) and Grapevine deformation virus (GDefV), two recently described nepoviruses, has been determined. These RNAs are 3753 nt (GDefV) and 4607 nt (GARSV) in size and contain a single open reading frame encoding a polyprotein of 122 kDa (GDefV) and 150 kDa (GARSV). Full-length nucleotide sequence comparison disclosed 71-73% homology between GDefV RNA-2 and that of Grapevine fanleaf virus (GFLV) and Arabis mosaic virus (ArMV), and 62-64% homology between GARSV RNA-2 and that of Grapevine chrome mosaic virus (GCMV) and Tomato black ring virus (TBRV). As previously observed in other nepoviruses, the 5' non-coding regions of both RNAs are capable of forming stem-loop structures. Phylogenetic analysis of the three proteins encoded by RNA-2 (i.e. protein 2A, movement protein and coat protein) confirmed that GDefV and GARSV are distinct viruses which can be assigned as definitive species in subgroup A and subgroup B of the genus Nepovirus, respectively.

  17. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    Science.gov (United States)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single-base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  18. Nucleotide sequence of cloned cDNA for human sphingolipid activator protein 1 precursor

    International Nuclear Information System (INIS)

    Dewji, N.N.; Wenger, D.A.; O'Brien, J.S.

    1987-01-01

    Two cDNA clones encoding prepro-sphingolipid activator protein 1 (SAP-1) were isolated from a λ gt11 human hepatoma expression library using polyclonal antibodies. These had inserts of ≅ 2 kilobases (λ-S-1.2 and λ-S-1.3) and both were homologous with a previously isolated clone (λ-S-1.1) for mature SAP-1. The authors report here the nucleotide sequence of the longer two EcoRI fragments of S-1.2 and S-1.3 that were not the same and the derived amino acid sequences of mature SAP-1 and its prepro form. The open reading frame encodes 19 amino acids, which are colinear with the amino-terminal sequence of mature SAP-1, and extends far beyond the predicted carboxyl terminus of mature SAP-1, indicating extensive carboxyl-terminal processing. The nucleotide sequence of cDNA encoding prepro-SAP-1 includes 1449 bases from the assigned initiation codon ATG at base-pair 472 to the stop codon TGA at base-pair 1921. The first 23 amino acids coded after the initiation ATG are characteristic of a signal peptide. The calculated molecular mass for a polypeptide encoded by 1449 bases is ≅ 53 kDa, in keeping with the reported value for pro-SAP-1. The data indicate that after removal of the signal peptide mature SAP-1 is generated by removing an additional 7 amino acids from the amino terminus and ≅ 373 amino acids from the carboxyl terminus. One potential glycosylation site was previously found in mature SAP-1. Three additional potential glycosylation sites are present in the processed carboxyl-terminal polypeptide, which they designate as P-2.
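    The ≅ 53 kDa figure can be reproduced with a back-of-the-envelope estimate. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the abstract; the authors' own calculation presumably used the actual amino acid composition. The names are hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # assumption: rule-of-thumb average mass per amino acid residue


def estimate_mass_kda(coding_bases: int, avg_residue_da: float = AVG_RESIDUE_DA) -> float:
    """Rough polypeptide mass in kDa from the number of coding bases."""
    if coding_bases % 3 != 0:
        raise ValueError("coding length must be a multiple of 3")
    residues = coding_bases // 3
    return residues * avg_residue_da / 1000.0


# Base-pair positions of the initiation ATG and the stop codon TGA, as quoted above.
coding_bases = 1921 - 472
mass_kda = estimate_mass_kda(coding_bases)
print(f"{coding_bases} bases -> {coding_bases // 3} residues -> ~{mass_kda:.1f} kDa")
```

    With these inputs, 1449 bases correspond to 483 residues and an estimated mass close to 53 kDa, in line with the value given in the abstract.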

  19. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.; Rangkuti, Farania; Schramm, Michael C.; Jankovic, Boris R.; Kamau, Allan; Chowdhary, Rajesh; Archer, John A.C.; Bajic, Vladimir B.

    2011-01-01

    ... These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity

  20. Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

    Directory of Open Access Journals (Sweden)

    White Frank F

    2011-07-01

    Background: Eight diverse sorghum (Sorghum bicolor L. Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effect of library type and genomic diversity on SNP discovery and imputation are evaluated. Results: Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. Conclusions: A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.

  1. Defining a conformational consensus motif in cotransin-sensitive signal sequences: a proteomic and site-directed mutagenesis study.

    Directory of Open Access Journals (Sweden)

    Wolfgang Klein

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has yet not been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity.

  2. Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

    Science.gov (United States)

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has yet not been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945

  3. The nucleotide sequence of parsnip yellow fleck virus: a plant picorna-like virus.

    Science.gov (United States)

    Turnbull-Ross, A D; Reavy, B; Mayo, M A; Murant, A F

    1992-12-01

    The complete sequence of 9871 nucleotides (nts) of parsnip yellow fleck virus (PYFV; isolate P-121) was determined from cDNA clones and by direct sequencing of viral RNA. The RNA contains a large open reading frame between nts 279 and 9362 which encodes a polyprotein of 3027 amino acids with a calculated M(r) of 336212 (336K). A PYFV polyclonal antiserum reacted with the proteins expressed from phage carrying cDNA clones from the 5' half of the PYFV genome. Comparison of the polyprotein sequence of PYFV with other viral polyprotein sequences reveals similarities to the putative NTP-binding and RNA polymerase domains of cowpea mosaic comovirus, tomato black ring nepovirus and several animal picornaviruses. The 3' untranslated region of PYFV RNA is 509 nts long and does not have a poly(A) tail. The 3'-terminal 121 nts may form a stem-loop structure which resembles that formed in the genomic RNA of mosquito-borne flaviviruses.

  4. The nucleotide sequence of a Polish isolate of Tomato torrado virus.

    Science.gov (United States)

    Budziszewska, Marta; Obrepalska-Steplowska, Aleksandra; Wieczorek, Przemysław; Pospieszny, Henryk

    2008-12-01

    A new virus was isolated from greenhouse tomato plants showing symptoms of leaf and apex necrosis in Wielkopolska province in Poland in 2003. The observed symptoms and the virus morphology resembled those of viruses previously reported in Spain, called Tomato torrado virus (ToTV), and in Mexico, called Tomato marchitez virus (ToMarV). The complete genome of the Polish isolate Wal'03 was determined by RT-PCR amplification using oligonucleotide primers developed against the ToTV sequences deposited in GenBank, followed by cloning, sequencing, and comparison with the sequence of the type isolate. Phylogenetic analyses, performed on the basis of fragments of polyprotein sequences, established the relationship of the Polish isolate Wal'03 with Spanish ToTV and Mexican ToMarV, as well as with other viruses from the Sequivirus, Sadwavirus, and Cheravirus genera, reported to be the most similar to the new tomato viruses. The Wal'03 genome strands have the same organization as, and very high homology with, the ToTV type isolate, showing only some nucleotide and deduced amino acid changes, in contrast to ToMarV, which was significantly different. The phylogenetic tree clustered the aforementioned viruses into the same group, indicating that they have a common origin.

  5. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding, VD, and posterior decoding, PD) and the PD/VD protocol described here were successfully applied to 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. Finally, we provide a general motif-HMM scanner which is easily accessible through

  6. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Science.gov (United States)

    2010-07-01

    ... mature protein, with the number 1. When presented, the amino acids preceding the mature protein, e.g... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter... data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  7. Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.

    Science.gov (United States)

    Seo, Dong-Won; Oh, Jae-Don; Jin, Shil; Song, Ki-Duk; Park, Hee-Bok; Heo, Kang-Nyeong; Shin, Younhee; Jung, Myunghee; Park, Junhyung; Jo, Cheorun; Lee, Hak-Kyo; Lee, Jun-Heon

    2015-02-01

    There are five native chicken lines in Korea, which are mainly classified by plumage colors (black, white, red, yellow, gray). These five lines are very important genetic resources in the Korean poultry industry. Based on a next generation sequencing technology, whole genome sequence and reference assemblies were performed using Gallus_gallus_4.0 (NCBI) with whole genome sequences from these lines to identify common and novel single nucleotide polymorphisms (SNPs). We obtained 36,660,731,136 ± 1,257,159,120 bp of raw sequence and average 26.6-fold of 25-29 billion reference assembly sequences representing 97.288 % coverage. Also, 4,006,068 ± 97,534 SNPs were observed from 29 autosomes and the Z chromosome and, of these, 752,309 SNPs are the common SNPs across lines. Among the identified SNPs, the number of novel- and known-location assigned SNPs was 1,047,951 ± 14,956 and 2,948,648 ± 81,414, respectively. The number of unassigned known SNPs was 1,181 ± 150 and unassigned novel SNPs was 8,238 ± 1,019. Synonymous SNPs, non-synonymous SNPs, and SNPs having character changes were 26,266 ± 1,456, 11,467 ± 604, 8,180 ± 458, respectively. Overall, 443,048 ± 26,389 SNPs in each bird were identified by comparing with dbSNP in NCBI. The presently obtained genome sequence and SNP information in Korean native chickens have wide applications for further genome studies such as genetic diversity studies to detect causative mutations for economic and disease related traits.

  8. De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

    DEFF Research Database (Denmark)

    Ruzzo, Walter L; Gorodkin, Jan

    2014-01-01

    De novo discovery of "motifs" capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.

  9. Requirement for asparagine in the aquaporin NPA sequence signature motifs for cation exclusion

    DEFF Research Database (Denmark)

    Wree, Dorothea; Wu, Binghua; Zeuthen, Thomas

    2011-01-01

    Two highly conserved NPA motifs are a hallmark of the aquaporin (AQP) family. The NPA triplets form N-terminal helix capping structures with the Asn side chains located in the centre of the water or solute-conducting channel, and are considered to play an important role in AQP selectivity. Although...... interchangeable at both NPA sites without affecting protein expression or water, glycerol and methylamine permeability. However, other mutations in the NPA region led to reduced permeability (S186C and S186D), to nonfunctional channels (N64D), or even to lack of protein expression (S186A and S186T). Using...... electrophysiology, we found that an analogous mammalian AQP1 N76S mutant excluded protons and potassium ions, but leaked sodium ions, providing an argument for the overwhelming prevalence of Asn over other amino acids. We conclude that, at the first position in the NPA motifs, only Asn provides efficient helix cap...

  10. Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs.

    Directory of Open Access Journals (Sweden)

    Michael Allevato

    The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a "non-specific" fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87%) of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought.
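The motif classes named in this abstract can be illustrated with a short scan. The following is a minimal sketch, not the authors' pipeline: the function names `find_sites` and `classify` and the example sequence are hypothetical, and the only facts taken from the abstract are the three patterns themselves (generic E-box CANNTG, canonical CACGTG, non-E-box AACGTT).

```python
import re

# Generic E-box CANNTG, where N is any base. The lookahead lets
# overlapping hits be reported instead of being skipped.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_sites(seq, pattern=EBOX):
    """Return (0-based start, matched hexamer) for every hit in seq."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


def classify(hexamer):
    """Label a hexamer using the categories mentioned in the abstract."""
    hexamer = hexamer.upper()
    if hexamer == "CACGTG":
        return "canonical E-box (CME)"
    if re.fullmatch(r"CA[ACGT]{2}TG", hexamer):
        return "E-box variant"
    if hexamer == "AACGTT":
        return "non-E-box (low affinity)"
    return "other"


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to contain two E-boxes.
    for start, site in find_sites("ttCACGTGaaCAGCTGcc"):
        print(start, site, classify(site))
```

All three patterns are their own reverse complements, so scanning one strand is enough for this illustration; a scan for non-palindromic motifs would also need the reverse strand.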

  11. Pervasive within-Mitochondrion Single-Nucleotide Variant Heteroplasmy as Revealed by Single-Mitochondrion Sequencing

    Directory of Open Access Journals (Sweden)

    Jacqueline Morris

    2017-12-01

    Summary: A number of mitochondrial diseases arise from single-nucleotide variant (SNV) accumulation in multiple mitochondria. Here, we present a method for identification of variants present at the single-mitochondrion level in individual mouse and human neuronal cells, allowing for extremely high-resolution study of mitochondrial mutation dynamics. We identified extensive heteroplasmy between individual mitochondria, along with three high-confidence variants in mouse and one in human that were present in multiple mitochondria across cells. The pattern of variation revealed by single-mitochondrion data shows surprisingly pervasive levels of heteroplasmy in inbred mice. Distribution of SNV loci suggests inheritance of variants across generations, resulting in Poisson jackpot lines with large SNV load. Comparison of human and mouse variants suggests that the two species might employ distinct modes of somatic segregation. Single-mitochondrion resolution revealed mitochondrial mutational dynamics that we hypothesize to affect risk probabilities for mutations reaching disease thresholds. Morris et al. use independent sequencing of multiple individual mitochondria from mouse and human brain cells to show high pervasiveness of mutations. The mutations are heteroplasmic within single mitochondria and within and between cells. These findings suggest mechanisms by which mutations accumulate over time, resulting in mitochondrial dysfunction and disease. Keywords: single mitochondrion, single cell, human neuron, mouse neuron, single-nucleotide variation

  12. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element.

    Science.gov (United States)

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-07-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5'-NNCCAC-3' and 5'-GCGMGN'N'-3' (M:A or C; N and N' form Watson-Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences.

  13. Evaluation of atpB nucleotide sequences for phylogenetic studies of ferns and other pteridophytes.

    Science.gov (United States)

    Wolf, P

    1997-10-01

    Inferring basal relationships among vascular plants poses a major challenge to plant systematists. The divergence events that describe these relationships occurred long ago and considerable homoplasy has since accrued for both molecular and morphological characters. A potential solution is to examine phylogenetic analyses from multiple data sets. Here I present a new source of phylogenetic data for ferns and other pteridophytes. I sequenced the chloroplast gene atpB from 23 pteridophyte taxa and used maximum parsimony to infer relationships. A 588-bp region of the gene appeared to contain a statistically significant amount of phylogenetic signal and the resulting trees were largely congruent with similar analyses of nucleotide sequences from rbcL. However, a combined analysis of atpB plus rbcL produced a better resolved tree than did either data set alone. In the shortest trees, leptosporangiate ferns formed a monophyletic group. Also, I detected a well-supported clade of Psilotaceae (Psilotum and Tmesipteris) plus Ophioglossaceae (Ophioglossum and Botrychium). The demonstrated utility of atpB suggests that sequences from this gene should play a role in phylogenetic analyses that incorporate data from chloroplast genes, nuclear genes, morphology, and fossil data.

  14. Nucleotide sequences of two genomic DNAs encoding peroxidase of Arabidopsis thaliana.

    Science.gov (United States)

    Intapruk, C; Higashimura, N; Yamamoto, K; Okada, N; Shinmyo, A; Takano, M

    1991-02-15

    The peroxidase (EC 1.11.1.7)-encoding gene of Arabidopsis thaliana was screened from a genomic library using a cDNA encoding a neutral isozyme of horseradish, Armoracia rusticana, peroxidase (HRP) as a probe, and two positive clones were isolated. From the comparison with the sequences of the HRP-encoding genes, we concluded that two clones contained peroxidase-encoding genes, and they were named prxCa and prxEa. Both genes consisted of four exons and three introns; the introns had consensus nucleotides, GT and AG, at the 5' and 3' ends, respectively. The lengths of each putative exon of the prxEa gene were the same as those of the HRP-basic-isozyme-encoding gene, prxC3, and coded for 349 amino acids (aa) with a sequence homology of 89% to that encoded by prxC3. The prxCa gene was very close to the HRP-neutral-isozyme-encoding gene, prxC1b, and coded for 354 aa with 91% homology to that encoded by prxC1b. The aa sequence homology was 64% between the two peroxidases encoded by prxCa and prxEa.

  15. A statistical model for investigating binding probabilities of DNA nucleotide sequences using microarrays.

    Science.gov (United States)

    Lee, Mei-Ling Ting; Bulyk, Martha L; Whitmore, G A; Church, George M

    2002-12-01

    There is considerable scientific interest in knowing the probability that a site-specific transcription factor will bind to a given DNA sequence. Microarray methods provide an effective means for assessing the binding affinities of a large number of DNA sequences as demonstrated by Bulyk et al. (2001, Proceedings of the National Academy of Sciences, USA 98, 7158-7163) in their study of the DNA-binding specificities of Zif268 zinc fingers using microarray technology. In a follow-up investigation, Bulyk, Johnson, and Church (2002, Nucleic Acids Research 30, 1255-1261) studied the interdependence of nucleotides on the binding affinities of transcription proteins. Our article is motivated by this pair of studies. We present a general statistical methodology for analyzing microarray intensity measurements reflecting DNA-protein interactions. The log probability of a protein binding to a DNA sequence on an array is modeled using a linear ANOVA model. This model is convenient because it employs familiar statistical concepts and procedures and also because it is effective for investigating the probability structure of the binding mechanism.

  16. Complete nucleotide sequence of watermelon chlorotic stunt virus originating from Oman.

    Science.gov (United States)

    Khan, Akhtar J; Akhtar, Sohail; Briddon, Rob W; Ammara, Um; Al-Matrooshi, Abdulrahman M; Mansoor, Shahid

    2012-07-01

    Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6-99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93-98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed.

  17. Comparison of Nucleotide Sequence of P2C Region in Diabetogenic and Non-Diabetogenic Coxsackie Virus B5 Isolates

    Directory of Open Access Journals (Sweden)

    Cheng-Chong Chou

    2004-11-01

    Enteroviruses are environmental triggers in the pathogenesis of type 1 diabetes mellitus (DM). A sequence of six identical amino acids (PEVKEK) is shared by the 2C protein of Coxsackie virus B and the glutamic acid decarboxylase (GAD) molecules. Between 1995 and 2002, we investigated 22 Coxsackie virus B5 (CVB5) isolates from southern Taiwan. Four of these isolates were obtained from four new-onset type 1 DM patients with diabetic ketoacidosis. We compared a 300 nucleotide sequence in the 2C protein gene (p2C) in 24 CVB5 isolates (4 diabetogenic, 18 non-diabetogenic and 2 prototype). We found 0.3-10% nucleotide differences. In the four isolates from type 1 DM patients, there was only 2.4-3.4% nucleotide difference, and there was only 1.7-7.1% nucleotide difference between type 1 DM isolates and non-diabetogenic isolates. Comparison of the nucleotide sequence between prototype virus and 22 CVB5 isolates revealed 18.4-24.1% difference. Twenty-one CVB5 isolates from type 1 DM and non-type 1 DM patients contained the PEVKEK sequence, as shown by the p2C nucleotide sequence. Our data showed that the viral p2C sequence with homology with GAD is highly conserved in CVB5 isolates. There was no difference between diabetogenic and non-diabetogenic CVB5 isolates. All four type 1 DM patients had at least one of the genetic susceptibility alleles HLA-DR, DQA1, DQB1. Other genetic and autoimmune factors such as HLA genetic susceptibility and GAD may also play important roles in the pathogenesis in type 1 DM.

  18. The complete nucleotide sequence of Alternanthera mosaic virus infecting Portulaca grandiflora represents a new strain distinct from phlox isolates.

    Science.gov (United States)

    Ivanov, Peter A; Mukhamedzhanova, Anna A; Smirnov, Alexander A; Rodionova, Nina P; Karpova, Olga V; Atabekov, Joseph G

    2011-04-01

    A southeastern European isolate of Alternanthera mosaic virus (AltMV-MU) of the genus Potexvirus (family Flexiviridae) was purified from the ornamental plant Portulaca grandiflora. The complete nucleotide sequence (6606 nucleotides) of AltMV-MU genomic RNA was defined. The AltMV-MU genome is different from those of all isolates described earlier and is most closely related to genomes of partly sequenced portulaca isolates AltMV-Po (America) and AltMV-It (Italy). Phylogenetic analysis supports the view that AltMV-MU belongs to a new "portulaca" genotype distinguishable from the "phlox" genotype.

  19. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition.

    Science.gov (United States)

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Acinas, Silvia G; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E; Stepanauskas, Ramunas; Sullivan, Matthew B; Brum, Jennifer R; Duhaime, Melissa B; Poulos, Bonnie T; Hurwitz, Bonnie L; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-08-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.

  20. Population structure of pigs determined by single nucleotide polymorphisms observed in assembled expressed sequence tags.

    Science.gov (United States)

    Matsumoto, Toshimi; Okumura, Naohiko; Uenishi, Hirohide; Hayashi, Takeshi; Hamasima, Noriyuki; Awata, Takashi

    2012-01-01

    We have collected more than 190000 porcine expressed sequence tags (ESTs) from full-length complementary DNA (cDNA) libraries and identified more than 2800 single nucleotide polymorphisms (SNPs). In this study, we tentatively chose 222 SNPs observed in assembled ESTs to study pigs of different breeds; 104 were selected by comparing the cDNA sequences of a Meishan pig and samples of three-way cross pigs (Landrace, Large White, and Duroc: LWD), and 118 were selected from LWD samples. To evaluate the genetic variation between the chosen SNPs from pig breeds, we determined the genotypes for 192 pig samples (11 pig groups) from our DNA reference panel with matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Of the 222 reference SNPs, 186 were successfully genotyped. A neighbor-joining tree showed that the pig groups were classified into two large clusters, namely, Euro-American and East Asian pig populations. F-statistics and the analysis of molecular variance of Euro-American pig groups revealed that approximately 25% of the genetic variations occurred because of intergroup differences. As the F(IS) values were less than the F(ST) values, the clustering, based on the Bayesian inference, implied that there was strong genetic differentiation among pig groups and less divergence within the groups in our samples. © 2011 The Authors. Animal Science Journal © 2011 Japanese Society of Animal Science.

  1. Characterization of Sri Lanka rabies virus isolates using nucleotide sequence analysis of nucleoprotein gene.

    Science.gov (United States)

    Arai, Y T; Takahashi, H; Kameoka, Y; Shiino, T; Wimalaratne, O; Lodmell, D L

    2001-01-01

    Thirty-four suspected rabid brain samples from 2 humans, 24 dogs, 4 cats, 2 mongooses, 1 jackal and 1 water buffalo were collected in 1995-1996 in Sri Lanka. Total RNA was extracted directly from brain suspensions and examined using a one-step reverse transcription-polymerase chain reaction (RT-PCR) for the rabies virus nucleoprotein (N) gene. Twenty-eight samples were found positive for the virus N gene by RT-PCR and also for the virus antigens by fluorescent antibody (FA) test. Rabies virus isolates obtained from different animal species in different regions of Sri Lanka were genetically homogeneous. Sequences of 203 nucleotide (nt)-long RT-PCR products obtained from 16 of 27 samples were found to be identical. Sequences of 1350 nt of N genes of 14 RT-PCR products were determined. The Sri Lanka isolates under study formed a specific cluster that also included an earlier isolate from India but did not include the known isolates from China, Thailand, Malaysia, Israel, Iran, Oman, Saudi Arabia, Russia, Nepal, Philippines, Japan and from several other countries. These results suggest that one type of rabies virus is circulating among humans, dogs, cats, mongooses, jackals and water buffalo living near Colombo City and in five other remote regions of Sri Lanka.

  2. Complete Nucleotide Sequence Analysis of the Norovirus GII.4 Sydney Variant in South Korea

    Directory of Open Access Journals (Sweden)

    Ji-Sun Park

    2015-01-01

    Norovirus is the primary cause of acute gastroenteritis in individuals of all ages. In Australia, a new strain of norovirus (GII.4) was identified in March 2012, and this strain has spread rapidly around the world. In August 2012, this new GII.4 strain was identified in patients in South Korea. Therefore, to examine the characteristics of the epidemic norovirus GII.4 2012 variant in South Korea, we conducted full-length genomic analysis (KM272334). The genome of the gg-12-08-04 strain consisted of 7,558 bp and contained three open reading frame (ORF) composites throughout the whole genome: ORF1 (5,100 bp), ORF2 (1,623 bp), and ORF3 (807 bp). Phylogenetic analyses showed that gg-12-08-04 belonged to the GII.4 Sydney 2012 variant, sharing 98.92% nucleotide similarity with this variant strain. According to SimPlot analysis, the gg-12-08-04 strain was a recombinant strain with a breakpoint at the ORF1/2 junction between the Osaka 2007 and Apeldoorn 2008 strains. This study is the first report of the complete sequence of the GII.4 Sydney 2012 strain in South Korea. This sequence may therefore represent the standard sequence of the norovirus GII.4 2012 variant in South Korea and could be useful for the development of norovirus vaccines.

  3. Exploring the correlation between the sequence composition of the nucleotide binding G5 loop of the FeoB GTPase domain (NFeoB) and intrinsic rate of GDP release.

    Science.gov (United States)

    Guilfoyle, Amy P; Deshpande, Chandrika N; Schenk, Gerhard; Maher, Megan J; Jormakka, Mika

    2014-12-12

    GDP release from GTPases is usually extremely slow and is in general assisted by external factors, such as association with guanine exchange factors or membrane-embedded GPCRs (G protein-coupled receptors), which accelerate the release of GDP by several orders of magnitude. Intrinsic factors can also play a significant role; a single amino acid substitution in one of the guanine nucleotide recognition motifs, G5, results in a drastically altered GDP release rate, indicating that the sequence composition of this motif plays an important role in spontaneous GDP release. In the present study, we used the GTPase domain from EcNFeoB (Escherichia coli FeoB) as a model and applied biochemical and structural approaches to evaluate the role of all the individual residues in the G5 loop. Our study confirms that several of the residues in the G5 motif have an important role in the intrinsic affinity and release of GDP. In particular, a T151A mutant (third residue of the G5 loop) leads to a reduced nucleotide affinity and provokes a drastically accelerated dissociation of GDP.

  4. Prevalence of single nucleotide polymorphism among 27 diverse alfalfa genotypes as assessed by transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Li Xuehui

    2012-10-01

    Background: Alfalfa, a perennial, outcrossing species, is a widely planted forage legume producing highly nutritious biomass. Currently, improvement of cultivated alfalfa mainly relies on recurrent phenotypic selection. Marker assisted breeding strategies can enhance alfalfa improvement efforts, particularly if many genome-wide markers are available. Transcriptome sequencing enables efficient high-throughput discovery of single nucleotide polymorphism (SNP) markers for a complex polyploid species. Results: The transcriptomes of 27 alfalfa genotypes, including elite breeding genotypes, parents of mapping populations, and unimproved wild genotypes, were sequenced using an Illumina Genome Analyzer IIx. De novo assembly of quality-filtered 72-bp reads generated 25,183 contigs with a total length of 26.8 Mbp and an average length of 1,065 bp, with an average read depth of 55.9-fold for each genotype. Overall, 21,954 (87.2%) of the 25,183 contigs represented 14,878 unique protein accessions. Gene ontology (GO) analysis suggested that a broad diversity of genes was represented in the resulting sequences. The realignment of individual reads to the contigs enabled the detection of 872,384 SNPs and 31,760 InDels. High resolution melting (HRM) analysis was used to validate 91% of 192 putative SNPs identified by sequencing. Both allelic variants at about 95% of SNP sites identified among five wild, unimproved genotypes are still present in cultivated alfalfa, and all four US breeding programs also contain a high proportion of these SNPs. Thus, little evidence exists among this dataset for loss of significant DNA sequence diversity from either domestication or breeding of alfalfa. Structure analysis indicated that individuals from the subspecies falcata, the diploid subspecies caerulea, and the tetraploid subspecies sativa (cultivated tetraploid alfalfa) were clearly separated. Conclusion: We used transcriptome sequencing to discover large numbers of SNPs

  5. Nucleotide and Predicted Amino Acid Sequence-Based Analysis of the Avian Metapneumovirus Type C Cell Attachment Glycoprotein Gene: Phylogenetic Analysis and Molecular Epidemiology of U.S. Pneumoviruses

    Science.gov (United States)

    Alvarez, Rene; Lwamba, Humphrey M.; Kapczynski, Darrell R.; Njenga, M. Kariuki; Seal, Bruce S.

    2003-01-01

    A serologically distinct avian metapneumovirus (aMPV) was isolated in the United States after an outbreak of turkey rhinotracheitis (TRT) in February 1997. The newly recognized U.S. virus was subsequently demonstrated to be genetically distinct from European subtypes and was designated aMPV serotype C (aMPV/C). We have determined the nucleotide sequence of the gene encoding the cell attachment glycoprotein (G) of aMPV/C (Colorado strain and three Minnesota isolates) and predicted amino acid sequence by sequencing cloned cDNAs synthesized from intracellular RNA of aMPV/C-infected cells. The nucleotide sequence comprised 1,321 nucleotides with only one predicted open reading frame encoding a protein of 435 amino acids, with a predicted Mr of 48,840. The structural characteristics of the predicted G protein of aMPV/C were similar to those of the human respiratory syncytial virus (hRSV) attachment G protein, including two mucin-like regions (heparin-binding domains) flanking both sides of a CX3C chemokine motif present in a conserved hydrophobic pocket. Comparison of the deduced G-protein amino acid sequence of aMPV/C with those of aMPV serotypes A, B, and D, as well as hRSV revealed overall predicted amino acid sequence identities ranging from 4 to 16.5%, suggesting a distant relationship. However, G-protein sequence identities ranged from 72 to 97% when aMPV/C was compared to other members within the aMPV/C subtype or 21% for the recently identified human MPV (hMPV) G protein. Ratios of nonsynonymous to synonymous nucleotide changes were greater than one in the G gene when comparing the more recent Minnesota isolates to the original Colorado isolate. Epidemiologically, this indicates positive selection among U.S. isolates since the first outbreak of TRT in the United States. PMID:12682171

  6. MicroRNA sequence motifs reveal asymmetry between the stem arms

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Havgaard, Jakob Hull; Ensterö, M.

    2006-01-01

    The processing of microRNAs (miRNAs) from their stem-loop precursors has revealed asymmetry in the processing of the mature sequence and its star sequence. Furthermore, the miRNA processing system differs between organisms. To assess this at the sequence level we have investigated mature miRNAs in their gen...

  7. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library

    Directory of Open Access Journals (Sweden)

    Salem Mohamed

    2009-11-01

    Background: To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. Results: The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In

  8. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library.

    Science.gov (United States)

    Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E

    2009-11-25

    To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the

  9. Finding the right coverage: The impact of coverage and sequence quality on single nucleotide polymorphism genotyping error rates

    NARCIS (Netherlands)

    Fountain, Emily D.; Pauli, Jonathan N.; Reid, Brendan N.; Palsboll, Per J.; Peery, M. Zachariah

    Restriction-enzyme-based sequencing methods enable the genotyping of thousands of single nucleotide polymorphism (SNP) loci in nonmodel organisms. However, in contrast to traditional genetic markers, genotyping error rates in SNPs derived from restriction-enzyme-based methods remain largely unknown.

  10. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    Science.gov (United States)

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.
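
    The quantities ANCAC reports (codon frequencies and GC-content) can be illustrated with a minimal Python sketch. This is not ANCAC's code; the function names and the example sequence are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        return 0.0
    return sum(b in "GC" for b in counted) / len(counted)


def codon_frequencies(cds):
    """Relative frequency of each complete codon in an in-frame sequence.

    Assumption: cds starts at codon position 1; a trailing partial
    codon (1 or 2 bases) is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Hypothetical toy sequence: ATG GCG CGT AAA GCG TAA
example = "ATGGCGCGTAAAGCGTAA"
freqs = codon_frequencies(example)
gc = gc_content(example)
```

    Grouping such per-gene values by species or by COG functional category would give the kind of species- or function-specific comparison the abstract describes.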

  11. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    Directory of Open Access Journals (Sweden)

    Meiler Arno

    2012-09-01

    Abstract Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.

  12. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    Science.gov (United States)

    2012-01-01

    Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836

  13. Polyadenylation of RNA transcribed from mammalian SINEs by RNA polymerase III: Complex requirements for nucleotide sequences.

    Science.gov (United States)

    Borodulina, Olga R; Golubchikova, Julia S; Ustyantsev, Ilia G; Kramerov, Dmitri A

    2016-02-01

    It is generally accepted that only transcripts synthesized by RNA polymerase II (e.g., mRNA) were subject to AAUAAA-dependent polyadenylation. However, we previously showed that RNA transcribed by RNA polymerase III (pol III) from mouse B2 SINE could be polyadenylated in an AAUAAA-dependent manner. Many species of mammalian SINEs end with the pol III transcriptional terminator (TTTTT) and contain hexamers AATAAA in their A-rich tail. Such SINEs were united into Class T(+), whereas SINEs lacking the terminator and AATAAA sequences were classified as T(-). Here we studied the structural features of SINE pol III transcripts that are necessary for their polyadenylation. Eight and six SINE families from classes T(+) and T(-), respectively, were analyzed. The replacement of AATAAA with AACAAA in T(+) SINEs abolished the RNA polyadenylation. Interestingly, insertion of the polyadenylation signal (AATAAA) and pol III transcription terminator in T(-) SINEs did not result in polyadenylation. The detailed analysis of three T(+) SINEs (B2, DIP, and VES) revealed areas important for the polyadenylation of their pol III transcripts: the polyadenylation signal and terminator in A-rich tail, β region positioned immediately downstream of the box B of pol III promoter, and τ region located upstream of the tail. In DIP and VES (but not in B2), the τ region is a polypyrimidine motif which is also characteristic of many other T(+) SINEs. Most likely, SINEs of different mammals acquired these structural features independently as a result of parallel evolution. Copyright © 2015 Elsevier B.V. All rights reserved.
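
    The T(+)/T(-) distinction described above rests on two sequence features, so it can be sketched as a simple scan of the 3' tail. The following Python sketch is an illustration only: the function name and the 50-nt tail window are assumptions, not values from the study, and the abstract itself shows that these two features alone are not sufficient for polyadenylation.

```python
def classify_sine(dna, tail_window=50):
    """Label a SINE DNA sequence by the two tail features named in the abstract.

    Assumptions (not from the study): the A-rich tail lies within the last
    `tail_window` nucleotides, and only exact AATAAA / TTTTT matches count.
    """
    tail = dna.upper()[-tail_window:]
    has_signal = "AATAAA" in tail        # polyadenylation signal hexamer
    has_terminator = "TTTTT" in tail     # pol III transcriptional terminator
    if has_signal and has_terminator:
        return "T(+)"
    if not has_signal and not has_terminator:
        return "T(-)"
    return "ambiguous"


# Hypothetical toy sequences, not real SINE families.
print(classify_sine("GGCC" * 10 + "AATAAAGGAATAAACTTTTT"))
print(classify_sine("GGCCGGCCAAAAAAAA"))
```

    A sequence carrying AACAAA in place of AATAAA, as in the replacement experiment, would no longer be labeled T(+) by this scan.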

  14. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery, i.e., searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary.
An implementation of

  15. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer

    Science.gov (United States)

    Morrison, Carl D.; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M.; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R.; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H.; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C.; Johnson, Candace S.; Trump, Donald L.

    2014-01-01

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as “stitchers,” to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication–licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer. PMID:24469795

  16. Complete nucleotide sequence and genome organization of a Chinese isolate of Tobacco vein distorting virus.

    Science.gov (United States)

    Mo, Xiao-han; Chen, Zheng-bin; Chen, Jian-ping

    2010-12-01

    Tobacco bushy top disease is caused by tobacco bushy top virus (TBTV, a member of the genus Umbravirus) which is dependent on tobacco vein-distorting virus (TVDV) to act as a helper virus encapsidating TBTV and enabling its transmission by aphids. Isometric virions from diseased tobacco plants were purified and disease symptoms were reproduced after experimental aphid transmission. The complete genome of TVDV was determined from cloned RT-PCR products derived from viral RNA. It was 5,920 nucleotides (nts) long and had the six major open reading frames (ORFs) typical of a member of the genus Polerovirus. Sequence comparisons showed that it differed significantly from any of the other species in the genus and this was confirmed by phylogenetic analyses of the RdRp and coat protein. SDS-PAGE analysis of purified virions gave two protein bands of about 26 and 59 kDa both of which reacted strongly in Western blots with antiserum produced to prokaryotically expressed TVDV CP showing that the two forms of the TVDV CP were the only protein components of the capsid.

  17. Uncommon nucleotide excision repair phenotypes revealed by targeted high-throughput sequencing.

    Science.gov (United States)

    Calmels, Nadège; Greff, Géraldine; Obringer, Cathy; Kempf, Nadine; Gasnier, Claire; Tarabeux, Julien; Miguet, Marguerite; Baujat, Geneviève; Bessis, Didier; Bretones, Patricia; Cavau, Anne; Digeon, Béatrice; Doco-Fenzy, Martine; Doray, Bérénice; Feillet, François; Gardeazabal, Jesus; Gener, Blanca; Julia, Sophie; Llano-Rivas, Isabel; Mazur, Artur; Michot, Caroline; Renaldo-Robin, Florence; Rossi, Massimiliano; Sabouraud, Pascal; Keren, Boris; Depienne, Christel; Muller, Jean; Mandel, Jean-Louis; Laugel, Vincent

    2016-03-22

    Deficient nucleotide excision repair (NER) activity causes a variety of autosomal recessive diseases including xeroderma pigmentosum (XP), a disorder which predisposes to skin cancer, and the severe multisystem condition known as Cockayne syndrome (CS). In view of the clinical overlap between NER-related disorders, as well as the existence of multiple phenotypes and the numerous genes involved, we developed a new diagnostic approach based on the enrichment of 16 NER-related genes by multiplex amplification coupled with next-generation sequencing (NGS). Our test cohort consisted of 11 DNA samples, all with known mutations and/or non-pathogenic SNPs in two of the tested genes. We then used the same technique to analyse samples from a prospective cohort of 40 patients. Multiplex amplification and sequencing were performed using the AmpliSeq protocol on the Ion Torrent PGM (Life Technologies). We identified causative mutations in 17 out of the 40 patients (43%). Four patients showed biallelic mutations in the ERCC6(CSB) gene, five in the ERCC8(CSA) gene: most of them had classical CS features but some had very mild and incomplete phenotypes. A small cohort of 4 unrelated classic XP patients from the Basque country (Northern Spain) revealed a common splicing mutation in POLH (XP-variant), demonstrating a new founder effect in this population. Interestingly, our results also found ERCC2(XPD), ERCC3(XPB) or ERCC5(XPG) mutations in two cases of UV-sensitive syndrome and in two cases with mixed XP/CS phenotypes. Our study confirms that NGS is an efficient technique for the analysis of NER-related disorders on a molecular level. It is particularly useful for phenotypes with combined features or unusually mild symptoms. Targeted NGS used in conjunction with DNA repair functional tests and precise clinical evaluation permits rapid and cost-effective diagnosis in patients with NER-defects.

  18. Nucleotide sequence of the melA gene, coding for alpha-galactosidase in Escherichia coli K-12.

    OpenAIRE

    Liljeström, P L; Liljeström, P

    1987-01-01

    Melibiose uptake and hydrolysis in E.coli is performed by the MelB and MelA proteins, respectively. We report the cloning and sequencing of the melA gene. The nucleotide sequence data showed that melA codes for a 450 amino acid long protein with a molecular weight of 50.6 kd. The sequence data also supported the assumption that the mel locus forms an operon with melA in proximal position. A comparison of MelA with alpha-galactosidase proteins from yeast and human origin showed that these prot...

  19. Nucleotide sequence of a chickpea chlorotic stunt virus relative that infects pea and faba bean in China.

    Science.gov (United States)

    Zhou, Cui-Ji; Xiang, Hai-Ying; Zhuo, Tao; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2012-07-01

    We determined the genome sequence of a new polerovirus that infects field pea and faba bean in China. Its entire nucleotide sequence (6021 nt) was most closely related (83.3% identity) to that of an Ethiopian isolate of chickpea chlorotic stunt virus (CpCSV-Eth). With the exception of the coat protein (encoded by ORF3), amino acid sequence identities of all gene products of this virus to those of CpCSV-Eth and other poleroviruses were Polerovirus, and the name pea mild chlorosis virus is proposed.

  20. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

    Science.gov (United States)

    Gilbert, N; Labuda, D

    1999-03-16

    A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.

  1. Structural analysis of a repetitive protein sequence motif in strepsirrhine primate amelogenin.

    Directory of Open Access Journals (Sweden)

    Rodrigo S Lacruz

    2011-03-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates.

  2. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2007-02-01

    Abstract Background This work addresses the problem of detecting conserved transcription factor binding sites and in general regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Unlike the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor does it resort to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performance than other available methods for the same task. We also show examples of how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.

  3. Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna

    Directory of Open Access Journals (Sweden)

    Souche Erika L

    2011-06-01

    Abstract Background Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, climate change, and antagonistic interactions with predators and parasites, which have all been demonstrated to induce micro-evolutionary and adaptive responses. Although its ecology and evolutionary biology is intensively studied, little is known about the functional genomics underpinning phenotypic responses to environmental stressors. The aim of the present study was to find genes expressed in presence of environmental stressors, and target such genes for single nucleotide polymorphic (SNP) marker development. Results We developed three expressed sequence tag (EST) libraries using clonal lineages of D. magna exposed to ecological stressors, namely fish predation, parasite infection and pesticide exposure. We used these newly developed ESTs and other Daphnia ESTs retrieved from NCBI GenBank to mine for SNP markers targeting synonymous as well as non-synonymous genetic variation. We validate the developed SNPs in six natural populations of D. magna distributed at regional scale. Conclusions A large proportion (47%) of the produced ESTs are Daphnia lineage specific genes, which are potentially involved in responses to environmental stress rather than to general cellular functions and metabolic activities, or reflect the arthropod's aquatic lifestyle. The characterization of genes expressed under stress and the validation of their SNPs for population genetic study is important for identifying ecologically responsive genes in D. magna.

  4. Sequence and structural analysis of the chitinase insertion domain reveals two conserved motifs involved in chitin-binding.

    Directory of Open Access Journals (Sweden)

    Hai Li

    2010-01-01

    Chitinases are prevalent in life and are found in species including archaea, bacteria, fungi, plants, and animals. They break down chitin, which is the second most abundant carbohydrate in nature after cellulose. Hence, they are important for maintaining a balance between carbon and nitrogen trapped as insoluble chitin in biomass. Chitinases are classified into two families, 18 and 19 glycoside hydrolases. In addition to a catalytic domain, which is a triosephosphate isomerase barrel, many family 18 chitinases contain another module, i.e., chitinase insertion domain. While numerous studies focus on the biological role of the catalytic domain in chitinase activity, the function of the chitinase insertion domain is not completely understood. Bioinformatics offers an important avenue in which to facilitate understanding the role of residues within the chitinase insertion domain in chitinase function. Twenty-seven chitinase insertion domain sequences, which include four experimentally determined structures and span five kingdoms, were aligned and analyzed using a modified sequence entropy parameter. Thirty-two positions with conserved residues were identified. The role of these conserved residues was explored by conducting a structural analysis of a number of holo-enzymes. Hydrogen bonding and van der Waals calculations revealed a distinct subset of four conserved residues constituting two sequence motifs that interact with oligosaccharides. The other conserved residues may be key to the structure, folding, and stability of this domain. Sequence and structural studies of the chitinase insertion domains conducted within the framework of evolution identified four conserved residues which clearly interact with the substrates. Furthermore, evolutionary studies propose a link between the appearance of the chitinase insertion domain and the function of family 18 chitinases in the subfamily A.
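
    Scoring alignment columns by entropy, as described above, can be sketched in Python as follows. The abstract does not define its "modified sequence entropy parameter", so this sketch substitutes plain Shannon entropy; the 0.5-bit threshold, the gap handling, and the toy alignment are assumptions for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-').

    An all-gap column returns infinity so it is never reported as conserved.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return math.inf
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_positions(alignment, max_entropy=0.5):
    """0-based indices of columns whose entropy is at most max_entropy.

    `alignment` is a list of equal-length aligned sequences.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    columns = ("".join(col) for col in zip(*alignment))
    return [i for i, col in enumerate(columns)
            if column_entropy(col) <= max_entropy]


# Hypothetical four-sequence toy alignment (not chitinase data).
toy = ["GDWEA", "GDYEP", "GDWEG", "GDFEA"]
print(conserved_positions(toy))
```

    Low-entropy columns are only candidates; the study then examined the structural role of such positions in holo-enzymes.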

  5. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    Science.gov (United States)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  6. The NS1 polypeptide of the murine parvovirus minute virus of mice binds to DNA sequences containing the motif [ACCA]2-3.

    Science.gov (United States)

    Cotmore, S F; Christensen, J; Nüesch, J P; Tattersall, P

    1995-03-01

    A DNA fragment containing the minute virus of mice 3' replication origin was specifically coprecipitated in immune complexes containing the virally coded NS1, but not the NS2, polypeptide. Antibodies directed against the amino- or carboxy-terminal regions of NS1 precipitated the NS1-origin complexes, but antibodies directed against NS1 amino acids 284 to 459 blocked complex formation. Using affinity-purified histidine-tagged NS1 preparations, we have shown that the specific protein-DNA interaction is of moderate affinity, being stable in 0.1 M salt but rapidly lost at higher salt concentrations. In contrast, generalized (or nonspecific) DNA binding by NS1 could be demonstrated only in low salt. Addition of ATP or gamma S-ATP enhanced specific DNA binding by wild-type NS1 severalfold, but binding was lost under conditions which favored ATP hydrolysis. NS1 molecules with mutations in a critical lysine residue (amino acid 405) in the consensus ATP-binding site bound to the origin, but this binding could not be enhanced by ATP addition. DNase I protection assays carried out with wild-type NS1 in the presence of gamma S-ATP gave footprints which extended over 43 nucleotides on both DNA strands, from the middle of the origin bubble sequence to a position some 14 bp beyond the nick site. The DNA-binding site for NS1 was mapped to a 22-bp fragment from the middle of the 3' replication origin which contains the sequence ACCAACCA. This conforms to a reiterated motif (ACCA)2-3, which occurs, in more or less degenerate form, at many sites throughout the minute virus of mice genome (J. W. Bodner, Virus Genes 2:167-182, 1989). Insertion of a single copy of the sequence (ACCA)3 was shown to be sufficient to confer NS1 binding on an otherwise unrecognized plasmid fragment. The functions of NS1 in the viral life cycle are reevaluated in the light of this result.
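
    The reiterated motif (ACCA)2-3 maps directly onto a regular expression. The Python sketch below scans both strands for exact matches; the function names and test sequences are hypothetical, and the degenerate forms of the motif mentioned in the abstract are deliberately not covered.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Two or three tandem copies of ACCA; greedy, so three copies are preferred.
_MOTIF = re.compile(r"(?:ACCA){2,3}")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_acca_repeats(seq):
    """Return (start, end, strand, match) for exact (ACCA)2-3 hits.

    Coordinates are 0-based, half-open, and always given on the input
    (plus) strand, including for hits found on the minus strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), m.end(), "+", m.group()) for m in _MOTIF.finditer(seq)]
    for m in _MOTIF.finditer(reverse_complement(seq)):
        hits.append((n - m.end(), n - m.start(), "-", m.group()))
    return sorted(hits)


print(find_acca_repeats("GGACCAACCATT"))      # plus-strand ACCAACCA
print(find_acca_repeats("TTTGGTTGGTTGGTAA"))  # minus-strand (ACCA)3
```

    Because matches are non-overlapping, a longer run of ACCA copies is reported as consecutive hits of at most three copies each.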

  7. Comparison of the nucleotide sequence of wild-type hepatitis - A virus and its attenuated candidate vaccine derivative

    International Nuclear Information System (INIS)

    Cohen, J.I.; Rosenblum, B.; Ticehurst, J.R.; Daemer, R.; Feinstone, S.; Purcell, R.H.

    1987-01-01

    Development of attenuated mutants for use as vaccines is in progress for other viruses, including influenza, rotavirus, varicella-zoster, cytomegalovirus, and hepatitis-A virus (HAV). Attenuated viruses may be derived from naturally occurring mutants that infect human or nonhuman hosts. Alternatively, attenuated mutants may be generated by passage of wild-type virus in cell culture. Production of attenuated viruses in cell culture is a laborious and empiric process. Despite previous empiric successes, understanding the molecular basis for attenuation of vaccine viruses could facilitate future development and use of live-virus vaccines. Comparison of the complete nucleotide sequences of wild-type (virulent) and vaccine (attenuated) viruses has been reported for polioviruses and yellow fever virus. Here, the authors compare the nucleotide sequence of wild-type HAV HM-175 with that of a candidate vaccine derivative

  8. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

    Science.gov (United States)

    Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

    2013-07-01

    The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
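
    The idea of weighted sampling toward a target GC-content can be illustrated with a deliberately simplified Python sketch. It is not IncaRNAtion's algorithm: it ignores the thermodynamic model entirely and only samples sequences whose paired positions are compatible with a dot-bracket structure, multiplying the weight of each choice by a hypothetical `gc_weight` factor per G or C.

```python
import random

PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]  # canonical and wobble pairs
BASES = "ACGU"


def pair_table(structure):
    """Map each opening-bracket index to its closing partner."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner[stack.pop()] = i
        elif ch != ".":
            raise ValueError("unexpected character: " + ch)
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def sample_sequence(structure, gc_weight, rng):
    """Sample one structure-compatible RNA sequence.

    Each G or C multiplies a choice's weight by gc_weight, so values
    above 1 favor GC-rich sequences and values below 1 favor AU-rich ones.
    """
    partner = pair_table(structure)
    closing = set(partner.values())
    seq = [""] * len(structure)
    for i in range(len(structure)):
        if i in partner:
            weights = [gc_weight ** sum(b in "GC" for b in p) for p in PAIRS]
            five, three = rng.choices(PAIRS, weights=weights)[0]
            seq[i], seq[partner[i]] = five, three
        elif i not in closing:
            weights = [gc_weight if b in "GC" else 1.0 for b in BASES]
            seq[i] = rng.choices(BASES, weights=weights)[0]
    return "".join(seq)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
designed = sample_sequence("((..))", 2.0, rng)
```

    In such a scheme one would adjust `gc_weight` until the sampled sequences reach the desired GC-content; the actual tool additionally accounts for folding stability, which this sketch omits.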

  9. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  10. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  11. Molecular Identification of Necrophagous Muscidae and Sarcophagidae Fly Species Collected in Korea by Mitochondrial Cytochrome c Oxidase Subunit I Nucleotide Sequences

    Directory of Open Access Journals (Sweden)

    Yu-Hoon Kim

    2014-01-01

    Identification of insect species is an important task in forensic entomology. For more convenient species identification, the nucleotide sequences of the cytochrome c oxidase subunit I (COI) gene have been widely utilized. We analyzed full-length COI nucleotide sequences of 10 Muscidae and 6 Sarcophagidae fly species collected in Korea. After DNA extraction from collected flies, PCR amplification and automatic sequencing of the whole COI sequence were performed. Obtained sequences were analyzed for a phylogenetic tree and a distance matrix. Our data showed very low intraspecific sequence distances and species-level monophylies. However, sequence comparison with previously reported sequences revealed a few inconsistencies or paraphylies requiring further investigation. To the best of our knowledge, this study is the first report of COI nucleotide sequences from Hydrotaea occulta, Muscina angustifrons, Muscina pascuorum, Ophyra leucostoma, Sarcophaga haemorrhoidalis, Sarcophaga harpax, and Phaonia aureola.

  12. Molecular cloning of a human glycophorin B cDNA: nucleotide sequence and genomic relationship to glycophorin A

    International Nuclear Information System (INIS)

    Siebert, P.D.; Fukuda, M.

    1987-01-01

    The authors describe the isolation and nucleotide sequence of a human glycophorin B cDNA. The cDNA was identified by differential hybridization of synthetic oligonucleotide probes to a human erythroleukemic cell line (K562) cDNA library constructed in phage vector λgt10. The nucleotide sequence of the glycophorin B cDNA was compared with that of a previously cloned glycophorin A cDNA. The nucleotide sequences encoding the NH2-terminal leader peptide and first 26 amino acids of the two proteins are nearly identical. This homologous region is followed by areas specific to either glycophorin A or B and a number of small regions of homology, which in turn are followed by a very homologous region encoding the presumed membrane-spanning portion of the proteins. They used RNA blot hybridization with both cDNA and synthetic oligonucleotide probes to prove their previous hypothesis that glycophorin B is encoded by a single 0.5- to 0.6-kb mRNA and to show that glycophorins A and B are negatively and coordinately regulated by a tumor-promoting phorbol ester, phorbol 12-myristate 13-acetate. They established the intron/exon structure of the glycophorin A and B genes by oligonucleotide mapping; the results suggest a complex evolution of the glycophorin genes.

  13. Salt-bridging effects on short amphiphilic helical structure and introducing sequence-based short beta-turn motifs.

    Science.gov (United States)

    Guarracino, Danielle A; Gentile, Kayla; Grossman, Alec; Li, Evan; Refai, Nader; Mohnot, Joy; King, Daniel

    2018-02-01

    Determining the minimal sequence necessary to induce protein folding is beneficial in understanding the role of protein-protein interactions in biological systems, as their three-dimensional structures often dictate their activity. Proteins are generally comprised of discrete secondary structures, from α-helices to β-turns and larger β-sheets, each of which is influenced by its primary structure. Manipulating the sequence of short, moderately helical peptides can help elucidate the influences on folding. We created two new scaffolds based on a modestly helical eight-residue peptide, PT3, we previously published. Using circular dichroism (CD) spectroscopy and changing the possible salt-bridging residues to new combinations of Lys, Arg, Glu, and Asp, we found that our most helical improvements came from the Arg-Glu combination, whereas the Lys-Asp was not significantly different from the Lys-Glu of the parent scaffold, PT3. The marked 3₁₀-helical contributions in PT3 were lessened in the Arg-Glu-containing peptide with the beginning of cooperative unfolding seen through a thermal denaturation. However, a unique and unexpected signature was seen for the denaturation of the Lys-Asp peptide which could help elucidate the stages of folding between the 3₁₀-helix and α-helix. In addition, we developed a short six-residue peptide with β-turn/sheet CD signature, again to help study minimal sequences needed for folding. Overall, the results indicate that improvements made to short peptide scaffolds by fine-tuning the salt-bridging residues can enhance scaffold structure. Likewise, with the results from the new, short β-turn motif, these can help impact future peptidomimetic designs in creating biologically useful, short, structured β-sheet-forming peptides.

  14. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Science.gov (United States)

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  15. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Directory of Open Access Journals (Sweden)

    Zing Tsung-Yeh Tsai

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  16. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

    Science.gov (United States)

    Nishizawa, M; Nishizawa, K

    2000-10-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.

  17. Single nucleotide polymorphism barcoding of cytochrome c oxidase I sequences for discriminating 17 species of Columbidae by decision tree algorithm.

    Science.gov (United States)

    Yang, Cheng-Hong; Wu, Kuo-Chuan; Dahms, Hans-Uwe; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-07-01

    DNA barcodes are widely used in taxonomy, systematics, species identification, food safety, and forensic science. Most of the conventional DNA barcode sequences contain the whole information of a given barcoding gene. Most of the sequence information does not vary and is uninformative for a given group of taxa within a monophylum. We suggest here a method that reduces the amount of noninformative nucleotides in a given barcoding sequence of a major taxon, like the prokaryotes, or eukaryotic animals, plants, or fungi. The actual differences in genetic sequences, called single nucleotide polymorphism (SNP) genotyping, provide a tool for developing a rapid, reliable, and high-throughput assay for the discrimination between known species. Here, we investigated SNPs as robust markers of genetic variation for identifying different pigeon species based on available cytochrome c oxidase I (COI) data. We propose here a decision tree-based SNP barcoding (DTSB) algorithm where SNP patterns are selected from the DNA barcoding sequence of several evolutionarily related species in order to identify a single species with pigeons as an example. This approach can make use of any established barcoding system. We here firstly used as an example the mitochondrial gene COI information of 17 pigeon species (Columbidae, Aves) using DTSB after sequence trimming and alignment. SNPs were chosen which followed the rule of decision tree and species-specific SNP barcodes. The shortest barcode of about 11 bp was then generated for discriminating 17 pigeon species using the DTSB method. This method provides a sequence alignment and tree decision approach to parsimoniously assign a unique and shortest SNP barcode for any known species of a chosen monophyletic taxon where a barcoding sequence is available.
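The idea of keeping only the alignment columns that tell species apart can be sketched in a few lines. This is a simple greedy selection written for illustration, not the authors' DTSB decision-tree procedure; the function names and the toy alignment are hypothetical, and the input is assumed to be already trimmed and aligned.

```python
def select_snp_positions(aligned):
    """Greedily pick alignment columns until every species has a unique barcode.

    `aligned` maps a species name to its aligned sequence (all equal length).
    Illustrative greedy heuristic only; it is not guaranteed to be minimal.
    """
    if not aligned:
        return []
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must be aligned to the same length")

    chosen = []
    groups = [names]  # each group holds species not yet distinguished
    while any(len(group) > 1 for group in groups):
        best_col, best_groups = None, groups
        for col in range(length):
            if col in chosen:
                continue
            # Split every current group by the base observed in this column.
            split = []
            for group in groups:
                buckets = {}
                for name in group:
                    buckets.setdefault(aligned[name][col], []).append(name)
                split.extend(buckets.values())
            if len(split) > len(best_groups):
                best_col, best_groups = col, split
        if best_col is None:
            raise ValueError("at least two sequences are identical")
        chosen.append(best_col)
        groups = best_groups
    return sorted(chosen)


def barcode(sequence, positions):
    """Concatenate the bases found at the selected positions."""
    return "".join(sequence[p] for p in positions)
```

For a toy alignment of four sequences that differ only in columns 1 and 2, the selection returns those two columns, and each sequence gets a distinct two-base barcode.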

  18. Sequence-specific DNA binding activity of the cross-brace zinc finger motif of the piggyBac transposase

    Science.gov (United States)

    Morellet, Nelly; Li, Xianghong; Wieninger, Silke A; Taylor, Jennifer L; Bischerour, Julien; Moriau, Séverine; Lescop, Ewen; Bardiaux, Benjamin; Mathy, Nathalie; Assrir, Nadine; Bétermier, Mireille; Nilges, Michael; Hickman, Alison B; Dyda, Fred; Craig, Nancy L; Guittet, Eric

    2018-01-01

    The piggyBac transposase (PB) is distinguished by its activity and utility in genome engineering, especially in humans where it has highly promising therapeutic potential. Little is known, however, about the structure–function relationships of the different domains of PB. Here, we demonstrate in vitro and in vivo that its C-terminal Cysteine-Rich Domain (CRD) is essential for DNA breakage, joining and transposition and that it binds to specific DNA sequences in the left and right transposon ends, and to an additional unexpectedly internal site at the left end. Using NMR, we show that the CRD adopts the specific fold of the cross-brace zinc finger protein family. We determine the interaction interfaces between the CRD and its target, the 5′-TGCGT-3′/3′-ACGCA-5′ motifs found in the left, left internal and right transposon ends, and use NMR results to propose docking models for the complex, which are consistent with our site-directed mutagenesis data. Our results provide support for a model of the PB/DNA interactions in the context of the transpososome, which will be useful for the rational design of PB mutants with increased activity. PMID:29385532
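Locating a short double-stranded motif such as 5′-TGCGT-3′ in a sequence means checking both strands. A minimal sketch, assuming a plain DNA string over A, C, G and T; the function names and the example sequence are hypothetical and are not taken from the study.

```python
MOTIF = "TGCGT"  # top-strand motif quoted in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif_sites(seq, motif=MOTIF):
    """Return sorted (start, strand) hits, with 0-based starts on the given strand.

    A '-' hit means the motif lies on the opposite strand, so the given
    strand shows its reverse complement at that position.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)
```

Calling `find_motif_sites("AATGCGTCCACGCAGG")` reports one hit on each strand, at positions 2 and 9.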

  19. NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole

    2011-01-01

    Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity...... to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs...... associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can...

  20. Fusion protein gene nucleotide sequence similarities, shared antigenic sites and phylogenetic analysis suggest that phocid distemper virus 2 and canine distemper virus belong to the same virus entity.

    NARCIS (Netherlands)

    I.K.G. Visser (Ilona); R.W.J. van der Heijden (Roger); M.W.G. van de Bildt (Marco); M.J.H. Kenter (Marcel); C. Örvell; A.D.M.E. Osterhaus (Albert)

    1993-01-01

    Nucleotide sequencing of the fusion protein (F) gene of phocid distemper virus-2 (PDV-2), recently isolated from Baikal seals (Phoca sibirica), revealed an open reading frame (nucleotides 84 to 2075) with two potential in-frame ATG translation initiation codons. We suggest that the

  1. The nucleotide sequence of the right-hand terminus of adenovirus type 5 DNA: Implications for the mechanism of DNA replication

    NARCIS (Netherlands)

    Steenbergh, P.H.; Sussenbach, J.S.

    The nucleotide sequence of the right-hand terminal 3% of adenovirus type 5 (Ad5) DNA has been determined, using the chemical degradation technique developed by Maxam and Gilbert (1977). This region of the genome comprises the 1003 basepair long HindIII-I fragment and the first 75 nucleotides of the

  2. Analysis of nucleotide sequence variations in herpes simplex virus types 1 and 2, and varicella-zoster virus

    International Nuclear Information System (INIS)

    Chiba, A.; Suzutani, T.; Koyano, S.; Azuma, M.; Saijo, M.

    1998-01-01

    To analyze the difference in the degree of divergence between genes from identical herpes virus species, we examined the nucleotide sequence of genes from the herpes simplex virus type 1 (HSV-1) strains VR-3 and 17 encoding thymidine kinase (TK), deoxyribonuclease (DNase), protein kinase (PK; UL13) and virion-associated host shut off (vhs) protein (UL41). The frequency of nucleotide substitutions per 1 kb in the TK gene was 2.5 to 4.3 times higher than those in the other three genes. To prove that the polymorphism of the HSV-1 TK gene is a common characteristic of herpes virus TK genes, we compared the diversity of TK genes among eight HSV-1, six herpes simplex virus type 2 (HSV-2) and seven varicella-zoster virus (VZV) strains. The average frequency of nucleotide substitutions per 1 kb in the TK gene of HSV-1 strains was 4-fold higher than that in the TK gene of HSV-2 strains. The VZV TK gene was highly conserved and only two nucleotide changes were evident in VZV strains. However, the rate of non-synonymous substitutions in total nucleotide substitutions was similar among the TK genes of the three viruses. This result indicated that the mutational rates differed, but there were no significant differences in selective pressure. We conclude that the HSV-1 TK gene is highly diverged and analysis of variations in the gene is a useful approach for understanding the molecular evolution of HSV-1 in a short period. (authors)
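The "nucleotide substitutions per 1 kb" figure used above is a simple normalized count. A minimal sketch, assuming two pre-aligned sequences of equal length in which "-" marks a gap; the function name and the gap handling are assumptions for illustration, not the authors' stated procedure.

```python
def substitutions_per_kb(seq_a, seq_b):
    """Count mismatches between two aligned sequences, scaled to 1000 sites.

    Columns where either sequence has a gap ('-') are skipped, so only
    substitutions (not insertions or deletions) are counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    substitutions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            substitutions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 1000.0 * substitutions / compared
```

Dividing the value obtained for one gene by the value for another gives the kind of fold difference the abstract reports between genes or between virus species.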

  3. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of

  4. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Science.gov (United States)

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.

  5. Nucleotide sequences from the genomes of diverse cowpea accessions for discovery of genetic variation as part of the Feed the Future Innovation Lab for Climate Resilient Cowpea

    Data.gov (United States)

    US Agency for International Development — Nucleotide sequences were generated from 37 cowpea (Vigna unguiculata L. Walp.) accessions relevant to Africa, China and the USA to discover at type of genetic...

  6. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome

    Directory of Open Access Journals (Sweden)

    Santosh K. Tiwari

    2011-01-01

    The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods in time. Several phylogenetic studies involving mitochondrial DNA and Y chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, certain controversy exists, due to the support given to the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate pre-historic colonization of India by anatomically modern humans, using conserved stretches of five amino acid (EPIYA) sequences in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs), in 32 H. pylori strains isolated from subjects with several forms of gastric diseases, was also explored. High resolution sequence analysis of the above described genes was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0) software for EPIYA. An MJ-Network was constructed for obtaining TPM haplotypes by using NETWORK (version 4.5) software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. No specific association of haplotypes with the outcome of disease was revealed through additional network analysis of TPMs.

  7. [Molecular phylogeny of Turbellaria, based on data from comparing the nucleotide sequences of 18S ribosomal RNA genes].

    Science.gov (United States)

    Kuznedelov, K D; Timoshkin, O A

    1995-01-01

    Polymerase chain reaction and direct sequencing of the 5'-end region of the 18S ribosomal RNA gene were used to infer phylogenetic relationships among turbellarian flatworms from Lake Baikal. Representatives of 5 orders (Tricladida, 10 spp.; Lecithoepitheliata, 5 spp.; Prolecithophora, 3 spp.; Proseriata and Kalyptorhynchia, one species each) were studied; a nucleotide sequence of more than 340 nucleotides was determined for each species. A consensus sequence was determined for each order having more than one representative species. Distance matrix and maximum parsimony approaches were applied to infer phylogenies. A bootstrap procedure was used to estimate confidence limits. At the 100% level by bootstrapping, the group of three orders (Kalyptorhynchia, Proseriata and Lecithoepitheliata) was found to be monophyletic. However, subsets inside the group had no significant support to be preferred or rejected. Our data do not support traditional systematics, which joins the two suborders Tricladida and Proseriata into the single order Seriata, and also do not support comparative anatomical data, which show a close relationship of Lecithoepitheliata and lower Prolecithophora.

  8. A cyclic nucleotide-gated channel mutation associated with canine daylight blindness provides insight into a role for the S2 segment tri-Asp motif in channel biogenesis.

    Directory of Open Access Journals (Sweden)

    Naoto Tanaka

    Cone cyclic nucleotide-gated channels are tetramers formed by CNGA3 and CNGB3 subunits; CNGA3 subunits function as homotetrameric channels but CNGB3 exhibits channel function only when co-expressed with CNGA3. An aspartic acid (Asp) to asparagine (Asn) missense mutation at position 262 in the canine CNGB3 (D262N) subunit results in loss of cone function (daylight blindness), suggesting an important role for this aspartic acid residue in channel biogenesis and/or function. Asp 262 is located in a conserved region of the second transmembrane segment containing three Asp residues designated the Tri-Asp motif. This motif is conserved in all CNG channels. Here we examine mutations in canine CNGA3 homomeric channels using a combination of experimental and computational approaches. Mutations of these conserved Asp residues result in the absence of nucleotide-activated currents in heterologous expression. A fluorescent tag on CNGA3 shows mislocalization of mutant channels. Co-expressing CNGB3 Tri-Asp mutants with wild type CNGA3 results in some functional channels, however, their electrophysiological characterization matches the properties of homomeric CNGA3 channels. This failure to record heteromeric currents suggests that Asp/Asn mutations affect heteromeric subunit assembly. A homology model of S1-S6 of the CNGA3 channel was generated and relaxed in a membrane using molecular dynamics simulations. The model predicts that the Tri-Asp motif is involved in non-specific salt bridge pairings with positive residues of S3/S4. We propose that the D262N mutation in dogs with CNGB3-day blindness results in the loss of these inter-helical interactions altering the electrostatic equilibrium within the S1-S4 bundle. Because residues analogous to Tri-Asp in the voltage-gated Shaker potassium channel family were implicated in monomer folding, we hypothesize that destabilizing these electrostatic interactions impairs the monomer folding state in D262N mutant CNG

  9. Nucleotide Sequences and Comparison of Two Large Conjugative Plasmids from Different Campylobacter species

    National Research Council Canada - National Science Library

    Batchelor, Roger A; Pearson, Bruce M; Friis, Lorna M; Guerry, Patricia; Wells, Jerry M

    2004-01-01

    .... Both plasmids are mosaic in structure, having homologues of genes found in a variety of different commensal and pathogenic bacteria, but nevertheless, showed striking similarities in DNA sequence...

  10. Nucleotide sequence of the 3' ends of the double-stranded RNAs of grapevine chrome mosaic nepovirus.

    Science.gov (United States)

    Le Gall, O; Candresse, T; Dunez, J

    1988-02-01

    Attempts were made to label the termini of dsRNAs corresponding to the two genomic RNAs of grapevine chrome mosaic nepovirus (GCMV). It was not possible to label the 5' ends of the dsRNAs with [gamma-32P]ATP, which suggests that a genome-linked protein blocks their 5' ends. Both dsRNA species were labelled at their 3' ends with pCp. The 3'-terminal sequences were determined by 'wandering spot' or by partial enzymic cleavage analysis. One strand (presumably positive) ended in a poly(A) 30 to 50 nucleotides long whereas the other (presumably negative) ended in 3'-ACCUUUUAAAAAG (RNA1) or 3'-ACCUUUUAAUAAAG (RNA2). The sequences resemble closely those complementary to the 5' ends of the RNAs of tomato black ring virus (strain S), which is distantly related to GCMV.

  11. Complete nucleotide sequence and genome organization of Olive latent virus 3, a new putative member of the family Tymoviridae.

    Science.gov (United States)

    Alabdullah, Abdulkader; Minafra, Angelantonio; Elbeaino, Toufic; Saponari, Maria; Savino, Vito; Martelli, Giovanni P

    2010-09-01

    The complete nucleotide sequence and the genome organization were determined of a putative new member of the family Tymoviridae, tentatively named Olive latent virus 3 (OLV-3), recovered in southern Italy from a symptomless olive tree. The sequenced ssRNA genome comprises 7148 nucleotides excluding the poly(A) tail and contains four open reading frames (ORFs). ORF1 encodes a polyprotein of 221.6kDa in size, containing the conserved signatures of the methyltransferase (MTR), papain-like protease (PRO), helicase (HEL) and RNA-dependent RNA polymerase (RdRp) domains of the replication-associated proteins of positive-strand RNA viruses. ORF2 overlaps completely ORF1 and encodes a putative protein of 43.33kDa showing limited sequence similarity with the putative movement protein of Maize rayado fino virus (MRFV). ORF3 codes for a protein with predicted molecular mass of 28.46kDa, identified as the coat protein (CP), whereas ORF4 overlaps ORF3 and encodes a putative protein of 16kDa with sequence similarity to the p16 and p31 proteins of Citrus sudden death-associated virus (CSDaV) and Grapevine fleck virus (GFkV), respectively. Within the family Tymoviridae, OLV-3 genome has the closest identity level (49-52%) with members of the genus Marafivirus, from which, however, it differs because of the diverse genome organization and the presence of a single type of CP subunits. Copyright (c) 2010 Elsevier B.V. All rights reserved.

  12. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning

    KAUST Repository

    Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M

    2018-01-01

    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.

  13. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning

    KAUST Repository

    Teng, Haotian

    2018-04-10

    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.

  14. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally... related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein...
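
    As a rough illustration of the "information content" that a sequence logo displays, the sketch below computes the classic Shannon information per alignment column for a set of equal-length peptides. It is not Seq2Logo's algorithm: it applies no sequence weighting, pseudo counts, or depletion scoring, and the function name `column_information` and the toy peptides are assumptions made only for this example.

```python
import math
from collections import Counter


def column_information(peptides, alphabet_size=20):
    """Shannon information (bits) per column: log2(alphabet_size) - entropy.

    Hypothetical helper for illustration; no weighting or pseudo counts.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")

    info = []
    for column in zip(*peptides):
        counts = Counter(column)
        total = len(column)
        # Entropy of the observed residue frequencies in this column.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


if __name__ == "__main__":
    # Toy peptides (made up): columns 1-2 are conserved, column 3 is variable.
    for position, bits in enumerate(column_information(["ACD", "ACE", "ACF", "ACG"]), 1):
        print(position, round(bits, 3))
```

    A fully conserved column reaches the maximum of log2(20) bits, while a column with four equally frequent residues loses 2 bits. With few or redundant sequences this raw estimate is biased, which is the problem the abstract says weighting and pseudo counts are meant to address.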

  15. Efficient farnesylation of an extended C-terminal C(x)3X sequence motif expands the scope of the prenylated proteome.

    Science.gov (United States)

    Blanden, Melanie J; Suazo, Kiall F; Hildebrandt, Emily R; Hardgrove, Daniel S; Patel, Meet; Saunders, William P; Distefano, Mark D; Schmidt, Walter K; Hougland, James L

    2018-02-23

    Protein prenylation is a post-translational modification that has been most commonly associated with enabling protein trafficking to and interaction with cellular membranes. In this process, an isoprenoid group is attached to a cysteine near the C terminus of a substrate protein by protein farnesyltransferase (FTase) or protein geranylgeranyltransferase type I or II (GGTase-I and GGTase-II). FTase and GGTase-I have long been proposed to specifically recognize a four-amino acid CAAX C-terminal sequence within their substrates. Surprisingly, genetic screening reveals that yeast FTase can modify sequences longer than the canonical CAAX sequence, specifically C(x)3X sequences with four amino acids downstream of the cysteine. Biochemical and cell-based studies using both peptide and protein substrates reveal that mammalian FTase orthologs can also prenylate C(x)3X sequences. As the search to identify physiologically relevant C(x)3X proteins begins, this new prenylation motif nearly doubles the number of proteins within the yeast and human proteomes that can be explored as potential FTase substrates. This work expands our understanding of prenylation's impact within the proteome, establishes the biologically relevant reactivity possible with this new motif, and opens new frontiers in determining the impact of non-canonically prenylated proteins on cell function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
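
    The difference between the two motifs is purely positional: the canonical motif has three residues after the cysteine, the extended one has four. A minimal sketch of screening C-terminal sequences for either pattern follows. The regular expressions accept any residue after the cysteine, which is a simplification, and the function name and example sequences are hypothetical; a match says nothing about whether a protein is actually prenylated.

```python
import re

# Cysteine followed by exactly three residues at the C terminus (canonical motif).
CAAX = re.compile(r"C[A-Z]{3}$")
# Cysteine followed by exactly four residues at the C terminus (extended motif).
CX3X = re.compile(r"C[A-Z]{4}$")


def classify_c_terminus(protein):
    """Return 'CAAX', 'C(x)3X', or 'none' for a one-letter protein sequence.

    Hypothetical helper; the canonical motif is checked first, so a
    sequence that fits both patterns is reported as 'CAAX'.
    """
    seq = protein.strip().upper()
    if CAAX.search(seq):
        return "CAAX"
    if CX3X.search(seq):
        return "C(x)3X"
    return "none"


if __name__ == "__main__":
    # Made-up sequences used only to exercise the three outcomes.
    for seq in ("MKTCVLS", "MKTCAVLS", "MKTAVLS"):
        print(seq, classify_c_terminus(seq))
```

    Running the same scan over a whole proteome and counting the two classes separately is one way to see how many additional candidates the longer motif brings in.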

  16. Identification of mitochondrial DNA sequence variation and development of single nucleotide polymorphic markers for CMS-D8 in cotton.

    Science.gov (United States)

    Suzuki, Hideaki; Yu, Jiwen; Wang, Fei; Zhang, Jinfa

    2013-06-01

    Cytoplasmic male sterility (CMS), which is a maternally inherited trait and controlled by novel chimeric genes in the mitochondrial genome, plays a pivotal role in the production of hybrid seed. In cotton, no PCR-based marker has been developed to discriminate CMS-D8 (from Gossypium trilobum) from its normal Upland cotton (AD1, Gossypium hirsutum) cytoplasm. The objective of the current study was to develop PCR-based single nucleotide polymorphic (SNP) markers from mitochondrial genes for the CMS-D8 cytoplasm. DNA sequence variation in mitochondrial genes involved in the oxidative phosphorylation chain including ATP synthase subunit 1, 4, 6, 8 and 9, and cytochrome c oxidase 1, 2 and 3 subunits were identified by comparing CMS-D8, its isogenic maintainer and restorer lines on the same nuclear genetic background. An allelic specific PCR (AS-PCR) was utilized for SNP typing by incorporating artificial mismatched nucleotides into the third or fourth base from the 3' terminus in both the specific and nonspecific primers. The result indicated that the method modifying allele-specific primers was successful in obtaining eight SNP markers out of eight SNPs using eight primer pairs to discriminate two alleles between AD1 and CMS-D8 cytoplasms. Two of the SNPs for atp1 and cox1 could also be used in combination to discriminate between CMS-D8 and CMS-D2 cytoplasms. Additionally, a PCR-based marker from a nine nucleotide insertion-deletion (InDel) sequence (AATTGTTTT) at the 59-67 bp positions from the start codon of atp6, which is present in the CMS and restorer lines with the D8 cytoplasm but absent in the maintainer line with the AD1 cytoplasm, was also developed. A SNP marker for two nucleotide substitutions (AA in AD1 cytoplasm to CT in CMS-D8 cytoplasm) in the intron (1,506 bp) of cox2 gene was also developed. These PCR-based SNP markers should be useful in discriminating CMS-D8 and AD1 cytoplasms, or those with CMS-D2 cytoplasm as a rapid, simple, inexpensive, and

  17. Nucleotide sequence of a cDNA for branched chain acyltransferase with analysis of the deduced protein structure

    International Nuclear Information System (INIS)

    Hummel, K.B.; Litwer, S.; Bradford, A.P.; Aitken, A.; Danner, D.J.; Yeaman, S.J.

    1988-01-01

    Nucleotide sequence was determined for a 1.6-kilobase human cDNA putative for the branched chain acyltransferase protein of the branched chain α-ketoacid dehydrogenase complex. Translation of the sequence reveals an open reading frame encoding a 315-amino acid protein of molecular weight 35,759 followed by 560 bases of 3'-untranslated sequence. Three repeats of the polyadenylation signal hexamer ATTAAA are present prior to the polyadenylate tail. Within the open reading frame is a 10-amino acid fragment which matches exactly the amino acid sequence around the lipoate-lysine residue in bovine kidney branched chain acyltransferase, thus confirming the identity of the cDNA. Analysis of the deduced protein structure for the human branched chain acyltransferase revealed an organization into domains similar to that reported for the acyltransferase proteins of the pyruvate and α-ketoglutarate dehydrogenase complexes. This similarity in organization suggests that a more detailed analysis of the proteins will be required to explain the individual substrate and multienzyme complex specificity shown by these acyltransferases

  18. Mason: a JavaScript web site widget for visualizing and comparing annotated features in nucleotide or protein sequences.

    Science.gov (United States)

    Jaschob, Daniel; Davis, Trisha N; Riffle, Michael

    2015-03-07

    Sequence feature annotations (e.g., protein domain boundaries, binding sites, and secondary structure predictions) are an essential part of biological research. Annotations are widely used by scientists during research and experimental design, and are frequently the result of biological studies. A generalized and simple means of disseminating and visualizing these data via the web would be of value to the research community. Mason is a web site widget designed to visualize and compare annotated features of one or more nucleotide or protein sequences. Annotated features may be of virtually any type, ranging from annotating transcription binding sites or exons and introns in DNA to secondary structure or domain boundaries in proteins. Mason is simple to use and easy to integrate into web sites. Mason has a highly dynamic and configurable interface supporting multiple sets of annotations per sequence, overlapping regions, customization of interface and user-driven events (e.g., clicks and text to appear for tooltips). It is written purely in JavaScript and SVG, requiring no third-party plugins or browser customization. Mason is a solution for dissemination of sequence annotation data on the web. It is highly flexible, customizable, simple to use, and is designed to be easily integrated into web sites. Mason is open source and freely available at https://github.com/yeastrc/mason.

  19. Biological characterization and complete nucleotide sequence of a Tunisian isolate of Moroccan watermelon mosaic virus.

    Science.gov (United States)

    Yakoubi, S; Desbiez, C; Fakhfakh, H; Wipf-Scheibel, C; Marrakchi, M; Lecoq, H

    2008-01-01

    During a survey conducted in October 2005, cucurbit leaf samples showing virus-like symptoms were collected from the major cucurbit-growing areas in Tunisia. DAS-ELISA showed the presence of Moroccan watermelon mosaic virus (MWMV, Potyvirus), detected for the first time in Tunisia, in samples from the region of Cap Bon (Northern Tunisia). MWMV isolate TN05-76 (MWMV-Tn) was characterized biologically and its full-length genome sequence was established. MWMV-Tn was found to have biological properties similar to those reported for the MWMV type strain from Morocco. Phylogenetic analysis including the comparison of complete amino-acid sequences of 42 potyviruses confirmed that MWMV-Tn is related (65% amino-acid sequence identity) to Papaya ringspot virus (PRSV) isolates but is a member of a distinct virus species. Sequence analysis on parts of the CP gene of MWMV isolates from different geographical origins revealed some geographic structure of MWMV variability, with three different clusters: one cluster including isolates from the Mediterranean region, a second including isolates from western and central Africa, and a third one including isolates from the southern part of Africa. A significant correlation was observed between geographic and genetic distances between isolates. Isolates from countries in the Mediterranean region where MWMV has recently emerged (France, Spain, Portugal) have highly conserved sequences, suggesting that they may have a common and recent origin. MWMV from Sudan, a highly divergent variant, may be considered an evolutionary intermediate between MWMV and PRSV.

  20. The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. sequence evaluation and plastome evolution.

    Science.gov (United States)

    Greiner, Stephan; Wang, Xi; Rauwolf, Uwe; Silber, Martina V; Mayer, Klaus; Meurer, Jörg; Haberer, Georg; Herrmann, Reinhold G

    2008-04-01

    The flowering plant genus Oenothera is uniquely suited for studying molecular mechanisms of speciation. It assembles an intriguing combination of genetic features, including permanent translocation heterozygosity, biparental transmission of plastids, and a general interfertility of well-defined species. This allows an exchange of plastids and nuclei between species often resulting in plastome-genome incompatibility. For evaluation of its molecular determinants we present the complete nucleotide sequences of the five basic, genetically distinguishable plastid chromosomes of subsection Oenothera (=Euoenothera) of the genus, which are associated in distinct combinations with six basic genomes. Sizes of the chromosomes range from 163 365 bp (plastome IV) to 165 728 bp (plastome I), display between 96.3% and 98.6% sequence similarity and encode a total of 113 unique genes. Plastome diversification is caused by an abundance of nucleotide substitutions, small insertions, deletions and repetitions. The five plastomes deviate from the general ancestral design of plastid chromosomes of vascular plants by a subsection-specific 56 kb inversion within the large single-copy segment. This inversion disrupted operon structures and predates the divergence of the subsection presumably 1 My ago. Phylogenetic relationships suggest plastomes I-III in one clade, while plastome IV appears to be closest to the common ancestor.

  1. The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution

    Science.gov (United States)

    Greiner, Stephan; Wang, Xi; Rauwolf, Uwe; Silber, Martina V.; Mayer, Klaus; Meurer, Jörg; Haberer, Georg; Herrmann, Reinhold G.

    2008-01-01

    The flowering plant genus Oenothera is uniquely suited for studying molecular mechanisms of speciation. It assembles an intriguing combination of genetic features, including permanent translocation heterozygosity, biparental transmission of plastids, and a general interfertility of well-defined species. This allows an exchange of plastids and nuclei between species often resulting in plastome–genome incompatibility. For evaluation of its molecular determinants we present the complete nucleotide sequences of the five basic, genetically distinguishable plastid chromosomes of subsection Oenothera (=Euoenothera) of the genus, which are associated in distinct combinations with six basic genomes. Sizes of the chromosomes range from 163 365 bp (plastome IV) to 165 728 bp (plastome I), display between 96.3% and 98.6% sequence similarity and encode a total of 113 unique genes. Plastome diversification is caused by an abundance of nucleotide substitutions, small insertions, deletions and repetitions. The five plastomes deviate from the general ancestral design of plastid chromosomes of vascular plants by a subsection-specific 56 kb inversion within the large single-copy segment. This inversion disrupted operon structures and predates the divergence of the subsection presumably 1 My ago. Phylogenetic relationships suggest plastomes I–III in one clade, while plastome IV appears to be closest to the common ancestor. PMID:18299283

  2. Selection, Recombination and History in a Parasitic Flatworm (Echinococcus) Inferred from Nucleotide Sequences

    Directory of Open Access Journals (Sweden)

    Haag KL

    1998-01-01

    Three species of flatworms from the genus Echinococcus (E. granulosus, E. multilocularis and E. vogeli) and four strains of E. granulosus (cattle, horse, pig and sheep strains) were analysed by the PCR-SSCP method followed by sequencing, using as targets two non-coding and two coding (one nuclear and one mitochondrial) genomic regions. The sequencing data were used to evaluate hypotheses about the parasite breeding system and the causes of genetic diversification. The calculated recombination parameters suggested that cross-fertilisation was rare in the history of the group. However, the relative rates of substitution in the coding sequences showed that positive selection (instead of purifying selection) drove the evolution of an elastase and neutrophil chemotaxis inhibitor gene (AgB/1). The phylogenetic analyses revealed several ambiguities, indicating that the taxonomic status of the E. granulosus horse strain should be revised

  3. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    Science.gov (United States)

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance

  4. Amino acid sequence motifs essential for P0-mediated suppression of RNA silencing in an isolate of potato leafroll virus from Inner Mongolia.

    Science.gov (United States)

    Zhuo, Tao; Li, Yuan-Yuan; Xiang, Hai-Ying; Wu, Zhan-Yu; Wang, Xian-Bin; Wang, Ying; Zhang, Yong-Liang; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2014-06-01

    Polerovirus P0 suppressors of host gene silencing contain a consensus F-box-like motif with Leu/Pro (L/P) requirements for suppressor activity. The Inner Mongolian Potato leafroll virus (PLRV) P0 protein (P0(PL-IM)) has an unusual F-box-like motif that contains a Trp/Gly (W/G) sequence and an additional GW/WG-like motif (G139/W140/G141) that is lacking in other P0 proteins. We used Agrobacterium infiltration-mediated RNA silencing assays to establish that P0(PL-IM) has a strong suppressor activity. Mutagenesis experiments demonstrated that the P0(PL-IM) F-box-like motif encompasses amino acids 76-LPRHLHYECLEWGLLCGTHP-95, and that the suppressor activity is abolished by L76A, W87A, or G88A substitution. The suppressor activity is also weakened substantially by mutations within the G139/W140/G141 region and is eliminated by a mutation (F220R) in a C-terminal conserved sequence of P0(PL-IM). As has been observed with other P0 proteins, P0(PL-IM) suppression is correlated with reduced accumulation of the host AGO1-silencing complex protein. However, P0(PL-IM) fails to bind SKP1, which functions in a proteasome pathway that may be involved in AGO1 degradation. These results suggest that P0(PL-IM) may suppress RNA silencing by using an alternative pathway to target AGO1 for degradation. Our results help improve our understanding of the molecular mechanisms involved in PLRV infection.
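
    The substitution names in this abstract follow the usual wild-type residue, position, new residue convention, and they can be checked against the quoted motif, which spans residues 76 to 95. The sketch below applies such a substitution to the 20-residue segment (written without internal whitespace) and verifies that the wild-type residue is where the name says it is. The function name is hypothetical and the code only manipulates strings; it makes no statement about suppressor activity.

```python
import re

MOTIF_START = 76                 # position of the first residue, from the abstract
MOTIF = "LPRHLHYECLEWGLLCGTHP"   # residues 76-95 as quoted, whitespace removed


def apply_substitution(motif, start, mutation):
    """Apply a point substitution such as 'W87A' to a numbered segment.

    Hypothetical helper: raises ValueError if the position lies outside
    the segment or the stated wild-type residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)

    index = position - start
    if not 0 <= index < len(motif):
        raise ValueError(f"position {position} is outside the segment")
    if motif[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {motif[index]}")
    return motif[:index] + replacement + motif[index + 1:]


if __name__ == "__main__":
    for name in ("L76A", "W87A", "G88A"):
        print(name, apply_substitution(MOTIF, MOTIF_START, name))
```

    All three substitutions named for the F-box-like motif resolve to the expected wild-type residue in this segment, whereas F220R lies outside it and is rejected by the range check.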

  5. Molecular cloning and nucleotide sequence of cDNA for human liver arginase

    International Nuclear Information System (INIS)

    Haraguchi, Y.; Takiguchi, M.; Amaya, Y.; Kawamoto, S.; Matsuda, I.; Mori, M.

    1987-01-01

    Arginase (EC 3.5.3.1) catalyzes the last step of the urea cycle in the liver of ureotelic animals. Inherited deficiency of the enzyme results in argininemia, an autosomal recessive disorder characterized by hyperammonemia. To facilitate investigation of the enzyme and gene structures and to elucidate the nature of the mutation in argininemia, the authors isolated cDNA clones for human liver arginase. Oligo(dT)-primed and random primer human liver cDNA libraries in λgt11 were screened using isolated rat arginase cDNA as a probe. Two of the positive clones, designated λhARG6 and λhARG109, contained an overlapping cDNA sequence with an open reading frame encoding a polypeptide of 322 amino acid residues (predicted M_r, 34,732), a 5'-untranslated sequence of 56 base pairs, a 3'-untranslated sequence of 423 base pairs, and a poly(A) segment. Arginase activity was detected in Escherichia coli cells transformed with the plasmid carrying λhARG6 cDNA insert. RNA gel blot analysis of human liver RNA showed a single mRNA of 1.6 kilobases. The predicted amino acid sequence of human liver arginase is 87% and 41% identical with those of the rat liver and yeast enzymes, respectively. There are several highly conserved segments among the human, rat, and yeast enzymes

  6. Cloning, nucleotide sequence and transcriptional analysis of the uvrA gene from Neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Black, C.G.; Fyfe, J.A.M.; Davies, J.K.

    1997-01-01

    A recombinant plasmid capable of restoring UV resistance to an Escherichia coli uvrA mutant was isolated from a genomic library of Neisseria gonorrhoeae. Sequence analysis revealed an open reading frame whose deduced amino acid sequence displayed significant similarity to those of the UvrA proteins of other bacterial species. A second open reading frame (ORF259) was identified upstream from, and in the opposite orientation to, the gonococcal uvrA gene. Transcriptional fusions between portions of the gonococcal uvrA upstream region and a reporter gene were used to localise promoter activity in both E. coli and N. gonorrhoeae. The transcriptional starting points of uvrA and ORF259 were mapped in E. coli by primer extension analysis, and corresponding σ70 promoters were identified. The arrangement of the uvrA-ORF259 intergenic region is similar to that of the gonococcal recA-aroD intergenic region. Both contain inverted copies of the 10 bp neisserial DNA uptake sequence situated between divergently transcribed genes. However, there is no evidence that either the uptake sequence or the proximity of the promoters influences expression of these genes. (author)

  7. Nucleotide sequences of the genes encoding fructosebisphosphatase and phosphoribulokinase from Xanthobacter flavus H4-14

    NARCIS (Netherlands)

    Meijer, Wilhelmus; Enequist, H.G.; Terpstra, Peter; Dijkhuizen, L.

    The genes encoding fructosebisphosphatase and phosphoribulokinase present on a 2.5 kb SalI fragment from Xanthobacter flavus H4-14 were sequenced. Two large open reading frames (ORFs) were identified, preceded by plausible ribosome-binding sites. The ORFs were transcribed in the same direction and

  8. Symbolic complexity for nucleotide sequences: a sign of the genome structure

    International Nuclear Information System (INIS)

    Salgado-García, R; Ugalde, E

    2016-01-01

    We introduce a method for estimating the complexity function (which counts the number of observable words of a given length) of a finite symbolic sequence, which we use to estimate the complexity function of coding DNA sequences for several species of the Hominidae family. In all cases, the obtained symbolic complexities show the same characteristic behavior: exponential growth for small word lengths, followed by linear growth for larger word lengths. The symbolic complexities of the species we consider exhibit a systematic trend in correspondence with the phylogenetic tree. Using our method, we estimate the complexity function of sequences obtained by some known evolution models, and in some cases we observe the characteristic exponential-linear growth of the Hominidae coding DNA complexity. Analysis of the symbolic complexity of sequences obtained from a specific evolution model points to the following conclusion: linear growth arises from the random duplication of large segments during the evolution of the genome, while the decrease in the overall complexity from one species to another is due to a difference in the speed of accumulation of point mutations. (paper)
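
    The complexity function described here counts, for each word length, how many distinct words of that length occur in a sequence. The sketch below is the naive empirical count on a finite string; it is not the estimator introduced in the paper, and the function name and toy sequence are assumptions for illustration.

```python
def complexity(sequence, max_len):
    """Number of distinct words (substrings) observed for each length 1..max_len.

    Naive empirical count on a finite sequence; hypothetical helper name.
    """
    counts = {}
    for n in range(1, max_len + 1):
        # Slide a window of width n over the sequence and collect unique words.
        words = {sequence[i:i + n] for i in range(len(sequence) - n + 1)}
        counts[n] = len(words)
    return counts


if __name__ == "__main__":
    toy = "ACGTACGGTACC"  # made-up sequence
    for length, distinct in complexity(toy, 4).items():
        print(length, distinct)
```

    On a four-letter alphabet the count can grow at most as 4^n for short words, and for a sequence of length L it can never exceed L - n + 1. That ceiling is one reason the raw count on a finite sequence needs care at larger word lengths.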

  9. Nucleotide and amino acid sequences of a coat protein of an Ukrainian isolate of Potato virus Y: comparison with homologous sequences of other isolates and phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Budzanivska I. G.

    2014-03-01

    Aim. Identification of the widespread Ukrainian isolate(s) of PVY (Potato virus Y) in different potato cultivars and subsequent phylogenetic analysis of detected PVY isolates based on NA and AA sequences of coat protein. Methods. ELISA, RT-PCR, DNA sequencing and phylogenetic analysis. Results. PVY has been identified serologically in potato cultivars of Ukrainian selection. In this work we have optimized a method for total RNA extraction from potato samples and offered a sensitive and specific PCR-based test system of our own design for diagnostics of the Ukrainian PVY isolates. Part of the CP gene of the Ukrainian PVY isolate has been sequenced and analyzed phylogenetically. It is demonstrated that the Ukrainian isolate of Potato virus Y (CP gene) has a higher percentage of homology with the recombinant isolates (strains) of this pathogen (approx. 98.8–99.8 % of homology for both nucleotide and translated amino acid sequences of the CP gene). The Ukrainian isolate of PVY is positioned in a separate cluster together with the isolates found in Syria, Japan and Iran; these isolates possibly have a common origin. The Ukrainian PVY isolate is confirmed to be recombinant. Conclusions. This work underlines the need and provides the means for accurate monitoring of Potato virus Y in the agroecosystems of Ukraine. Most importantly, the phylogenetic analysis demonstrated the recombinant nature of this PVY isolate, which has been attributed to the strain group O, subclade N:O.

  10. Mapping vaccinia virus DNA replication origins at nucleotide level by deep sequencing.

    Science.gov (United States)

    Senkevich, Tatiana G; Bruno, Daniel; Martens, Craig; Porcella, Stephen F; Wolf, Yuri I; Moss, Bernard

    2015-09-01

    Poxviruses reproduce in the host cytoplasm and encode most or all of the enzymes and factors needed for expression and synthesis of their double-stranded DNA genomes. Nevertheless, the mode of poxvirus DNA replication and the nature and location of the replication origins remain unknown. A current but unsubstantiated model posits only leading strand synthesis starting at a nick near one covalently closed end of the genome and continuing around the other end to generate a concatemer that is subsequently resolved into unit genomes. The existence of specific origins has been questioned because any plasmid can replicate in cells infected by vaccinia virus (VACV), the prototype poxvirus. We applied directional deep sequencing of short single-stranded DNA fragments enriched for RNA-primed nascent strands isolated from the cytoplasm of VACV-infected cells to pinpoint replication origins. The origins were identified as the switching points of the fragment directions, which correspond to the transition from continuous to discontinuous DNA synthesis. Origins containing a prominent initiation point mapped to a sequence within the hairpin loop at one end of the VACV genome and to the same sequence within the concatemeric junction of replication intermediates. These findings support a model for poxvirus genome replication that involves leading and lagging strand synthesis and is consistent with the requirements for primase and ligase activities as well as earlier electron microscopic and biochemical studies implicating a replication origin at the end of the VACV genome.
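
    The idea of locating an origin at the point where fragment direction switches can be sketched with a simple sign-change scan. The code below assumes the reads have already been binned into genomic windows with separate counts per strand; the window layout, the counts, and the function name are all hypothetical, and this is not the authors' analysis pipeline.

```python
def strand_switch_points(positions, forward, reverse):
    """Return midpoints between adjacent informative windows where the
    sign of (forward - reverse) fragment counts flips.

    Hypothetical helper; windows with equal counts are skipped.
    """
    switches = []
    prev_pos, prev_bias = None, 0
    for pos, fwd, rev in zip(positions, forward, reverse):
        bias = fwd - rev
        if bias == 0:
            continue  # no strand preference in this window
        if prev_bias != 0 and (bias > 0) != (prev_bias > 0):
            switches.append((prev_pos + pos) / 2)
        prev_pos, prev_bias = pos, bias
    return switches


if __name__ == "__main__":
    # Made-up window starts and per-strand fragment counts.
    windows = [0, 100, 200, 300]
    fwd_counts = [10, 8, 2, 1]
    rev_counts = [1, 2, 9, 12]
    print(strand_switch_points(windows, fwd_counts, rev_counts))
```

    Real data would call for smoothing and a significance threshold before a sign change is treated as a candidate origin; the scan only shows the bookkeeping behind a "switching point".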

  11. Complete nucleotide sequences and virion particle association of two satellite RNAs of panicum mosaic virus.

    Science.gov (United States)

    Pyle, Jesse D; Monis, Judit; Scholthof, Karen-Beth

    2017-08-15

    Over six decades ago, panicum mosaic virus (PMV) was identified as the first viral pathogen of cultivated switchgrass (Panicum virgatum). Subsequently, PMV was demonstrated to support the replication of both a satellite RNA virus (SPMV) and satellite RNA (satRNA) agents during natural infections of host grasses. In this study, we report the isolation and full-length sequences of two PMV satRNAs identified in 1988 from St. Augustinegrass (Stenotaphrum secundatum) and centipedegrass (Eremochloa ophiuroides) hosts. Each of these satellites have sequence relatedness at their 5'- and 3'-ends. In addition, satC has a region of ∼100 nt complementary to the 3'-end of the PMV genome. These agents are associated with purified virions of SPMV infections. Additionally, satS and satC RNAs contain conserved in-frame open reading frames in the complementary-sense sequences that could potentially generate 6.6- and 7.9-kDa proteins, respectively. In protoplasts and plants satS is infectious, when co-inoculated with the PMV RNA alone or PMV+SPMV RNAs, and negatively affects their accumulation. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Molecular cloning, nucleotide sequence, and expression of the gene encoding human eosinophil differentiation factor (interleukin 5)

    International Nuclear Information System (INIS)

    Campbell, H.D.; Tucker, W.Q.J.; Hort, Y.; Martinson, M.E.; Mayo, G.; Clutterbuck, E.J.; Sanderson, C.J.; Young, I.G.

    1987-01-01

    The human eosinophil differentiation factor (EDF) gene was cloned from a genomic library in λ phage EMBL3A by using a murine EDF cDNA clone as a probe. The DNA sequence of a 3.2-kilobase BamHI fragment spanning the gene was determined. The gene contains three introns. The predicted amino acid sequence of 134 amino acids is identical with that recently reported for human interleukin 5 but shows no significant homology with other known hemopoietic growth regulators. The amino acid sequence shows strong homology (∼ 70% identity) with that of murine EDF. Recombinant human EDF, expressed from the human EDF gene after transfection into monkey COS cells, stimulated the production of eosinophils and eosinophil colonies from normal human bone marrow but had no effect on the production of neutrophils or mononuclear cells (monocytes and lymphoid cells). The apparent specificity of human EDF for the eosinophil lineage in myeloid hemopoiesis contrasts with the properties of human interleukin 3 and granulocyte/macrophage and granulocyte colony-stimulating factors but is directly analogous to the biological properties of murine EDF. Human EDF therefore represents a distinct hemopoietic growth factor that could play a central role in the regulation of eosinophilia

  13. Molecular Properties of Poliovirus Isolates: Nucleotide Sequence Analysis, Typing by PCR and Real-Time RT-PCR.

    Science.gov (United States)

    Burns, Cara C; Kilpatrick, David R; Iber, Jane C; Chen, Qi; Kew, Olen M

    2016-01-01

    Virologic surveillance is essential to the success of the World Health Organization initiative to eradicate poliomyelitis. Molecular methods have been used to detect polioviruses in tissue culture isolates derived from stool samples obtained through surveillance for acute flaccid paralysis. This chapter describes the use of real-time PCR assays to identify and serotype polioviruses. In particular, a degenerate, inosine-containing, panpoliovirus (panPV) PCR primer set is used to distinguish polioviruses from non-polio enteroviruses (NPEVs). The high degree of nucleotide sequence diversity among polioviruses presents a challenge to the systematic design of nucleic acid-based reagents. To accommodate the wide variability and rapid evolution of poliovirus genomes, degenerate codon positions on the template were matched to mixed-base or deoxyinosine residues on both the primers and the TaqMan™ probes. Additional assays distinguish between Sabin vaccine strains and non-Sabin strains. This chapter also describes the use of generic poliovirus-specific primers, along with degenerate and inosine-containing primers, for routine VP1 sequencing of poliovirus isolates. These primers, along with nondegenerate serotype-specific Sabin primers, can also be used to sequence individual polioviruses in mixtures.
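
    How a degenerate, inosine-containing primer can match many template variants is sketched below. The primer and template are made up for illustration, and treating inosine ("I") as matching any base is a simplifying assumption rather than a statement about the assays in this chapter.

```python
import re

# Standard IUPAC nucleotide ambiguity codes. "I" (inosine) is treated as
# matching any base, which is a simplification for illustration.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "I": "ACGT",
}


def primer_to_pattern(primer):
    """Translate a degenerate primer into a regular-expression string."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def degeneracy(primer):
    """Number of distinct plain sequences the degenerate primer stands for."""
    n = 1
    for code in primer.upper():
        n *= len(IUPAC[code])
    return n


def find_sites(primer, template):
    """0-based start positions where the primer can anneal (same strand)."""
    # A lookahead lets overlapping sites be reported as well.
    pattern = "(?=" + primer_to_pattern(primer) + ")"
    return [m.start() for m in re.finditer(pattern, template.upper())]


# Hypothetical primer and template, not taken from the chapter.
print(degeneracy("ACRTIG"))
print(find_sites("ACRTIG", "TTACGTAGCCACATCGAA"))
```

    The degeneracy count shows the trade-off mentioned in the abstract: each mixed-base or inosine position widens coverage of variable genomes at the cost of specificity.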

  14. Complete nucleotide sequences of a new bipartite begomovirus from Malvastrum sp. plants with bright yellow mosaic symptoms in South Texas.

    Science.gov (United States)

    Alabi, Olufemi J; Villegas, Cecilia; Gregg, Lori; Murray, K Daniel

    2016-06-01

    Two isolates of a novel bipartite begomovirus, tentatively named malvastrum bright yellow mosaic virus (MaBYMV), were molecularly characterized from naturally infected plants of the genus Malvastrum showing bright yellow mosaic disease symptoms in South Texas. Six complete DNA-A and five DNA-B genome sequences of MaBYMV obtained from the isolates ranged in length from 2,608 to 2,609 nucleotides (nt) and 2,578 to 2,605 nt, respectively. Both genome segments shared a 178- to 180-nt common region. In pairwise comparisons, the complete DNA-A and DNA-B sequences of MaBYMV were most similar (87-88 % and 79-81 % identity, respectively) and phylogenetically related to the corresponding sequences of sida mosaic Sinaloa virus-[MX-Gua-06]. Further analysis revealed that MaBYMV is a putative recombinant virus, thus supporting the notion that malvaceous hosts may be influencing the evolution of several begomoviruses. The design of new diagnostic primers enabled the detection of MaBYMV in cohorts of Bemisia tabaci collected from symptomatic Malvastrum sp. plants, thus implicating whiteflies as potential vectors of the virus.

  15. The complete nucleotide sequence, genome organization, and origin of human adenovirus type 11

    International Nuclear Information System (INIS)

    Stone, Daniel; Furthmann, Anne; Sandig, Volker; Lieber, Andre

    2003-01-01

    The complete DNA sequence and transcription map of human adenovirus type 11 are reported here. This is the first published sequence for a subgenera B human adenovirus and demonstrates a genome organization highly similar to those of other human adenoviruses. All of the genes from the early, intermediate, and late regions are present in the expected locations of the genome for a human adenovirus. The genome size is 34,794 bp in length and has a GC content of 48.9%. Sequence alignment with genomes of groups A (Ad12), C (Ad5), D (Ad17), E (Simian adenovirus 25), and F (Ad40) revealed homologies of 64, 54, 68, 75, and 52%, respectively. Detailed genomic analysis demonstrated that Ads 11 and 35 are highly conserved in all areas except the hexon hypervariable regions and fiber. Similarly, comparison of Ad11 with subgroup E SAV25 revealed poor homology between fibers but high homology in proteins encoded by all other areas of the genome. We propose an evolutionary model in which functional viruses can be reconstituted following fiber substitution from one serotype to another. According to this model either the Ad11 genome is a derivative of Ad35, from which the fiber was substituted with Ad7, or the Ad35 genome is the product of a fiber substitution from Ad21 into the Ad11 genome. This model also provides a possible explanation for the origin of group E Ads, which are evolutionarily derived from a group C fiber substitution into a group B genome
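
    The GC content quoted above is the share of G and C among the bases of the genome. A minimal sketch of the calculation follows; the function name and the choice to ignore ambiguous bases are assumptions, and the toy sequences are not adenovirus data.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


print(gc_content("ATGC"))
print(gc_content("ATGCNN"))  # N is ignored under the stated assumption
```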

  16. The nucleotide sequence and organization of nuclear 5S rRNA genes in yellow lupine

    International Nuclear Information System (INIS)

    Nuc, K.; Nuc, P.; Pawelkiewicz, J.

    1993-01-01

    We have isolated a genomic clone containing 'Lupinus luteus' 5S ribosomal RNA genes by screening with 5S rDNA probe clones that were hybridized previously with the initiator methionine tRNA preparation (contaminated with traces of rRNA or its degradation products). The clone isolated contains ten repeat units of 342 bp with a 119 bp fragment showing 100% homology to the 5S rRNA from yellow lupine. Sequence analysis indicates only point heterogeneities among the flanking regions of the genes. (author). 6 refs, 3 figs

  17. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias

    DEFF Research Database (Denmark)

    Kjær, Jonas; Belsham, Graham J.

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long) which induces a non-proteolytic, co-translational, "cleavage" at its own C-terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19 where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15 and N16 within the 2A sequence of infectious FMDVs but no variants at residues P17, G18 or P19 have been identified. In this study, using highly degenerate primers, we analysed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after 2, 3 or 4 passages. However...

  18. Update on Pneumocystis carinii f. sp. hominis Typing Based on Nucleotide Sequence Variations in Internal Transcribed Spacer Regions of rRNA Genes

    Science.gov (United States)

    Lee, Chao-Hung; Helweg-Larsen, Jannik; Tang, Xing; Jin, Shaoling; Li, Baozheng; Bartlett, Marilyn S.; Lu, Jang-Jih; Lundgren, Bettina; Lundgren, Jens D.; Olsson, Mats; Lucas, Sebastian B.; Roux, Patricia; Cargnel, Antonietta; Atzori, Chiara; Matos, Olga; Smith, James W.

    1998-01-01

    Pneumocystis carinii f. sp. hominis isolates from 207 clinical specimens from nine countries were typed based on nucleotide sequence variations in the internal transcribed spacer regions I and II (ITS1 and ITS2, respectively) of rRNA genes. The number of ITS1 nucleotides has been revised from the previously reported 157 bp to 161 bp. Likewise, the number of ITS2 nucleotides has been changed from 177 to 192 bp. The number of ITS1 sequence types has increased from 2 to 15, and that of ITS2 has increased from 3 to 14. The 15 ITS1 sequence types are designated types A through O, and the 14 ITS2 types are named types a through n. A total of 59 types of P. carinii f. sp. hominis were found in this study. PMID:9508304

  19. The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans.

    Science.gov (United States)

    Kumazaki, T; Hori, H; Osawa, S; Ishii, N; Suzuki, K

    1982-11-11

    The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans have been determined. The rotifer has two 5S rRNA species that are composed of 120 and 121 nucleotides, respectively. The sequences of these two 5S rRNAs are the same except that the latter has an additional base at its 3'-terminus. The 5S rRNAs from the two nematode species are both 119 nucleotides long. The sequence similarity percents are 79% (Brachionus/Rhabditis), 80% (Brachionus/Caenorhabditis), and 95% (Rhabditis/Caenorhabditis) among these three species. Brachionus revealed the highest similarity to Lingula (89%), but not to the nematodes (79%).
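
    Similarity percentages such as those above are computed from aligned sequences. The sketch below shows one common convention (matches divided by gap-free aligned columns); the abstract does not say which convention was used, so the denominator is an assumption, and the toy sequences are not the published 5S rRNAs.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from both the
    numerator and the denominator (one of several conventions in use).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments: 6 matches over 7 gap-free columns.
print(round(percent_identity("GCUAC-GA", "GCUUCAGA"), 1))
```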

  20. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  1. Sequence motif upstream of the Hendra virus fusion protein cleavage site is not sufficient to promote efficient proteolytic processing

    International Nuclear Information System (INIS)

    Craft, Willie Warren; Dutch, Rebecca Ellis

    2005-01-01

    The Hendra virus fusion (HeV F) protein is synthesized as a precursor, F0, and proteolytically cleaved into the mature F1 and F2 heterodimer, following an HDLVDGVK109 motif. This cleavage event is required for fusogenic activity. To determine the amino acid requirements for processing of the HeV F protein, we constructed multiple mutants. Individual and simultaneous alanine substitutions of the eight residues immediately upstream of the cleavage site did not eliminate processing. A chimeric SV5 F protein in which the furin site was substituted for the VDGVK109 motif of the HeV F protein was not processed but was expressed on the cell surface. Another chimeric SV5 F protein containing the HDLVDGVK109 motif of the HeV F protein underwent partial cleavage. These data indicate that the upstream region can play a role in protease recognition, but is neither absolutely required nor sufficient for efficient processing of the HeV F protein.

  2. Appendix: a solution hybridization assay to detect radioactive globin messenger RNA nucleotide sequences

    Energy Technology Data Exchange (ETDEWEB)

    Ross, J

    1976-09-15

    In view of the sensitivity and specificity of the solution hybridization assay for unlabeled globin mRNA, a similar technique has been devised to detect radioactive globin mRNA sequences with unlabeled globin cDNA. Several properties of the hybridization reaction are presented since RNA kinetic experiments reported recently depend on the validity of this assay. Data on hybridization analysis of (³H)RNA from mouse fetal liver or erythroleukemia cell cytoplasm are presented. These data indicate that the excess cDNA solution assay for radioactive globin mRNA detection is specific for globin mRNA sequences. It can be performed rapidly and is highly reproducible from experiment to experiment. It is at least 500-fold less sensitive than the assay for unlabeled globin mRNA, due to the RNAase backgrounds of 0.05 to 0.15%. However, this limitation has not affected kinetic experiments with non-dividing fetal liver erythroid cells, which synthesize relatively large quantities of globin mRNA.

  3. Nucleotide sequence of the promoter region of the gene encoding chicken Calbindin D28K

    Energy Technology Data Exchange (ETDEWEB)

    Ferrari, S; Drusiani, E; Battini, R; Fregni, M

    1988-01-11

    Calbindin D28K (formerly Vitamin D-Dependent Calcium Binding Protein) is a protein induced by 1,25-dihydroxycholecalciferol in several chicken tissues. A chicken genomic DNA library was screened with a synthetic oligonucleotide representing the sequence of Calbindin D28K cDNA from nt 146 to nt 176. The positive clone CBAl extends the 5'-end of the first exon by 451 bp. The sequence of a BamHI-SacII restriction fragment with coordinates -451 + 50 is shown. The BamHI-SacII fragment was subcloned 5' to the CAT gene of pUCCAT. The result is shown of a CAT assay on mouse fibroblasts 3T6 transiently transfected with pUCCAT, pUCCAT containing the BamHI-SacII fragment in the correct or opposite orientation or the SV40 promoter. ¹⁴C-chloramphenicol and its acetyl derivatives generated by purified CAT are also shown. The expression of CAT appears to be constitutive since the enzyme activity is not influenced by the presence (+) or absence (-) of 1,25-dihydroxycholecalciferol in the culture medium.

  4. Identification and nucleotide sequence of the thymidine kinase gene of Shope fibroma virus

    International Nuclear Information System (INIS)

    Upton, C.; McFadden, G.

    1986-01-01

    The thymidine kinase (TK) gene of Shope fibroma virus (SFV), a tumorigenic leporipoxvirus, was localized within the viral genome with degenerate oligonucleotide probes. These probes were constructed to two regions of high sequence conservation between the vaccinia virus TK gene and those of several known eucaryotic cellular TK genes, including human, mouse, hamster, and chicken TK genes. The oligonucleotide probes initially localized the SFV TK gene 50 kilobases (kb) from the right terminus of the 160-kb SFV genome within the 9.5-kb BamHI-HindIII fragment E. Fine-mapping analysis indicated that the TK gene was within a 1.2-kb AvaI-HaeIII fragment, and DNA sequencing of this region revealed an open reading frame capable of encoding a polypeptide of 187 amino acids possessing considerable homology to the TK genes of the vaccinia, variola, and monkeypox orthopoxviruses and also to a variety of cellular TK genes. Homology matrix analysis and homology scores suggest that the SFV TK gene has diverged significantly from its counterpart members in the orthopoxvirus genus. Nevertheless, the presence of conserved upstream open reading frames on the 5' side of all of the poxvirus TK genes indicates a similarity of functional organization between the orthopoxviruses and leporipoxviruses. These data suggest a common ancestral origin for at least some of the unique internal regions of the leporipoxviruses and orthopoxviruses as exemplified by SFV and vaccinia virus, respectively.

  5. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    Science.gov (United States)

    Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

    2014-07-04

    Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. Although large portion of sequence context was

  6. The Bryopsis hypnoides plastid genome: multimeric forms and complete nucleotide sequence.

    Directory of Open Access Journals (Sweden)

    Fang Lü

    BACKGROUND: Bryopsis hypnoides Lamouroux is a siphonous green alga, and its extruded protoplasm can aggregate spontaneously in seawater and develop into mature individuals. The chloroplast of B. hypnoides is the biggest organelle in the cell and shows strong autonomy. To better understand this organelle, we sequenced and analyzed the chloroplast genome of this green alga. PRINCIPAL FINDINGS: A total of 111 functional genes, including 69 potential protein-coding genes, 5 ribosomal RNA genes, and 37 tRNA genes were identified. The genome size (153,429 bp), arrangement, and inverted-repeat (IR)-lacking structure of the B. hypnoides chloroplast DNA (cpDNA) closely resemble those of Chlorella vulgaris. Furthermore, our cytogenomic investigations using pulsed-field gel electrophoresis (PFGE) and Southern blotting methods showed that the B. hypnoides cpDNA had multimeric forms, including monomer, dimer, trimer, tetramer, and even higher multimers, which is similar to the higher order organization observed previously for higher plant cpDNA. The relative amounts of the four multimeric cpDNA forms were estimated to be about 1, 1/2, 1/4, and 1/8 based on molecular hybridization analysis. Phylogenetic analyses based on a concatenated alignment of chloroplast protein sequences suggested that B. hypnoides is sister to all Chlorophyceae and this placement received moderate support. CONCLUSION: All of the results suggest that the autonomy of the chloroplasts of B. hypnoides has little to do with the size and gene content of the cpDNA, and the IR-lacking structure of the chloroplasts indirectly demonstrated that the multimeric molecules might result from the random cleavage and fusion of replication intermediates instead of recombinational events.

  7. The complete nucleotide sequence of the barley yellow dwarf GPV isolate from China shows that it is a new member of the genus Polerovirus.

    Science.gov (United States)

    Zhang, Wenwei; Cheng, Zhuomin; Xu, Lei; Wu, Maosen; Waterhouse, Peter; Zhou, Guanghe; Li, Shifang

    2009-01-01

    The complete nucleotide sequence of the ssRNA genome of a Chinese GPV isolate of barley yellow dwarf virus (BYDV) was determined. It comprised 5673 nucleotides, and the deduced genome organization resembled that of members of the genus Polerovirus. It was most closely related to cereal yellow dwarf virus-RPV (77% nt identity over the entire genome; coat protein amino acid identity 79%). The GPV isolate also differs in vector specificity from other BYDV strains. Biological properties, phylogenetic analyses and detailed sequence comparisons suggest that GPV should be considered a member of a new species within the genus, and the name Wheat yellow dwarf virus-GPV is proposed.

  8. The R package otu2ot for implementing the entropy decomposition of nucleotide variation in sequence data

    Directory of Open Access Journals (Sweden)

    Alban Ramette

    2014-11-01

    Oligotyping is a novel, supervised computational method that classifies closely related sequences into oligotypes (OTs) based on subtle nucleotide variations (Eren et al. 2013). Its application to microbial datasets has helped reveal ecological patterns which are often hidden by the way sequence data are currently clustered to define operational taxonomic units (OTUs). Here, we implemented the OT entropy decomposition procedure and its unsupervised version, Minimal Entropy Decomposition (MED; Eren et al. 2014), in the statistical programming language and environment, R. The aims are to facilitate the integration of computational routines, interactive statistical analyses, and visualization into a single framework. In addition, two complementary approaches are implemented: 1) an analytical method (the broken stick model) is proposed to help identify oligotypes of low abundance that could be generated by chance alone and 2) a one-pass profiling (OP) method, to efficiently identify those OTUs whose subsequent oligotyping would be most promising. These enhancements are especially useful for large datasets, where a manual screening of entropy analysis results and the creation of a full set of OTs may not be feasible. The package and procedures are illustrated by several tutorials and examples.
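
    The entropy decomposition behind oligotyping can be sketched as follows: compute Shannon entropy per alignment column, keep the most variable columns, and label each read by the bases it carries there. This is an illustrative Python sketch, not the otu2ot R package or its API; all function names and the toy reads are assumptions.

```python
from collections import Counter
from math import log2


def column_entropies(reads):
    """Shannon entropy (bits) of each column of equal-length aligned reads."""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    total = len(reads)
    entropies = []
    for i in range(length):
        counts = Counter(r[i] for r in reads)
        h = -sum((c / total) * log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def oligotypes(reads, n_positions):
    """Label reads by their bases at the n highest-entropy columns.

    Returns (Counter of oligotype -> read count, sorted column indices).
    """
    ent = column_entropies(reads)
    ranked = sorted(range(len(ent)), key=lambda i: ent[i], reverse=True)
    top = sorted(ranked[:n_positions])
    labels = Counter("".join(r[i] for i in top) for r in reads)
    return labels, top


# Toy aligned reads, not real amplicon data.
reads = ["ACGT", "ACGT", "ATGT", "ATGA"]
counts, positions = oligotypes(reads, 2)
print(positions, dict(counts))
```

    In the supervised procedure the number of positions is chosen by the analyst after inspecting the entropy profile; MED instead keeps splitting groups until their internal entropy falls below a threshold.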

  9. Nucleotide sequence of the coat protein gene of Lettuce big-vein virus.

    Science.gov (United States)

    Sasaya, T; Ishikawa, K; Koganezawa, H

    2001-06-01

    A sequence of 1425 nt was established that included the complete coat protein (CP) gene of Lettuce big-vein virus (LBVV). The LBVV CP gene encodes a 397 amino acid protein with a predicted M(r) of 44486. Antisera raised against synthetic peptides corresponding to N-terminal or C-terminal parts of the LBVV CP reacted in Western blot analysis with a protein with an M(r) of about 48000. RNA extracted from purified particles of LBVV by using proteinase K, SDS and phenol migrated in gels as two single-stranded RNA species of approximately 7.3 kb (ss-1) and 6.6 kb (ss-2). After denaturation by heat and annealing at room temperature, the RNA migrated as four species, ss-1, ss-2 and two additional double-stranded RNAs (ds-1 and ds-2). The Northern blot hybridization analysis using riboprobes from a full-length clone of the LBVV CP gene indicated that ss-2 has a negative-sense nature and contains the LBVV CP gene. Moreover, ds-2 is a double-stranded form of ss-2. Database searches showed that the LBVV CP most resembled the nucleocapsid proteins of rhabdoviruses. These results indicate that it would be appropriate to classify LBVV as a negative-sense single-stranded RNA virus rather than as a double-stranded RNA virus.

  10. Spectrometric study of the folding process of i-motif-forming DNA sequences upstream of the c-kit transcription initiation site

    International Nuclear Information System (INIS)

    Bucek, Pavel; Gargallo, Raimundo; Kudrev, Andrei

    2010-01-01

    The c-kit oncogene shows a cytosine-rich DNA region upstream of the transcription initiation site which forms an i-motif structure at slightly acidic pH values (Bucek et al.). In the present study, the pH-induced formation of i-motif-forming sequences 5'-CCC CTC CCT CGC GCC CGC CCG-3' (ckitC1, native), 5'-CCC TTC CCT TGT GCC CGC CCG-3' (ckitC2) and 5'-CCCTT CCC TTTTT CCC T CCC T-3' (ckitC3) was studied by spectroscopic techniques, such as UV molecular absorption and circular dichroism (CD), in tandem with two multivariate data analysis methods, the hard modelling-based matrix method and the soft modelling-based MCR-ALS approach. Use of the hard chemical modelling enabled us to propose the equilibrium model, which describes spectral changes as functions of solution acidity. Additionally, the intrinsic protonation constant, K_in, and the cooperativity parameters, ω_c and ω_a, were calculated from the fitting procedure of the coupled CD and molecular absorption spectra. In the case of ckitC2 and ckitC3, the hard model correctly reproduced the spectral variations observed experimentally. The results indicated that folding was accompanied by a cooperative process, i.e. the enhancement of protonated structure stability upon protonation. In contrast, unfolding was accompanied by an anticooperative process. Finally, folding of the native sequence, ckitC1, seemed to follow a more complex mechanism.

  11. A survey of endogenous retrovirus (ERV) sequences in the vicinity of multiple sclerosis (MS)-associated single nucleotide polymorphisms (SNPs).

    Science.gov (United States)

    Brütting, Christine; Emmer, Alexander; Kornhuber, Malte; Staege, Martin S

    2016-08-01

    Although multiple sclerosis (MS) is one of the most common central nervous system diseases in young adults, little is known about its etiology. Several human endogenous retroviruses (ERVs) are considered to play a role in MS. We are interested in which ERVs can be identified in the vicinity of MS-associated genetic markers to find potential initiators of MS. We analysed the chromosomal regions surrounding 58 single nucleotide polymorphisms (SNPs) that are associated with MS identified in one of the last major genome-wide association studies. We scanned these regions for putative endogenous retrovirus sequences with large open reading frames (ORFs). We observed that more retrovirus-related putative ORFs exist in the relatively close vicinity of SNP marker indices in multiple sclerosis compared to control SNPs. We found very high homologies to HERV-K, HCML-ARV, XMRV, Galidia ERV, HERV-H/env62 and XMRV-like mouse endogenous retrovirus mERV-XL. The associated genes (CYP27B1, CD6, CD58, MPV17L2, IL12RB1, CXCR5, PTGER4, TAGAP, TYK2, ICAM3, CD86, GALC, GPR65 as well as the HLA DRB1*1501) are mainly involved in the immune system, but also in vitamin D regulation. The most frequently detected ERV sequences are related to the multiple sclerosis-associated retrovirus, the human immunodeficiency virus 1, HERV-K, and the Simian foamy virus. Our data show that there is a relation between MS-associated SNPs and the number of retroviral elements compared to control. Our data identify new ERV sequences that have not been associated with MS so far.
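
    Scanning a region for large open reading frames can be sketched as below. The abstract does not describe the tool or thresholds the authors used, so the ATG-to-stop definition of an ORF, the minimum length and all names here are assumptions for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """ATG-to-stop ORFs in the three frames of one strand.

    Returns (start, end) pairs, 0-based, end exclusive and including the
    stop codon. min_codons counts coding codons, stop codon excluded.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(seq, min_codons=100):
    """Six-frame scan. Minus-strand coordinates refer to the reverse
    complement of the input, not to the input itself."""
    seq = seq.upper()
    hits = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    hits += [("-", s, e) for s, e in orfs_in_strand(rc, min_codons)]
    return hits


# Toy sequence with one short ORF (ATG AAA TTT GGG TAA) on the plus strand.
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```

    Candidate ORFs found this way would still need a homology search against known retroviral proteins before being called ERV-related.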

  12. The Saccharomyces cerevisiae RAD18 gene encodes a protein that contains potential zinc finger domains for nucleic acid binding and a putative nucleotide binding sequence

    Energy Technology Data Exchange (ETDEWEB)

    Jones, J.S.; Prakash, L. (Univ. of Rochester School of Medicine, NY (USA)); Weber, S. (Kodak Research Park, Rochester, NY (USA))

    1988-07-25

    The RAD18 gene of Saccharomyces cerevisiae is required for postreplication repair of UV damaged DNA. The authors have isolated the RAD18 gene, determined its nucleotide sequence and examined if deletion mutations of this gene show different or more pronounced phenotypic effects than the previously described point mutations. The RAD18 gene open reading frame encodes a protein of 487 amino acids, with a calculated molecular weight of 55,512. The RAD18 protein contains three potential zinc finger domains for nucleic acid binding, and a putative nucleotide binding sequence that is present in many proteins that bind and hydrolyze ATP. The DNA binding and nucleotide binding activities could enable the RAD18 protein to bind damaged sites in the template DNA with high affinity. Alternatively, or in addition, RAD18 protein may be a transcriptional regulator. The RAD18 deletion mutation resembles the previously described point mutations in its effects on viability, DNA repair, UV mutagenesis, and sporulation.

  13. Identities among actin-encoding cDNAs of the Nile tilapia (Oreochromis niloticus) and other eukaryote species revealed by nucleotide and amino acid sequence analyses

    Directory of Open Access Journals (Sweden)

    Andréia B. Poletto

    2008-01-01

    Actin-encoding cDNAs of Nile tilapia (Oreochromis niloticus) were isolated by RT-PCR using total RNA samples of different tissues and further characterized by nucleotide sequencing and in silico amino acid (aa) sequence analysis. Comparisons among the actin gene sequences of O. niloticus and those of other species evidenced that the isolated genes present a high similarity to other fish and other vertebrate actin genes. The highest nucleotide resemblance was observed between O. niloticus and O. mossambicus α-actin and β-actin genes. Analysis of the predicted aa sequences revealed two distinct types of cytoplasmic actins, one cardiac muscle actin type and one skeletal muscle actin type that were expressed in different tissues of Nile tilapia. The evolutionary relationships between the Nile tilapia actin genes and diverse other organisms are discussed.

  14. Nucleotide sequence analyses of genomic RNAs of peanut stunt virus Mi, the type strain representative of a novel PSV subgroup from China

    NARCIS (Netherlands)

    Yan, L.; Xu, Z.; Goldbach, R.W.; Chen, Y.K.; Prins, M.W.

    2005-01-01

    The complete nucleotide sequence of Peanut stunt virus strain Mi (PSV-Mi) from China was determined and compared to other viruses of the genus Cucumovirus. The tripartite genome of PSV-Mi encoded five open reading frames (ORFs) typical of cucumoviruses. Distance analyses of four ORFs indicated that

  15. Nucleotide Sequence and Analysis of an orotate transporter-containing plasmid isolated from the Lactococcus lactis ssp. lactis biovar diacetylactis strain DB0410

    DEFF Research Database (Denmark)

    Defoor, Els Marie Celine; Martinussen, Jan

    A new lactococcal plasmid, pDBORO, was isolated from the Lactococcus lactis ssp. lactis biovar diacetylactis strain DB0410 responsible for the sensitivity of DB0410 towards the pyrimidine-analog 5´-fluoroorotate. The plasmid pDBORO amounts to 16404 bp and its complete nucleotide sequence has been...

  16. A resource of genome-wide single-nucleotide polymorphisms generated by RAD tag sequencing in the critically endangered European eel

    DEFF Research Database (Denmark)

    Pujolar, J.M.; Jacobsen, M.W.; Frydenberg, J.

    2013-01-01

    Reduced representation genome sequencing such as restriction-site-associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the Eu...... 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome...

  17. Genetic differentiation between fake abalone and genuine Haliotis species using the forensically informative nucleotide sequencing (FINS) method.

    Science.gov (United States)

    Ha, Wai Y; Reid, David G; Kam, Wan L; Lau, Yuk Y; Sham, Wing C; Tam, Silvia Y K; Sin, Della W M; Mok, Chuen S

    2011-05-25

    Abalones (Haliotis species) are a popular delicacy and commonly preserved in dried form either whole or in slices or small pieces for consumption in Asian countries. Driven by the huge profit from trading abalones, dishonest traders may substitute other molluscan species for processed abalone, of which the morphological characteristics are frequently lost in the processed form. For protection of consumer rights and law enforcement against fraud, there is a need for an effective methodology to differentiate between fake and genuine abalone. This paper describes a method (validated according to the international forensic guidelines provided by SWGDAM) for the identification of fake abalone species using forensically informative nucleotide sequence (FINS) analysis. A study of the local market revealed that many claimed "abalone slice" samples on sale are not genuine. The fake abalone samples were found to be either volutids of the genus Cymbium (93%) or the muricid Concholepas concholepas (7%). This is the first report of Cymbium species being used for the preparation and sale as "abalone" in dried sliced form in Hong Kong.

  18. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    Directory of Open Access Journals (Sweden)

    Kei-ichi Morita

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are often unidentifiable in some patients; the failure to detect mutations is presumably because of mutations occurring in other causative genes or outside of analyzed regions of PTCH1, or copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through a targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  19. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    Science.gov (United States)

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are often unidentifiable in some patients; the failure to detect mutations is presumably because of mutations occurring in other causative genes or outside of analyzed regions of PTCH1, or copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through a targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  20. The nucleotide sequence and a first generation gene transfer vector of species B human adenovirus serotype 3.

    Science.gov (United States)

    Sirena, Dominique; Ruzsics, Zsolt; Schaffner, Walter; Greber, Urs F; Hemmi, Silvio

    2005-12-20

    Human adenovirus (Ad) serotype 3 causes respiratory infections. It is considered highly virulent, accounting for about 13% of all Ad isolates. We report here the complete Ad3 DNA sequence of 35,343 base pairs (GenBank accession DQ086466). Ad3 shares 96.43% nucleotide identity with Ad7, another virulent subspecies B1 serotype, and 82.56 and 62.75% identity with the less virulent species B2 Ad11 and species C Ad5, respectively. The genomic organization of Ad3 is similar to the other human Ads comprising five early transcription units, E1A, E1B, E2, E3, and E4, two delayed early units IX and IVa2, and the major late unit, in total 39 putative and 7 hypothetical open reading frames. A recombinant E1-deleted Ad3 was generated on a bacterial artificial chromosome. This prototypic virus efficiently transduced CD46-positive rodent and human cells. Our results will help in clarifying the biology and pathology of adenoviruses and enhance therapeutic applications of viral vectors in clinical settings.

  1. Development of Prevotella intermedia-specific PCR primers based on the nucleotide sequences of a DNA probe Pig27.

    Science.gov (United States)

    Kim, Min Jung; Hwang, Kyung Hwan; Lee, Young-Seok; Park, Jae-Yoon; Kook, Joong-Ki

    2011-03-01

    The aim of this study was to develop Prevotella intermedia-specific PCR primers based on the P. intermedia-specific DNA probe. The P. intermedia-specific DNA probe was screened by inverted dot blot hybridization and confirmed by Southern blot hybridization. The nucleotide sequences of the species-specific DNA probes were determined using a chain termination method. Southern blot analysis showed that the DNA probe, Pig27, detected only the genomic DNA of P. intermedia strains. PCR showed that the PCR primers, Pin-F1/Pin-R1, had species-specificity for P. intermedia. The detection limits of the PCR primer sets were 0.4pg of the purified genomic DNA of P. intermedia ATCC 49046. These results suggest that the PCR primers, Pin-F1/Pin-R1, could be useful in the detection of P. intermedia as well as in the development of a PCR kit in epidemiological studies related to periodontal diseases. Crown Copyright © 2010. Published by Elsevier B.V. All rights reserved.

  2. High-resolution melting genotyping of Enterococcus faecium based on multilocus sequence typing derived single nucleotide polymorphisms.

    Directory of Open Access Journals (Sweden)

    Steven Y C Tong

    We have developed a single nucleotide polymorphism (SNP) nucleated high-resolution melting (HRM) technique to genotype Enterococcus faecium. Eight SNPs were derived from the E. faecium multilocus sequence typing (MLST) database and amplified fragments containing these SNPs were interrogated by HRM. We tested the HRM genotyping scheme on 85 E. faecium bloodstream isolates and compared the results with MLST, pulsed-field gel electrophoresis (PFGE) and an allele specific real-time PCR (AS kinetic PCR) SNP typing method. In silico analysis based on predicted HRM curves according to the G+C content of each fragment for all 567 sequence types (STs) in the MLST database together with empiric data from the 85 isolates demonstrated that HRM analysis resolves E. faecium into 231 "melting types" (MelTs) and provides a Simpson's Index of Diversity (D) of 0.991 with respect to MLST. This is a significant improvement on the AS kinetic PCR SNP typing scheme that resolves 61 SNP types with D of 0.95. The MelTs were concordant with the known ST of the isolates. For the 85 isolates, there were 13 PFGE patterns, 17 STs, 14 MelTs and eight SNP types. There was excellent concordance between PFGE, MLST and MelTs with Adjusted Rand Indices of PFGE to MelT 0.936 and ST to MelT 0.973. In conclusion, this HRM based method appears rapid and reproducible. The results are concordant with MLST and the MLST based population structure.
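
    For readers unfamiliar with the discriminatory-power figure quoted above, the following is a minimal sketch of Simpson's Index of Diversity. The abstract does not state which formulation the authors used; the sketch assumes the sampling-without-replacement form commonly applied to typing schemes, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), and the function name is hypothetical.

```python
from collections import Counter


def simpsons_index_of_diversity(type_labels):
    """Probability that two isolates drawn without replacement
    belong to different types.

    type_labels: one type assignment per isolate (for example a
    melting type or a sequence type). This is an assumed formulation,
    not necessarily the one used in the study.
    """
    counts = Counter(type_labels)
    n_isolates = sum(counts.values())
    if n_isolates < 2:
        raise ValueError("at least two isolates are required")
    # Number of ordered pairs of isolates that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_isolates * (n_isolates - 1))
```

    A value near 1 means that two randomly chosen isolates almost always receive different types; a value of 0 means every isolate received the same type.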

  3. Single nucleotide polymorphism discovery via genotyping by sequencing to assess population genetic structure and recurrent polyploidization in Andropogon gerardii.

    Science.gov (United States)

    McAllister, Christine A; Miller, Allison J

    2016-07-01

    Autopolyploidy, genome duplication within a single lineage, can result in multiple cytotypes within a species. Geographic distributions of cytotypes may reflect the evolutionary history of autopolyploid formation and subsequent population dynamics including stochastic (drift) and deterministic (differential selection among cytotypes) processes. Here, we used a population genomic approach to investigate whether autopolyploidy occurred once or multiple times in Andropogon gerardii, a widespread, North American grass with two predominant cytotypes. Genotyping by sequencing was used to identify single nucleotide polymorphisms (SNPs) in individuals collected from across the geographic range of A. gerardii. Two independent approaches to SNP calling were used: the reference-free UNEAK pipeline and a reference-guided approach based on the sequenced Sorghum bicolor genome. SNPs generated using these pipelines were analyzed independently with genetic distance and clustering. Analyses of the two SNP data sets showed very similar patterns of population-level clustering of A. gerardii individuals: a cluster of A. gerardii individuals from the southern Plains, a northern Plains cluster, and a western cluster. Groupings of individuals corresponded to geographic localities regardless of cytotype: 6x and 9x individuals from the same geographic area clustered together. SNPs generated using reference-guided and reference-free pipelines in A. gerardii yielded unique subsets of genomic data. Both data sets suggest that the 9x cytotype in A. gerardii likely evolved multiple times from 6x progenitors across the range of the species. Genomic approaches like GBS and diverse bioinformatics pipelines used here facilitate evolutionary analyses of complex systems with multiple ploidy levels. © 2016 Botanical Society of America.

  4. F-Type Lectins: A Highly Diversified Family of Fucose-Binding Proteins with a Unique Sequence Motif and Structural Fold, Involved in Self/Non-Self-Recognition

    Directory of Open Access Journals (Sweden)

    Gerardo R. Vasta

    2017-11-01

    The F-type lectin (FTL) family is one of the most recent to be identified and structurally characterized. Members of the FTL family are characterized by a fucose recognition domain [F-type lectin domain (FTLD)] that displays a novel jellyroll fold (“F-type” fold) and unique carbohydrate- and calcium-binding sequence motifs. This novel lectin family comprises widely distributed proteins exhibiting single, double, or greater multiples of the FTLD, either tandemly arrayed or combined with other structurally and functionally distinct domains, yielding lectin subunits of pleiotropic properties even within a single species. Furthermore, the extraordinary variability of FTL sequences (isoforms) that are expressed in a single individual has revealed genetic mechanisms of diversification in ligand recognition that are unique to FTLs. Functions of FTLs in self/non-self-recognition include innate immunity, fertilization, microbial adhesion, and pathogenesis, among others. In addition, although the F-type fold is distinctive for FTLs, a structure-based search revealed apparently unrelated proteins with minor sequence similarity to FTLs that displayed the FTLD fold. In general, the phylogenetic analysis of FTLD sequences from viruses to mammals reveals clades that are consistent with the currently accepted taxonomy of extant species. However, the surprisingly discontinuous distribution of FTLDs within each taxonomic category suggests not only an extensive structural/functional diversification of the FTLs along evolutionary lineages but also that this intriguing lectin family has been subject to frequent gene duplication, secondary loss, lateral transfer, and functional co-option.

  5. Complete nucleotide sequence and analysis of two conjugative broad host range plasmids from a marine microbial biofilm.

    Directory of Open Access Journals (Sweden)

    Peter Norberg

    The complete nucleotide sequence of plasmids pMCBF1 and pMCBF6 was determined and analyzed. pMCBF1 and pMCBF6 form a novel clade within the IncP-1 plasmid family designated IncP-1 ς. The plasmids were exogenously isolated earlier from a marine biofilm. pMCBF1 (62 689 base pairs; bp) and pMCBF6 (66 729 bp) have identical backbones, but differ in their mercury resistance transposons. pMCBF1 carries Tn5053 and pMCBF6 carries Tn5058. Both are flanked by 5 bp direct repeats, typical of replicative transposition. Both insertions are in the vicinity of a resolvase gene in the backbone, supporting the idea that both transposons are "res-site hunters" that preferably insert close to and use external resolvase functions. The similarity of the backbones indicates recent insertion of the two transposons and the ongoing dynamics of plasmid evolution in marine biofilms. Both plasmids also carry the insertion sequence ISPst1, albeit without flanking repeats. ISPst1 is located in an unusual site within the control region of the plasmid. In contrast to most known IncP-1 plasmids the pMCBF1/pMCBF6 backbone has no insert between the replication initiation gene (trfA) and the vegetative replication origin (oriV). One pMCBF1/pMCBF6 block of about 2.5 kilo bases (kb) has no similarity with known sequences in the databases. Furthermore, insertion of three genes with similarity to the multidrug efflux pump operon mexEF and a gene from the NodT family of the tripartite multi-drug resistance-nodulation-division (RND) system in Pseudomonas aeruginosa was found. They do not seem to confer antibiotic resistance to the hosts of pMCBF1/pMCBF6, but the presence of RND on promiscuous plasmids may have serious implications for the spread of antibiotic multi-resistance.

  6. Genetic diversity of the captive Asian tapir population in Thailand, based on mitochondrial control region sequence data and the comparison of its nucleotide structure with Brazilian tapir.

    Science.gov (United States)

    Muangkram, Yuttamol; Amano, Akira; Wajjwalku, Worawidh; Pinyopummintr, Tanu; Thongtip, Nikorn; Kaolim, Nongnid; Sukmak, Manakorn; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Maikaew, Umaporn; Thomas, Warisara; Polsrila, Kanda; Dongsaard, Kwanreaun; Sanannu, Saowaphang; Wattananorrasate, Anuwat

    2017-07-01

    The Asian tapir (Tapirus indicus) has been classified as Endangered on the IUCN Red List of Threatened Species (2008). Genetic diversity data provide important information for the management of captive breeding and conservation of this species. We analyzed mitochondrial control region (CR) sequences from 37 captive Asian tapirs in Thailand. Multiple alignments of the full-length CR sequences sized 1268 bp comprised three domains as described in other mammal species. Analysis of 16 parsimony-informative variable sites revealed 11 haplotypes. Furthermore, the phylogenetic analysis using median-joining network clearly showed three clades correlated with our earlier cytochrome b gene study in this endangered species. The repetitive motif is located between first and second conserved sequence blocks, similar to the Brazilian tapir. The highest polymorphic site was located in the extended termination associated sequences domain. The results could be applied for future genetic management based in captivity and wild that shows stable populations.

  7. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins.

    Science.gov (United States)

    Foulk, Michael S; Urban, John M; Casella, Cinzia; Gerbi, Susan A

    2015-05-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na(+) instead of K(+) in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq. © 2015 Foulk et al.; Published by Cold Spring Harbor Laboratory Press.

  8. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps

    Science.gov (United States)

    Drory Retwitzer, Matan; Polishchuk, Maya; Churkin, Elena; Kifer, Ilona; Yakhini, Zohar; Barash, Danny

    2015-01-01

    Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular, no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various backgrounds and proficiency. It also extends RNA Structator and allows a more flexible variable gaps representation, in addition to analysis of results using energy minimization methods. RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site. PMID:25940619

  9. Complete nucleotide sequence of Bacillus subtilis (natto) bacteriophage PM1, a phage associated with disruption of food production.

    Science.gov (United States)

    Umene, Kenichi; Shiraishi, Atsushi

    2013-06-01

    "Natto", considered a traditional food, is made by fermenting boiled soybeans with Bacillus subtilis (natto), which is a natto-producing strain related to B. subtilis. The production of natto is disrupted by phage infections of B. subtilis (natto); hence, it is necessary to control phage infections. PM1, a phage of B. subtilis (natto), was isolated during interrupted natto production in a factory. In a previous study, PM1 was classified morphologically into the family Siphoviridae, and its genome, comprising approximately 50 kbp of linear double-stranded DNA, was assumed to be circularly permuted. In the present study, the complete nucleotide sequence of the PM1 genomic DNA of 50,861 bp (41.3 %G+C) was determined, and 86 open reading frames (ORFs) were deduced. Forty-one ORFs of PM1 shared similarities with proteins deduced from the genome of phages reported so far. Twenty-three ORFs of PM1 were associated with functions related to the phage multiplication process of gene control, DNA replication/modification, DNA packaging, morphogenesis, and cell lysis. Bacillus subtilis (natto) produces a capsular polypeptide of glutamate with a γ-linkage (called poly-γ-glutamate), which appears to serve as a physical barrier to phage adsorption. One ORF of PM1 had similarity with a poly-γ-glutamate hydrolase, which is assumed to degrade the capsular barrier to allow phage progenies to infect encapsulated host cells. The genome analysis of PM1 revealed the characteristics of the phage that are consistent as Bacillus subtilis (natto)-infecting phage.

  10. Species composition of the genus Saprolegnia in fin fish aquaculture environments, as determined by nucleotide sequence analysis of the nuclear rDNA ITS regions.

    Science.gov (United States)

    de la Bastide, Paul Y; Leung, Wai Lam; Hintz, William E

    2015-01-01

    The ITS region of the rDNA gene was compared for Saprolegnia spp. in order to improve our understanding of nucleotide sequence variability within and between species of this genus, determine species composition in Canadian fin fish aquaculture facilities, and to assess the utility of ITS sequence variability in genetic marker development. From a collection of more than 400 field isolates, ITS region nucleotide sequences were studied and it was determined that there was sufficient consistent inter-specific variation to support the designation of species identity based on ITS sequence data. This non-subjective approach to species identification does not rely upon transient morphological features. Phylogenetic analyses comparing our ITS sequences and species designations with data from previous studies generally supported the clade scheme of Diéguez-Uribeondo et al. (2007) and found agreement with the molecular taxonomic cluster system of Sandoval-Sierra et al. (2014). Our Canadian ITS sequence collection will thus contribute to the public database and assist the clarification of Saprolegnia spp. taxonomy. The analysis of ITS region sequence variability facilitated genus- and species-level identification of unknown samples from aquaculture facilities and provided useful information on species composition. A unique ITS-RFLP for the identification of S. parasitica was also described. Copyright © 2014 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  11. Cations form sequence selective motifs within DNA grooves via a combination of cation-pi and ion-dipole/hydrogen bond interactions.

    Science.gov (United States)

    Stewart, Mikaela; Dunlap, Tori; Dourlain, Elizabeth; Grant, Bryce; McFail-Isom, Lori

    2013-01-01

    The fine conformational subtleties of DNA structure modulate many fundamental cellular processes including gene activation/repression, cellular division, and DNA repair. Most of these cellular processes rely on the conformational heterogeneity of specific DNA sequences. Factors including those structural characteristics inherent in the particular base sequence as well as those induced through interaction with solvent components combine to produce fine DNA structural variation including helical flexibility and conformation. Cation-pi interactions between solvent cations or their first hydration shell waters and the faces of DNA bases form sequence selectively and contribute to DNA structural heterogeneity. In this paper, we detect and characterize the binding patterns found in cation-pi interactions between solvent cations and DNA bases in a set of high resolution x-ray crystal structures. Specifically, we found that monovalent cations (Tl⁺) and the polarized first hydration shell waters of divalent cations (Mg²⁺, Ca²⁺) form cation-pi interactions with DNA bases stabilizing unstacked conformations. When these cation-pi interactions are combined with electrostatic interactions a pattern of specific binding motifs is formed within the grooves.

  12. Identification of cyclic nucleotide gated channels using regular expressions

    KAUST Repository

    Zelman, Alice K.

    2013-09-03

    Cyclic nucleotide-gated channels (CNGCs) are nonselective cation channels found in plants, animals, and some bacteria. They have a six-transmembrane/one-pore structure, a cytosolic cyclic nucleotide-binding domain, and a cytosolic calmodulin-binding domain. Despite their functional similarities, the plant CNGC family members appear to have different conserved amino acid motifs within corresponding functional domains than animal and bacterial CNGCs do. Here we describe the development and application of methods employing plant CNGC-specific sequence motifs as diagnostic tools to identify novel candidate channels in different plants. These methods are used to evaluate the validity of annotations of putative orthologs of CNGCs from plant genomes. The methods detail how to employ regular expressions of conserved amino acids in functional domains of annotated CNGCs, together with Web tools such as PHI-BLAST and ScanProsite, to identify novel candidate CNGCs in species including Physcomitrella patens. © Springer Science+Business Media New York 2013.
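
    As a rough illustration of motif searching with regular expressions, the sketch below converts a PROSITE-style pattern (the syntax accepted by ScanProsite) into a Python regular expression and scans a protein sequence with it. The example pattern and sequence are hypothetical and are not the CNGC motifs from the chapter; the converter handles only a subset of the syntax (single residues, `x`, `[...]`, `{...}`, `(n)`, `(n,m)`, and leading `<` or trailing `>` anchors).

```python
import re

# One pattern element: a residue spec plus an optional repeat count.
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.strip("<>")
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                          # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # excluded residues
        else:
            token = residue                      # literal or [ABC] class
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    body = "".join(parts)
    return ("^" if anchored_start else "") + body + ("$" if anchored_end else "")


def find_motif_hits(pattern, sequence):
    """Return (start, matched_text) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

    For instance, the made-up pattern `[LI]-x(2)-G-{P}-x(0,3)-E.` becomes the regular expression `[LI].{2}G[^P].{0,3}E`. Note that `re.finditer` reports only non-overlapping matches, so overlapping motif occurrences would need a lookahead-based search instead.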

  13. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) offers an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users run multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  14. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage...

  15. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    Science.gov (United States)

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure was developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also showed that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of

  16. The proviral genome of radiation leukemia virus: Molecular cloning, nucleotide sequence of its long terminal repeat and integration in lymphoma cell DNA

    International Nuclear Information System (INIS)

    Janowski, M.; Merregaert, J.; Boniver, J.; Maisin, J.R.

    1985-01-01

    The proviral genome of a thymotropic and leukemogenic C57BL/Ka mouse retrovirus, RadLV/VL3(T+L+), was cloned as a biologically active PstI insert in the bacterial plasmid pBR322. Its restriction map was compared to those, already known, of two nonthymotropic and nonleukemogenic viruses of the same mouse strain, the ecotropic BL/Ka(B) and the xenotropic constituent of the radiation leukemia virus complex (RadLV). Differences were observed in the pol gene and in the env gene. Moreover, the nucleotide sequence of the RadLV/VL3(T+L+) long terminal repeat revealed the existence of two copies of a 42 bp long sequence, separated by 11 nucleotides, of which BL/Ka(B) possesses only one copy.

  17. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome.

    Science.gov (United States)

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-09-01

    Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.

  18. Genotyping of human parvovirus B19 in clinical samples from Brazil and Paraguay using heteroduplex mobility assay, single-stranded conformation polymorphism and nucleotide sequencing

    Directory of Open Access Journals (Sweden)

    Marcos César Lima de Mendonça

    2011-06-01

    Heteroduplex mobility assay, single-stranded conformation polymorphism and nucleotide sequencing were utilised to genotype human parvovirus B19 samples from Brazil and Paraguay. Ninety-seven serum samples were collected from individuals presenting with abortion or erythema infectiosum, arthropathies, severe anaemia and transient aplastic crisis; two additional skin samples were collected by biopsy. After the procedure, all clinical samples were classified as genotype 1.

  19. Complete nucleotide sequence of the self-transmissible TOL plasmid pD2RT provides new insight into arrangement of toluene catabolic plasmids

    DEFF Research Database (Denmark)

    Jutkina, Jekaterina; Hansen, Lars Hestbjerg; Li, Lili

    2013-01-01

    In the present study we report the complete nucleotide sequence of the toluene catabolic plasmid pD2RT of Pseudomonas migulae strain D2RT isolated from Baltic Sea water. The pD2RT is 129,894 base pairs in size with an average G+C content of 53.75%. A total of 135 open reading frames (ORFs) were ...

  20. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    Science.gov (United States)

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is an RRM protein that functions in the properly timed transition to meiosis. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, in a sequence-dependent manner and in proportion to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency for the MEL2-binding consensus to appear in the 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes were spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes presumed to function in meiosis, such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor 3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
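
The degenerate consensus quoted in this abstract, UUAGUU[U/A][U/G][A/U/G]U, maps directly onto a regular expression. The following is only an illustrative sketch: the function name and the example sequences are hypothetical, and the genome-wide survey in the paper may have used different tooling and criteria.

```python
import re

# Consensus from the abstract: UUAGUU[U/A][U/G][A/U/G]U.
# The lookahead lets overlapping occurrences be reported as well.
MEL2_CONSENSUS = re.compile(r"(?=(UUAGUU[UA][UG][AUG]U))")

def find_mel2_sites(utr):
    """Return (0-based start, matched 10-mer) for each consensus hit.

    Accepts RNA or DNA letters; T is converted to U before matching.
    """
    rna = utr.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in MEL2_CONSENSUS.finditer(rna)]

if __name__ == "__main__":
    # Hypothetical 3'-UTR fragment, not taken from the paper.
    print(find_mel2_sites("AAUUAGUUUUAUCC"))
```

A plain regular expression treats every permitted base at a degenerate position as equally good; a position weight matrix would be needed to capture graded preferences.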

  1. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

    DEFF Research Database (Denmark)

    Foulk, M. S.; Urban, J. M.; Casella, Cinzia

    2015-01-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (lambda-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent...... strands intact. We used genomics and biochemical approaches to determine if lambda-exo digests all parental DNA sequences equally. We report that lambda-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, lambda-exo digestion of nonreplicating genomic DNA (LexoG0) enriches...... GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent lambda-exo biases in NS-seq and validated this approach at the rDNA locus. The lambda-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s...

  2. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  3. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Background: Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results: We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion: The exact binomial test is particularly adapted for small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
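
To make the two kinds of test in this abstract concrete, here is a simplified sketch. It assumes non-overlapping motifs whose counts in the two sequences are independent Poisson variables, and it takes the expected counts e1 and e2 as user-supplied inputs (for example from a background composition model). Under the null hypothesis of equal exceptionality, the count in sequence 1 given the total is binomial with success probability e1 / (e1 + e2). The function names are illustrative, and the paper's treatment of overlapping motifs is not reproduced.

```python
import math

def exact_binomial_test(n1, n2, e1, e2):
    """One-sided p-value that the motif is more exceptional in sequence 1.

    Conditional on n = n1 + n2, n1 is Binomial(n, e1 / (e1 + e2)) under H0.
    """
    n = n1 + n2
    p = e1 / (e1 + e2)
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n1, n + 1)
    )

def likelihood_ratio_test(n1, n2, e1, e2):
    """Two-sided asymptotic test; returns (statistic, p-value).

    The statistic is compared with a chi-square distribution with 1 degree
    of freedom, whose survival function equals erfc(sqrt(x / 2)).
    """
    n = n1 + n2
    p = e1 / (e1 + e2)

    def term(observed, expected):
        # The limit of x * log(x / c) as x -> 0 is 0.
        return 0.0 if observed == 0 else observed * math.log(observed / expected)

    g = 2.0 * (term(n1, n * p) + term(n2, n * (1.0 - p)))
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    return g, math.erfc(math.sqrt(g / 2.0))

if __name__ == "__main__":
    # Hypothetical counts and expectations, for illustration only.
    print(exact_binomial_test(12, 3, 5.0, 5.0))
    print(likelihood_ratio_test(120, 80, 100.0, 100.0))
```

The exact sum costs little for small totals, which is the regime where the abstract recommends it; the asymptotic statistic avoids the summation when counts are large.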

  4. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

    DEFF Research Database (Denmark)

    Liu, Siyang; Huang, Shujia; Rao, Junhua

    2015-01-01

    present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome......) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We...... assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...

  5. Conserved binding of GCAC motifs by MEC-8, couch potato, and the RBPMS protein family

    Science.gov (United States)

    Soufari, Heddy

    2017-01-01

    Precise regulation of mRNA processing, translation, localization, and stability relies on specific interactions with RNA-binding proteins whose biological function and target preference are dictated by their preferred RNA motifs. The RBPMS family of RNA-binding proteins is defined by a conserved RNA recognition motif (RRM) domain found in metazoan RBPMS/Hermes and RBPMS2, Drosophila couch potato, and MEC-8 from Caenorhabditis elegans. In order to determine the parameters of RNA sequence recognition by the RBPMS family, we have first used the N-terminal domain from MEC-8 in binding assays and have demonstrated a preference for two GCAC motifs optimally separated by >6 nucleotides (nt). We have also determined the crystal structure of the dimeric N-terminal RRM domain from MEC-8 in the unbound form, and in complex with an oligonucleotide harboring two copies of the optimal GCAC motif. The atomic details reveal the molecular network that provides specificity to all four bases in the motif, including multiple hydrogen bonds to the initial guanine. Further studies with human RBPMS, as well as Drosophila couch potato, confirm a general preference for this double GCAC motif by other members of the protein family and the presence of this motif in known targets. PMID:28003515
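
A quick way to look for candidate sites of the kind described here is to list pairs of GCAC motifs and the spacing between them. The sketch below assumes that "separated by >6 nucleotides" means at least 7 nucleotides between the end of the first motif and the start of the second; that reading, the function name and the parameter are assumptions rather than details taken from the paper.

```python
import re

def find_double_gcac(seq, min_gap=7):
    """List (first_start, second_start, gap) for GCAC pairs with gap >= min_gap.

    Positions are 0-based; gap counts the nucleotides between the two motifs.
    DNA input is accepted: T is converted to U, which leaves GCAC unchanged.
    """
    rna = seq.upper().replace("T", "U")
    starts = [m.start() for m in re.finditer(r"(?=GCAC)", rna)]
    pairs = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            gap = second - (first + 4)
            if gap >= min_gap:
                pairs.append((first, second, gap))
    return pairs

if __name__ == "__main__":
    # Hypothetical target fragment with two motifs 7 nt apart.
    print(find_double_gcac("GCAC" + "U" * 7 + "GCAC"))
```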

  6. Nucleotide sequence analysis of the recA gene and discrimination of the three isolates of urease-positive thermophilic Campylobacter (UPTC) isolated from seagulls (Larus spp.) in Northern Ireland.

    Science.gov (United States)

    Matsuda, M; Tai, K; Moore, J E; Millar, B C; Murayama, O

    2004-01-01

    Nucleotide sequencing after TA cloning of the amplicon of the almost-full length recA gene from three strains of UPTC (A1, A2, and A3) isolated from seagulls in Northern Ireland, the phenotypical and genotypical characteristics of which have been demonstrated to be indistinguishable, clarified nucleotide differences at three nucleotide positions among the three strains. In conclusion, the nucleotide sequences of the recA gene were found to discriminate among the three strains of UPTC, A1, A2, and A3, which are indistinguishable phenotypically and genotypically. Thus, the present study strongly suggests that nucleotide sequence data of the amplicon of a suitable gene or region could aid in discriminating among isolates of the UPTC group, which are indistinguishable phenotypically and genotypically. Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

  7. Complete nucleotide sequence of CTX-M-15-plasmids from clinical Escherichia coli isolates: insertional events of transposons and insertion sequences.

    Directory of Open Access Journals (Sweden)

    Annemieke Smet

    BACKGROUND: CTX-M-producing Escherichia coli strains are regarded as major global pathogens. METHODOLOGY/PRINCIPAL FINDINGS: The nucleotide sequences of three plasmids (pEC_B24: 73801-bp; pEC_L8: 118525-bp and pEC_L46: 144871-bp) from Escherichia coli isolates obtained from patients with urinary tract infections and one plasmid (pEC_Bactec: 92970-bp) from an Escherichia coli strain isolated from the joint of a horse with arthritis were determined. Plasmid pEC_Bactec belongs to the IncI1 group and carries two resistance genes: bla(TEM-1) and bla(CTX-M-15). It shares more than 90% homology with a previously published bla(CTX-M)-plasmid from E. coli of human origin. Plasmid pEC_B24 belongs to the IncFII group whereas plasmids pEC_L8 and pEC_L46 represent a fusion of two replicons of type FII and FIA. On the pEC_B24 backbone, two resistance genes, bla(TEM-1) and bla(CTX-M-15), were found. Six resistance genes, bla(TEM-1), bla(CTX-M-15), bla(OXA-1), aac(6')-Ib-cr, tetA and catB4, were detected on the pEC_L8 backbone. The same antimicrobial drug resistance genes, with the exception of tetA, were also identified on the pEC_L46 backbone. Genome analysis of all 4 plasmids studied provides evidence of a seemingly frequent transposition event of the bla(CTX-M-15)-ISEcp1 element. This element seems to have a preferred insertion site at the tnpA gene of a bla(TEM)-carrying Tn3-like transposon, the latter itself being inserted by a transposition event. The IS26-composite transposon, which contains the bla(OXA-1), aac(6')-Ib-cr and catB4 genes, was inserted into plasmids pEC_L8 and pEC_L46 by homologous recombination rather than a transposition event. Results obtained for pEC_L46 indicated that IS26 also plays an important role in structural rearrangements of the plasmid backbone and seems to facilitate the mobilisation of fragments from other plasmids. 
CONCLUSIONS: Collectively, these data suggest that IS26 together with ISEcp1 could play a critical role in the evolution of

  8. Main: Nucleotide Analysis [KOME

    Lifescience Database Archive (English)

    Nucleotide Analysis: Japonica genome blast search result. Result of blastn search against japonica genome sequence. kome_japonica_genome_blast_search_result.zip kome_japonica_genome_blast_search_result ...

  9. AFLP fragment isolation technique as a method to produce random sequences for single nucleotide polymorphism discovery in the green turtle, Chelonia mydas.

    Science.gov (United States)

    Roden, Suzanne E; Dutton, Peter H; Morin, Phillip A

    2009-01-01

    The green sea turtle, Chelonia mydas, was used as a case study for single nucleotide polymorphism (SNP) discovery in a species that has little genetic sequence information available. As green turtles have a complex population structure, additional nuclear markers other than microsatellites could add to our understanding of their complex life history. Amplified fragment length polymorphism technique was used to generate sets of random fragments of genomic DNA, which were then electrophoretically separated with precast gels, stained with SYBR green, excised, and directly sequenced. It was possible to perform this method without the use of polyacrylamide gels, radioactive or fluorescent labeled primers, or hybridization methods, reducing the time, expense, and safety hazards of SNP discovery. Within 13 loci, 2547 base pairs were screened, resulting in the discovery of 35 SNPs. Using this method, it was possible to yield a sufficient number of loci to screen for SNP markers without the availability of prior sequence information.

  10. A Chromosome 7 Pericentric Inversion Defined at Single-Nucleotide Resolution Using Diagnostic Whole Genome Sequencing in a Patient with Hand-Foot-Genital Syndrome.

    Science.gov (United States)

    Watson, Christopher M; Crinnion, Laura A; Harrison, Sally M; Lascelles, Carolina; Antanaviciute, Agne; Carr, Ian M; Bonthron, David T; Sheridan, Eamonn

    2016-01-01

    Next generation sequencing methodologies are facilitating the rapid characterisation of novel structural variants at nucleotide resolution. These approaches are particularly applicable to variants initially identified using alternative molecular methods. We report a child born with bilateral postaxial syndactyly of the feet and bilateral fifth finger clinodactyly. This was presumed to be an autosomal recessive syndrome, due to the family history of consanguinity. Karyotype analysis revealed a homozygous pericentric inversion of chromosome 7 (46,XX,inv(7)(p15q21)x2) which was confirmed to be heterozygous in both unaffected parents. Since the resolution of the karyotype was insufficient to identify any putatively causative gene, we undertook medium-coverage whole genome sequencing using paired-end reads, in order to elucidate the molecular breakpoints. In a two-step analysis, we first narrowed down the region by identifying discordant read-pairs, and then determined the precise molecular breakpoint by analysing the mapping locations of "soft-clipped" breakpoint-spanning reads. PCR and Sanger sequencing confirmed the identified breakpoints, both of which were located in intergenic regions. Significantly, the 7p15 breakpoint was located 523 kb upstream of HOXA13, the locus for hand-foot-genital syndrome. By inference from studies of HOXA locus control in the mouse, we suggest that the inversion has delocalised a HOXA13 enhancer to produce the phenotype observed in our patient. This study demonstrates how a modern genetic diagnostic approach can characterise structural variants at nucleotide resolution and provide potential insights into functional regulation.
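
The second step of the analysis described above, locating a breakpoint from soft-clipped reads, can be sketched from SAM fields alone. The code below is a minimal illustration, assuming each read is given as a (POS, CIGAR) pair with a 1-based leftmost position as in the SAM format; the function names and example reads are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def softclip_breakpoints(pos, cigar):
    """Return reference positions where a read's alignment ends in a soft clip.

    pos is the 1-based leftmost aligned position (SAM POS field).
    A leading clip reports the first aligned base; a trailing clip reports
    the last aligned base.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    positions = []
    if ops and ops[0][1] == "S":
        positions.append(pos)
    if len(ops) > 1 and ops[-1][1] == "S":
        # Only M, D, N, = and X consume reference bases.
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        positions.append(pos + ref_len - 1)
    return positions

def consensus_breakpoint(reads):
    """Most frequently supported clip position and its read count, or None."""
    counts = Counter(p for pos, cigar in reads for p in softclip_breakpoints(pos, cigar))
    return counts.most_common(1)[0] if counts else None

if __name__ == "__main__":
    # Hypothetical reads spanning one breakpoint.
    reads = [(100, "50M25S"), (110, "40M35S"), (120, "30M45S"), (90, "75M")]
    print(consensus_breakpoint(reads))
```

In practice the discordant read-pair step would first restrict which reads are examined, and the clipped sequence would be realigned to find the partner breakpoint.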

  11. Complete nucleotide sequence of pGA45, a 140,698-bp incFIIY plasmid encoding blaIMI-3-mediated carbapenem resistance, from river sediment

    Directory of Open Access Journals (Sweden)

    Bingjun Dang

    2016-02-01

    Plasmid pGA45 was isolated from the sediment of the Haihe River using E. coli CV601 (gfp-tagged) as recipients and indigenous bacteria from sediment as donors. This plasmid confers reduced susceptibility to imipenem, which belongs to the carbapenem group. Plasmid pGA45 was fully sequenced on an Illumina HiSeq 2000 sequencing system. The complete sequence of plasmid pGA45 was 140,698 bp in length with an average G+C content of 52.03%. Sequence analysis shows that pGA45 belongs to the incFIIY group and harbors a backbone region that shares high homology and gene synteny with several other incF plasmids including pNDM1_EC14653, pYDC644, pNDM-Ec1GN574, pRJF866, pKOX_NDM1 and pP10164-NDM. In addition to the backbone region, plasmid pGA45 harbors two notable features: one blaIMI-3-containing region and one type VI secretion system region. The blaIMI-3-containing region is responsible for bacterial carbapenem resistance, and the type VI secretion system region is probably involved in bacterial virulence. Plasmid pGA45 represents the first complete nucleotide sequence of a blaIMI-harboring plasmid from an environmental sample, and the sequencing of this plasmid provides insight into the architecture used for the dissemination of blaIMI carbapenemase genes.

  12. Comparative anatomy of the human APRT gene and enzyme: nucleotide sequence divergence and conservation of a nonrandom CpG dinucleotide arrangement

    International Nuclear Information System (INIS)

    Broderick, T.P.; Schaff, D.A.; Bertino, A.M.; Dush, M.K.; Tischfield, J.A.; Stambrook, P.J.

    1987-01-01

    The functional human adenine phosphoribosyltransferase (APRT) gene is <2.6 kilobases in length and contains five exons. The amino acid sequences of APRTs have been highly conserved throughout evolution. The human enzyme is 82%, 90%, and 40% identical to the mouse, hamster, and Escherichia coli enzymes, respectively. The promoter region of the human APRT gene, like that of several other housekeeping genes, lacks TATA and CCAAT boxes but contains five GC boxes that are potential binding sites for the Sp1 transcription factor. The distal three, however, are dispensable for gene expression. Comparison between human and mouse APRT gene nucleotide sequences reveals a high degree of homology within protein coding regions but an absence of significant homology in 5' flanking, 3' untranslated, and intron sequences, except for similarly positioned GC boxes in the promoter region and a 26-base-pair region in intron 3. This 26-base-pair sequence is 92% identical with a similarly positioned sequence in the mouse gene and is also found in intron 3 of the hamster gene, suggesting that its retention may be a consequence of stringent selection. The positions of all introns have been precisely retained in the human and both rodent genes. Retention of an elevated CpG dinucleotide content, despite loss of sequence homology, suggests that there may be selection for CpG dinucleotides in these regions and that their maintenance may be important for APRT gene function
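
One common way to quantify an "elevated CpG dinucleotide content" is the observed/expected CpG ratio, (number of CpG x length) / (number of C x number of G), reported alongside the G+C fraction. The sketch below uses that convention as an assumption; the abstract does not state which measure the authors applied, and the function name is illustrative.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence.

    The expected CpG count assumes C and G occur independently:
    expected = (#C * #G) / length. A ratio of 0.0 is returned when
    the sequence is empty or contains no C or no G.
    """
    s = seq.upper()
    length = len(s)
    if length == 0:
        return 0.0, 0.0
    n_c = s.count("C")
    n_g = s.count("G")
    n_cpg = s.count("CG")  # CG cannot overlap itself, so count() is exact
    gc_fraction = (n_c + n_g) / length
    if n_c == 0 or n_g == 0:
        return gc_fraction, 0.0
    return gc_fraction, (n_cpg * length) / (n_c * n_g)

if __name__ == "__main__":
    # Hypothetical fragment, not taken from the APRT gene.
    print(cpg_stats("ACGTCGCGATTA"))
```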

  13. Karyological characterization and identification of four repetitive element groups (the 18S – 28S rRNA gene, telomeric sequences, microsatellite repeat motifs, Rex retroelements) of the Asian swamp eel (Monopterus albus)

    Science.gov (United States)

    Suntronpong, Aorarat; Thapana, Watcharaporn; Twilprawat, Panupon; Prakhongcheep, Ornjira; Somyong, Suthasinee; Muangmai, Narongrit; Peyachoknagul, Surin; Srikulnath, Kornsorn

    2017-01-01

    Among teleost fishes, Asian swamp eel (Monopterus albus Zuiew, 1793) possesses the lowest chromosome number, 2n = 24. To characterize the chromosome constitution and investigate the genome organization of repetitive sequences in M. albus, karyotyping and chromosome mapping were performed with the 18S – 28S rRNA gene, telomeric repeats, microsatellite repeat motifs, and Rex retroelements. The 18S – 28S rRNA genes were mapped to the pericentromeric region of chromosome 4 at the same position as large propidium iodide and C-positive bands, suggesting that the molecular structure of the pericentromeric regions of chromosome 4 has evolved in a concerted manner with amplification of the 18S – 28S rRNA genes. (TTAGGG)n sequences were found at the telomeric ends of all chromosomes. Eight of 19 microsatellite repeat motifs were dispersedly mapped on different chromosomes, suggesting the independent amplification of microsatellite repeat motifs in M. albus. Monopterus albus Rex1 (MALRex1) was observed at interstitial sites of all chromosomes and in the pericentromeric regions of most chromosomes, whereas MALRex3 was scattered and localized to all chromosomes and MALRex6 to several chromosomes. This suggests that these retroelements were independently amplified or lost in M. albus. Among MALRexs (MALRex1, MALRex3, and MALRex6), MALRex6 showed higher interspecific sequence divergences from those of other teleost species. This suggests that the divergence of Rex6 sequences of M. albus might have occurred a relatively long time ago. PMID:29093797

  14. Nucleotide sequences of two cellulase genes from alkalophilic Bacillus sp. strain N-4 and their strong homology.

    OpenAIRE

    Fukumori, F; Sashihara, N; Kudo, T; Horikoshi, K

    1986-01-01

    Two genes for cellulases of alkalophilic Bacillus sp. strain N-4 (ATCC 21833) have been sequenced. From the DNA sequences the cellulases encoded in the plasmids pNK1 and pNK2 consist of 488 and 409 amino acids, respectively. The DNA and protein sequences of the pNK1-encoded cellulase are related to those of the pNK2-encoded cellulase. The pNK2-encoded cellulase lacks the direct repeat sequence of a stretch of 60 amino acids near the C-terminal end of the pNK1-encoded cellulase. The duplicatio...

  15. MSDmotif: exploring protein sites and motifs

    Directory of Open Access Journals (Sweden)

    Henrick Kim

    2008-07-01

    Background: Protein structures have conserved features, motifs, which have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding of these fragments is essential for 3D structure prediction, modelling and drug-design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results: We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif associations searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino-acid within a motif, correlation of amino-acids side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

  16. Target motifs affecting natural immunity by a constitutive CRISPR-Cas system in Escherichia coli.

    Directory of Open Access Journals (Sweden)

    Cristóbal Almendros

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (cas) genes make up the CRISPR-Cas systems of various bacteria and archaea and mediate degradation of invading nucleic acids containing sequences (protospacers) that are complementary to repeat intervening spacers. It has been demonstrated that the base sequence identity of a protospacer with the cognate spacer and the presence of a protospacer adjacent motif (PAM) influence CRISPR-mediated interference efficiency. By using an original transformation assay with plasmids targeted by a resident spacer, here we show that natural CRISPR-mediated immunity against invading DNA occurs in wild type Escherichia coli. Unexpectedly, the strongest activity is observed with protospacer adjoining nucleotides (interference motifs) that differ from the PAM both in sequence and location. Hence, our results document for the first time native CRISPR activity in E. coli and demonstrate that positions next to the PAM in invading DNA influence their recognition and degradation by these prokaryotic immune systems.

  17. Adenovirus fibre shaft sequences fold into the native triple beta-spiral fold when N-terminally fused to the bacteriophage T4 fibritin foldon trimerisation motif.

    Science.gov (United States)

    Papanikolopoulou, Katerina; Teixeira, Susana; Belrhali, Hassan; Forsyth, V Trevor; Mitraki, Anna; van Raaij, Mark J

    2004-09-03

    Adenovirus fibres are trimeric proteins that consist of a globular C-terminal domain, a central fibrous shaft and an N-terminal part that attaches to the viral capsid. In the presence of the globular C-terminal domain, which is necessary for correct trimerisation, the shaft segment adopts a triple beta-spiral conformation. We have replaced the head of the fibre by the trimerisation domain of the bacteriophage T4 fibritin, the foldon. Two different fusion constructs were made and crystallised, one with an eight amino acid residue linker and one with a linker of only two residues. X-ray crystallographic studies of both fusion proteins show that residues 319-391 of the adenovirus type 2 fibre shaft fold into a triple beta-spiral fold indistinguishable from the native structure, although this is now resolved at a higher resolution of 1.9 Å. The foldon residues 458-483 also adopt their natural structure. The intervening linkers are not well ordered in the crystal structures. This work shows that the shaft sequences retain their capacity to fold into their native beta-spiral fibrous fold when fused to a foreign C-terminal trimerisation motif. It provides a structural basis to artificially trimerise longer adenovirus shaft segments and segments from other trimeric beta-structured fibre proteins. Such artificial fibrous constructs, amenable to crystallisation and solution studies, can offer tractable model systems for the study of beta-fibrous structure. They can also prove useful for gene therapy and fibre engineering applications.

  18. The nucleotide sequence of the RNA-2 of an isolate of the English serotype of tomato black ring virus: RNA recombination in the history of nepoviruses.

    Science.gov (United States)

    Le Gall, O L; Lanneau, M; Candresse, T; Dunez, J

    1995-05-01

    The RNA-2 of a carrot isolate from the English serotype of tomato black ring nepovirus (TBRV-ED) has been sequenced. It is 4618 nucleotides long and contains one open reading frame encoding a polypeptide of 1344 amino acids. The 5' non-coding region contains three repetitions of a stem-loop structure also conserved in TBRV-Scottish and grapevine chrome mosaic nepovirus (GCMV). The coat protein domain was mapped to the carboxy-terminal one-third of the polyprotein. Sequence comparisons indicate that TBRV-ED RNA-2 probably arose by an RNA recombination event that resulted in the exchange of the putative movement protein gene between TBRV and GCMV.

  19. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

    Science.gov (United States)

    Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

    2018-04-01

    Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health and more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. The review draws on a personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing does not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
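
    The two distance measures described in this abstract can be illustrated with a minimal sketch. It is not taken from the review: the function names, the toy alignment and the toy allelic profiles are all hypothetical, and ambiguous bases or uncalled genes are simply skipped, which is only one of several possible conventions.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both isolates have an unambiguous base and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def allele_distance(profile_a: dict, profile_b: dict) -> int:
    """Count core genes, called in both isolates, whose allele numbers differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for gene in shared
        if profile_a[gene] is not None
        and profile_b[gene] is not None
        and profile_a[gene] != profile_b[gene]
    )


# Hypothetical core genome alignment (N marks an ambiguous base).
alignment = {
    "isolate1": "ACGTACGTAC",
    "isolate2": "ACGTACGTAT",
    "isolate3": "ACCTACNTAT",
}

# Hypothetical gene-by-gene allelic profiles (None marks an uncalled gene).
profiles = {
    "isolate1": {"geneA": 1, "geneB": 4, "geneC": 2},
    "isolate2": {"geneA": 1, "geneB": 4, "geneC": 3},
    "isolate3": {"geneA": 2, "geneB": None, "geneC": 3},
}

for x, y in combinations(alignment, 2):
    print(x, y, snp_distance(alignment[x], alignment[y]), allele_distance(profiles[x], profiles[y]))
```

    Whether a given distance indicates related isolates is left open on purpose, since the abstract stresses that such cutoffs are organism specific.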

  20. Molecular characterization and phylogenetic analysis of Explanatum explanatum in India based on nucleotide sequences of ribosomal ITS2 and the mitochondrial gene nad1.

    Science.gov (United States)

    Hayashi, Kei; Mohanta, Uday K; Ohari, Yuma; Neeraja, Tambireddy; Singh, T Shantikumar; Sugiyama, Hiromu; Itagaki, Tadashi

    2016-12-01

    The aim of this study was to analyze the phylogenetic relationship between Explanatum explanatum populations in India and other countries of the Indian subcontinent. Seventy liver amphistomes collected from four localities in India were identified as E. explanatum based on the nucleotide sequences of ribosomal ITS2. The flukes were then analyzed phylogenetically based on the nucleotide sequence of the mitochondrial gene nad1 in comparison with flukes from Bangladesh and Nepal. In the resulting phylogenetic tree, the nad1 haplotypes from India were divided into four clades, and the flukes showing the haplotypes of clades A and C were predominant in India. The haplotypes of the clades A and C have also been detected in Bangladesh and Nepal, and therefore, it seems they occur commonly throughout the Indian subcontinent. The results of AMOVA suggested that gene flow was likely to occur between E. explanatum populations in these countries. These countries are geographically close and have been historically and culturally connected to each other, and therefore, the movements of host ruminants among these countries might have been involved in the migration of the flukes and their gene flow.

  1. The nucleotide sequence of RNA1 of Lettuce big-vein virus, genus Varicosavirus, reveals its relation to nonsegmented negative-strand RNA viruses.

    Science.gov (United States)

    Sasaya, Takahide; Ishikawa, Koichi; Koganezawa, Hiroki

    2002-06-05

    The complete nucleotide sequence of RNA1 from Lettuce big-vein virus (LBVV), the type member of the genus Varicosavirus, was determined. LBVV RNA1 consists of 6797 nucleotides and contains one large ORF that encodes a large (L) protein of 2040 amino acids with a predicted M(r) of 232,092. Northern blot hybridization analysis indicated that the LBVV RNA1 is a negative-sense RNA. Database searches showed that the amino acid sequence of L protein is homologous to those of L polymerases of nonsegmented negative-strand RNA viruses. A cluster dendrogram derived from alignments of the LBVV L protein and the L polymerases indicated that the L protein is most closely related to the L polymerases of plant rhabdoviruses. Transcription termination/polyadenylation signal-like poly(U) tracts that resemble those in rhabdovirus and paramyxovirus RNAs were present upstream and downstream of the coding region. Although LBVV is related to rhabdoviruses, a key distinguishing feature is that the genome of LBVV is segmented. The results reemphasize the need to reconsider the taxonomic position of varicosaviruses.

  2. Detection of de novo single nucleotide variants in offspring of atomic-bomb survivors close to the hypocenter by whole-genome sequencing.

    Science.gov (United States)

    Horai, Makiko; Mishima, Hiroyuki; Hayashida, Chisa; Kinoshita, Akira; Nakane, Yoshibumi; Matsuo, Tatsuki; Tsuruda, Kazuto; Yanagihara, Katsunori; Sato, Shinya; Imanishi, Daisuke; Imaizumi, Yoshitaka; Hata, Tomoko; Miyazaki, Yasushi; Yoshiura, Koh-Ichiro

    2018-03-01

    Ionizing radiation released by the atomic bombs at Hiroshima and Nagasaki, Japan, in 1945 caused many long-term illnesses, including increased risks of malignancies such as leukemia and solid tumours. Radiation has demonstrated genetic effects in animal models, leading to concerns over the potential hereditary effects of atomic bomb-related radiation. However, no direct analyses of whole DNA have yet been reported. We therefore investigated de novo variants in offspring of atomic-bomb survivors by whole-genome sequencing (WGS). We collected peripheral blood from three trios, each comprising a father (atomic-bomb survivor with acute radiation symptoms), a non-exposed mother, and their child, none of whom had any past history of haematological disorders. One trio of non-exposed individuals was included as a control. DNA was extracted and the numbers of de novo single nucleotide variants in the children were counted by WGS with sequencing confirmation. Gross structural variants were also analysed. Written informed consent was obtained from all participants prior to the study. There were 62, 81, and 42 de novo single nucleotide variants in the children of atomic-bomb survivors, compared with 48 in the control trio. There were no gross structural variants in any trio. These findings are in accord with previously published results that also showed no significant genetic effects of atomic-bomb radiation on second-generation survivors.
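
    The core idea of trio-based de novo detection, keeping variants seen in the child but in neither parent, can be sketched as a set difference. This is an illustrative sketch only, not the authors' pipeline: the variant tuples are hypothetical, and a real analysis would add depth and genotype-quality filters plus the sequencing confirmation mentioned in the abstract. Only the four counts at the end come from the abstract.

```python
def de_novo_snvs(child, father, mother):
    """Return child variants that are absent from both parents."""
    return sorted(set(child) - set(father) - set(mother))


# Hypothetical toy calls; each variant is (chromosome, position, ref, alt).
father = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
mother = {("chr1", 2000, "G", "A")}
child = {
    ("chr1", 1000, "A", "G"),  # inherited from the father
    ("chr1", 2000, "G", "A"),  # inherited from the mother
    ("chr3", 42, "T", "C"),    # seen in neither parent: de novo candidate
}

candidates = de_novo_snvs(child, father, mother)
print(candidates)

# Counts reported in the abstract: three exposed trios and one control trio.
exposed_counts = [62, 81, 42]
control_count = 48
mean_exposed = sum(exposed_counts) / len(exposed_counts)
print(f"mean in exposed trios: {mean_exposed:.1f}; control trio: {control_count}")
```

    With three exposed trios and a single control, the mean is descriptive only and supports no statistical comparison on its own.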

  3. Complete nucleotide sequence of the multidrug resistance IncA/C plasmid pR55 from Klebsiella pneumoniae isolated in 1969.

    Science.gov (United States)

    Doublet, Benoît; Boyd, David; Douard, Gregory; Praud, Karine; Cloeckaert, Axel; Mulvey, Michael R

    2012-10-01

    To determine the complete nucleotide sequence of the multidrug resistance IncA/C plasmid pR55 from a clinical Klebsiella pneumoniae strain that was isolated from a urinary tract infection in 1969 in a French hospital and compare it with those of contemporary emerging IncA/C plasmids. The plasmid was purified and sequenced using a 454 sequencing approach. After draft assembly, additional PCRs and walking reads were performed for gap closure. Sequence comparisons and multiple alignments with other IncA/C plasmids were done using the BLAST algorithm and CLUSTAL W, respectively. Plasmid pR55 (170 810 bp) revealed a shared plasmid backbone (>99% nucleotide identity) with current members of the IncA/C(2) multidrug resistance plasmid family that are widely disseminating antibiotic resistance genes. Nevertheless, two specific multidrug resistance gene arrays probably acquired from other genetic elements were identified inserted at conserved hotspot insertion sites in the IncA/C backbone. A novel transposon named Tn6187 showed an atypical mixed transposon configuration composed of two mercury resistance operons and two transposition modules that are related to Tn21 and Tn1696, respectively, and an In0-type integron. IncA/C(2) multidrug resistance plasmids have a broad host range and have been implicated in the dissemination of antibiotic resistance among Enterobacteriaceae from humans and animals. This typical IncA/C(2) genetic scaffold appears to carry various multidrug resistance gene arrays and is now also a successful vehicle for spreading AmpC-like cephalosporinase and metallo-β-lactamase genes, such as bla(CMY) and bla(NDM), respectively.

  4. Nucleotide sequence analysis of HTLV-I isolated from cerebrospinal fluid of a patient with TSP/HAM: comparison to other HTLV-I isolates.

    Science.gov (United States)

    Mukhopadhyaya, R; Sadaie, M R

    1993-02-01

    Human T-cell leukemia virus type I (HTLV-I) has been associated with adult T-cell leukemia/lymphoma and the chronic neurologic disorder tropical spastic paraparesis/HTLV-I-associated myelopathy (TSP/HAM). To study the genetic structure of the virus associated with TSP/HAM, we have obtained and sequenced a partial genomic clone from an HTLV-I-positive cell line established from cerebrospinal fluid (CSF) of a Jamaican patient with TSP/HAM. This clone consisted of a 4.3-kb viral sequence containing the 5' long terminal repeat (LTR), gag, and N-terminal portion of the pol gene, with an overall 1.3% sequence variation resulting from mostly nucleotide substitutions, as compared to the prototype HTLV-I ATK-1. The gag and pol regions showed only 1.4% and 1.2% nucleotide variations, respectively. However, the U3 region of the LTR showed the highest sequence variation (3.6%), where several changes appear to be common among certain TSP/HAM isolates. Several of these changes reside within the 21-bp boundaries and the Tax-responsive element. It would be important to determine if the observed changes are sufficient to cause neurologic disorders similar to the murine leukemia virus system or simply reflect the divergent pool of HTLV-I from different geographic locations. At this time, we cannot rule out the possibility that the observed changes have either direct or indirect significance for the HTLV-I pathogenesis in TSP/HAM.

  5. Discovery and validation of information theory-based transcription factor and cofactor binding site motifs.

    Science.gov (United States)

    Lu, Ruipeng; Mucaki, Eliseos J; Rogan, Peter K

    2017-03-17

    Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
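
    As background for the information theory-based weight matrices mentioned here, the following sketch computes per-position information content and a per-site score from a set of aligned binding sites. It is a generic textbook-style calculation, not the recursive, thresholded entropy minimization pipeline of the paper; the example sites, the pseudocount of 0.5 and the omission of a small-sample correction are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def frequencies(sites, pseudocount=0.5):
    """Per-position base frequencies from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    freqs = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        freqs.append({b: c / total for b, c in counts.items()})
    return freqs


def column_information(column):
    """Information content in bits: 2 minus the Shannon entropy of the column."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


def site_information(site, freqs):
    """Sum of per-base weights (2 + log2 frequency) over the site, in bits."""
    return sum(2.0 + math.log2(freqs[i][b]) for i, b in enumerate(site))


# Hypothetical aligned binding sites.
sites = ["TGACGTCA", "TGACGTAA", "TGATGTCA", "TGACGCCA"]
freqs = frequencies(sites)

print([round(column_information(col), 2) for col in freqs])
print(round(site_information("TGACGTCA", freqs), 2))
print(round(site_information("ACGTACGT", freqs), 2))
```

    A site close to the consensus scores higher than an unrelated sequence, which is the property that lets such matrices rank candidate binding sites by predicted strength.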

  6. Nucleotide sequences of cDNAs for human papillomavirus type 18 transcripts in HeLa cells

    International Nuclear Information System (INIS)

    Inagaki, Yutaka; Tsunokawa, Youko; Takebe, Naoko; Terada, Masaaki; Sugimura, Takashi; Nawa, Hiroyuki; Nakanishi, Shigetada

    1988-01-01

    HeLa cells expressed 3.4- and 1.6-kilobase (kb) transcripts of the integrated human papillomavirus (HPV) type 18 genome. Two types of cDNA clones representing each size of HPV type 18 transcript were isolated. Sequence analysis of these two types of cDNA clones revealed that the 3.4-kb transcript contained E6, E7, the 5' portion of E1, and human sequence and that the 1.6-kb transcript contained spliced and frameshifted E6 (E6*), E7, and human sequence. There was a common human sequence containing a poly(A) addition signal in the 3' end portions of both transcripts, indicating that they were transcribed from the HPV genome at the same integration site with different splicing. Furthermore, the 1.6-kb transcript contained both of the two viral TATA boxes upstream of E6, strongly indicating that a cellular promoter was used for its transcription.

  7. Nucleotide sequence analysis of the Legionella micdadei mip gene, encoding a 30-kilodalton analog of the Legionella pneumophila Mip protein

    DEFF Research Database (Denmark)

    Bangsborg, Jette Marie; Cianciotto, N P; Hindersson, P

    1991-01-01

    After the demonstration of analogs of the Legionella pneumophila macrophage infectivity potentiator (Mip) protein in other Legionella species, the Legionella micdadei mip gene was cloned and expressed in Escherichia coli. DNA sequence analysis of the L. micdadei mip gene contained in the plasmid p...... homology with the mip-like genes of several Legionella species. Furthermore, amino acid sequence comparisons revealed significant homology to two eukaryotic proteins with isomerase activity (FK506-binding proteins)....

  8. Complete nucleotide sequence and genome analysis of bacteriophage BFK20 — A lytic phage of the industrial producer Brevibacterium flavum

    Czech Academy of Sciences Publication Activity Database

    Bukovska, G.; Klucar, L.; Vlček, Čestmír; Adamovic, J.; Turna, J.; Timko, J.

    2006-01-01

    Vol. 348, No. 1 (2006), pp. 57-71. ISSN 0042-6822. Grant (others): Slovenská akademie věd (SK) VEGA2/5068/25; Science and Technology Assistance Agency (SK) APVT-51-025004. Institutional research plan: CEZ:AV0Z50520514. Keywords: Bacteriophage; Complete genome sequence; Sequence analysis. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 3.525 (2006).

  9. Extended region of nodulation genes in Rhizobium meliloti 1021. II. Nucleotide sequence, transcription start sites and protein products

    International Nuclear Information System (INIS)

    Fisher, R.F.; Swanson, J.A.; Mulligan, J.T.; Long, S.R.

    1987-01-01

    The authors have established the DNA sequence and analyzed the transcription and translation products of a series of putative nodulation (nod) genes in Rhizobium meliloti strain 1021. Four loci have been designated nodF, nodE, nodG and nodH. The correlation of transposon insertion positions with phenotypes and open reading frames was confirmed by sequencing the insertion junctions of the transposons. The protein products of these nod genes were visualized by in vitro expression of cloned DNA segments in a R. meliloti transcription-translation system. In addition, the sequence for nodG was substantiated by creating translational fusions in all three reading frames at several points in the sequence; the resulting fusions were expressed in vitro in both E. coli and R. meliloti transcription-translation systems. A DNA segment bearing several open reading frames downstream of nodG corresponds to the putative nod gene mutated in strain nod-216. The transcription start sites of nodF and nodH were mapped by primer extension of RNA from cells induced with the plant flavone, luteolin. Initiation of transcription occurs approximately 25 bp downstream from the conserved sequence designated the nod box, suggesting that this conserved sequence acts as an upstream regulator of inducible nod gene expression. Its distance from the transcription start site is more suggestive of an activator binding site than of an RNA polymerase binding site.

  10. Strand bias in complementary single-nucleotide polymorphisms of transcribed human sequences: evidence for functional effects of synonymous polymorphisms

    Directory of Open Access Journals (Sweden)

    Majewski Jacek

    2006-08-01

    Background: Complementary single-nucleotide polymorphisms (SNPs) may not be distributed equally between two DNA strands if the strands are functionally distinct, such as in transcribed genes. In introns, an excess of A↔G over the complementary C↔T substitutions had previously been found and attributed to transcription-coupled repair (TCR), demonstrating the valuable functional clues that can be obtained by studying such asymmetry. Here we studied asymmetry of human synonymous SNPs (sSNPs) in the fourfold degenerate (FFD) sites as compared to intronic SNPs (iSNPs). Results: The identities of the ancestral bases and the direction of mutations were inferred from human-chimpanzee genomic alignment. After correction for background nucleotide composition, excess of A→G over the complementary T→C polymorphisms, which was observed previously and can be explained by TCR, was confirmed in FFD SNPs and iSNPs. However, when SNPs were separately examined according to whether they mapped to a CpG dinucleotide or not, an excess of C→T over G→A polymorphisms was found in non-CpG site FFD SNPs but was absent from iSNPs and CpG site FFD SNPs. Conclusion: The genome-wide discrepancy of human FFD SNPs provides novel evidence for widespread selective pressure due to functional effects of sSNPs. The similar asymmetry pattern of FFD SNPs and iSNPs that map to a CpG can be explained by transcription-coupled mechanisms, including TCR and transcription-coupled mutation. Because of the hypermutability of CpG sites, more CpG site FFD SNPs are relatively younger and have confronted less selection effect than non-CpG FFD SNPs, which can explain the asymmetric discrepancy of CpG site FFD SNPs vs. non-CpG site FFD SNPs.
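
    The comparison of a substitution with its strand complement, after correcting for background nucleotide composition, can be sketched as follows. This is a simplified illustration rather than the author's method: the SNP list and the base counts are invented, substitutions are assumed to be given as (ancestral, derived) pairs on the transcribed-gene strand, and the correction is reduced to dividing by the number of sites carrying the ancestral base.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_rates(snps, base_counts):
    """Substitutions per available ancestral base, as a simple composition correction."""
    counts = Counter(snps)
    return {sub: n / base_counts[sub[0]] for sub, n in counts.items()}


def strand_ratio(rates, substitution):
    """Rate of a substitution divided by the rate of its strand complement."""
    ancestral, derived = substitution
    complementary = (COMPLEMENT[ancestral], COMPLEMENT[derived])
    return rates.get(substitution, 0.0) / rates[complementary]


# Hypothetical (ancestral, derived) SNPs and hypothetical base counts.
snps = (
    [("A", "G")] * 30
    + [("T", "C")] * 20
    + [("C", "T")] * 40
    + [("G", "A")] * 40
)
base_counts = {"A": 1000, "C": 1000, "G": 1000, "T": 1000}

rates = substitution_rates(snps, base_counts)
print(round(strand_ratio(rates, ("A", "G")), 2))  # above 1 means an A>G excess over T>C
print(round(strand_ratio(rates, ("C", "T")), 2))  # near 1 means no asymmetry
```

    A ratio near 1 indicates no strand asymmetry for that pair; the abstract's finding corresponds to ratios that depart from 1 in some site classes but not in others.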

  11. Nucleotide sequences of the Erwinia chrysanthemi ogl and pelE genes negatively regulated by the kdgR gene product.

    Science.gov (United States)

    Reverchon, S; Huang, Y; Bourson, C; Robert-Baudouy, J

    1989-12-21

    The nucleotide sequences of the coding and regulatory regions of the genes encoding oligogalacturonate lyase (OGL) and pectate lyase e isoenzyme (PLe) from Erwinia chrysanthemi 3937 were determined. The ogl sequence contains an open reading frame (ORF) of 1164 bp coding for a 388-amino acid (aa) polypeptide with a predicted Mr of 44,124. A possible transcriptional start signal showing homology with the Escherichia coli promoter consensus sequence was detected. In addition, a sequence 3' to the coding region was found to be able to form a secondary structure which may function as an Rho-independent transcriptional termination signal. For the pelE sequence, a long ORF of 1212 bp coding for a 404-aa polypeptide was detected. PLe is secreted into the external medium by E. chrysanthemi, and a potential signal peptide sequence was identified in the pelE gene. In the 5' upstream pelE coding region, a putative promoter resembling E. coli promoter consensus sequences was detected. Furthermore, the region immediately 3' to the pelE translational stop codon may function as an Rho-independent transcriptional termination signal. In strain 3937, the synthesis of OGL and PLe, as well as the other enzymes involved in the pectin-degradative pathway (particularly the kdgT product), are known to be regulated by the KdgR repressor, which mediates galacturonate and polygalacturonate induction. Synthesis of these enzymes is also regulated by the CRP-cAMP complex which mediates catabolite repression. Analysis of the regulatory regions of ogl and pelE allowed us to identify possible CRP-binding sites for these two genes. (ABSTRACT TRUNCATED AT 250 WORDS)
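
    The ORF lengths and polypeptide sizes quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers from the abstract; the helper name is hypothetical, and the mean residue mass is a derived figure shown for orientation, not a value stated by the authors.

```python
def codons_in_orf(length_bp: int) -> int:
    """Number of codons in an ORF whose length is a whole number of triplets."""
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length_bp // 3


ogl_codons = codons_in_orf(1164)   # abstract: 388-aa polypeptide
pele_codons = codons_in_orf(1212)  # abstract: 404-aa polypeptide
print(ogl_codons, pele_codons)

# Mean mass per residue implied by the predicted Mr of 44,124 for OGL.
mean_residue_mass = 44_124 / ogl_codons
print(round(mean_residue_mass, 1))
```

    Both quoted lengths divide exactly into the quoted residue counts, which is consistent with ORF lengths that exclude the stop codon.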

  12. Nucleotide sequence of the gene coding for human factor VII, a vitamin K-dependent protein participating in blood coagulation

    International Nuclear Information System (INIS)

    O'Hara, P.J.; Grant, F.J.; Haldeman, B.A.; Gray, C.L.; Insley, M.Y.; Hagen, F.S.; Murray, M.J.

    1987-01-01

    Activated factor VII (factor VIIa) is a vitamin K-dependent plasma serine protease that participates in a cascade of reactions leading to the coagulation of blood. Two overlapping genomic clones containing sequences encoding human factor VII were isolated and characterized. The complete sequence of the gene was determined and found to span about 12.8 kilobases. The mRNA for factor VII as demonstrated by cDNA cloning is polyadenylylated at multiple sites but contains only one AAUAAA poly(A) signal sequence. The mRNA can undergo alternative splicing, forming one transcript containing eight segments as exons and another with an additional exon that encodes a larger prepro leader sequence. The latter transcript has no known counterpart in the other vitamin K-dependent proteins. The positions of the introns with respect to the amino acid sequence encoded by the eight essential exons of factor VII are the same as those present in factor IX, factor X, protein C, and the first three exons of prothrombin. These exons code for domains generally conserved among members of this gene family. The comparable introns in these genes, however, are dissimilar with respect to size and sequence, with the exception of intron C in factor VII and protein C. The gene for factor VII also contains five regions made up of tandem repeats of oligonucleotide monomer elements. More than a quarter of the intron sequences and more than a third of the 3' untranslated portion of the mRNA transcript consist of these minisatellite tandem repeats.

  13. Molecular study and nucleotide sequencing of Chlamydia abortus isolated from aborted sheep fetuses in Alborz province

    Directory of Open Access Journals (Sweden)

    Amirreza Ebadi

    2015-02-01

    Chlamydia is an obligate intracellular, gram-negative coccobacillus and one of the most important causes of abortion in ruminants, especially ewes. This investigation was performed for the molecular study and sequencing of Chlamydia abortus isolated from aborted sheep fetuses in Alborz province. DNA was extracted from 100 samples of aborted fetuses from 32 sheep flocks in different areas of Alborz province. Polymerase chain reaction was then conducted using specific primers for the gene IGS-Sr- RNA, and 10 samples selected randomly from the positive cases were sent to the Macrogene company in Korea for sequencing. In this study, 37 of the 100 aborted fetuses were positive for Chlamydia abortus. After sequencing, the positive samples showed more than 99 percent similarity to sequences in GenBank. The sequencing results indicated that the samples were very similar to GenBank isolates LN554882/1, AF051935/1 and CR848038/1 and fell in the same cluster. This investigation also indicated that Chlamydia abortus is one of the main causes of ewe abortion in Alborz province.

  14. Cloning and nucleotide sequence analysis of pepV, a carnosinase gene from Lactobacillus delbrueckii subsp. lactis DSM 7290, and partial characterization of the enzyme.

    Science.gov (United States)

    Vongerichten, K F; Klein, J R; Matern, H; Plapp, R

    1994-10-01

    Cell extracts of Lactobacillus delbrueckii subsp. lactis DSM 7290 were found to exhibit unique peptolytic ability against unusual beta-alanyl-dipeptides. In order to clone the gene encoding this activity, designated pepV, a gene library of strain DSM 7290 genomic DNA, prepared in the low-copy-number plasmid pLG339, was screened for heterologous expression in Escherichia coli. Recombinant clones harbouring pepV were identified by their ability to allow the utilization of carnosine (beta-alanyl-histidine) as a source of histidine by the E. coli mutant strain UK197 (pepD, hisG). Complementation was observed in a colony harbouring a recombinant plasmid (pKV101), carrying pepV. A 2.4 kb fragment containing pepV was subcloned and its nucleotide sequence revealed an open reading frame (ORF) of 1413 nucleotides, corresponding to a protein with predicted molecular mass of 51998 Da. A single transcription initiation site 71 bp upstream of the ATG translational start codon was identified by primer extension. No significant homology was detected between pepV or its deduced amino acid sequence with any entry in the databases. The only similarity was found in a region conserved in the ArgE/DapE/CPG2/YscS family of proteins. This observation, and protease inhibitor studies, indicated that pepV is of the metalloprotease type. A second ORF present in the sequenced fragment showed extensive homology to a variety of amino acid permeases from E. coli and Saccharomyces cerevisiae.

  15. Striking structural dynamism and nucleotide sequence variation of the transposon Galileo in the genome of Drosophila mojavensis.

    Science.gov (United States)

    Marzo, Mar; Bello, Xabier; Puig, Marta; Maside, Xulio; Ruiz, Alfredo

    2013-02-04

    Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is the long internally-repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo has a wide distribution in the genus Drosophila, since it has been found in 6 of the 12 Drosophila sequenced genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the D. mojavensis sequenced genome. In our set of 170 Galileo copies we have detected 5 Galileo subfamilies (C, D, E, F, and X) with different structures ranging from nearly complete, to only 2 TIR or solo TIR copies. Finally, we have explored the structural and length variation of the Galileo copies that point out the relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate a recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome.

  16. Exploiting nucleotide composition to engineer promoters.

    Directory of Open Access Journals (Sweden)

    Manfred G Grabherr

    The choice of promoter is a critical step in optimizing the efficiency and stability of recombinant protein production in mammalian cell lines. Artificial promoters that provide stable expression across cell lines and can be designed to the desired strength constitute an alternative to the use of viral promoters. Here, we show how the nucleotide characteristics of highly active human promoters can be modelled via the genome-wide frequency distribution of short motifs: by overlapping motifs that occur infrequently in the genome, we constructed contiguous sequence that is rich in GC and CpGs, both features of known promoters, but lacking homology to real promoters. We show that snippets from this sequence, at 100 base pairs or longer, drive gene expression in vitro in a number of mammalian cells, and are thus candidates for use in protein production. We further show that expression is driven by the general transcription factors TFIIB and TFIID, both being ubiquitously present across cell types, which results in less tissue- and species-specific regulation compared to the viral promoter SV40. We lastly found that the strength of a promoter can be tuned up and down by modulating the counts of GC and CpGs in localized regions. These results constitute a "proof-of-concept" for custom-designing promoters that are suitable for biotechnological and medical applications.
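
    The two sequence features this abstract tunes, GC content and CpG counts, are easy to quantify. The sketch below is a generic calculation and not the authors' design procedure: the function names and example sequences are hypothetical, and the observed/expected CpG ratio uses the common normalisation (CpG count times sequence length, divided by the product of the C and G counts).

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


# Hypothetical short sequences for illustration.
for example in ("CGCGCGCG", "ACGTACGT", "ATATATAT"):
    print(example, gc_fraction(example), cpg_obs_exp(example))
```

    Scanning a candidate promoter in sliding windows with these two functions would give the kind of localized GC and CpG profile that the abstract describes modulating.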

  17. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis.

    Science.gov (United States)

    Hong, Yanbin; Pandey, Manish K; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. The presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than for EST-SNPs, with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. In order to estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using the high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. A total of 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of these 33 SNPs, nine showed polymorphism in the mapping population Y13Zh, and seven were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut.

  18. Ciliate telomerase RNA loop IV nucleotides promote hierarchical RNP assembly and holoenzyme stability.

    Science.gov (United States)

    Robart, Aaron R; O'Connor, Catherine M; Collins, Kathleen

    2010-03-01

    Telomerase adds simple-sequence repeats to chromosome 3' ends to compensate for the loss of repeats with each round of genome replication. To accomplish this de novo DNA synthesis, telomerase uses a template within its integral RNA component. In addition to providing the template, the telomerase RNA subunit (TER) also harbors nontemplate motifs that contribute to the specialized telomerase catalytic cycle of reiterative repeat synthesis. Most nontemplate TER motifs function through linkage with the template, but in ciliate and vertebrate telomerases, a stem-loop motif binds telomerase reverse transcriptase (TERT) and reconstitutes full activity of the minimal recombinant TERT+TER RNP, even when physically separated from the template. Here, we resolve the functional requirements for this motif of ciliate TER in physiological RNP context using the Tetrahymena thermophila p65-TER-TERT core RNP reconstituted in vitro and the holoenzyme reconstituted in vivo. Contrary to expectation based on assays of the minimal recombinant RNP, we find that none of a panel of individual loop IV nucleotide substitutions impacts the profile of telomerase product synthesis when reconstituted as physiological core RNP or holoenzyme RNP. However, loop IV nucleotide substitutions do variably reduce assembly of TERT with the p65-TER complex in vitro and reduce the accumulation and stability of telomerase RNP in endogenous holoenzyme context. Our results point to a unifying model of a conformational activation role for this TER motif in the telomerase RNP enzyme.

  19. A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

    Science.gov (United States)

    Keel, B N; Nonneman, D J; Rohrer, G A

    2017-08-01

    Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
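
    The "over 99%" figure in the abstract above follows from the two SNP counts it reports. A minimal arithmetic sketch, using only those counts (the variable names are illustrative, and 139 000 is the abstract's own approximation):

```python
# Counts taken from the abstract; variable names are illustrative only.
total_snps = 22_342_915      # high confidence SNPs identified
functional_snps = 139_000    # approx. loss-of-function, nonsynonymous or regulatory

# Share of detected variation classified as potentially functional.
functional_fraction = functional_snps / total_snps

# Share that could potentially be set aside in later analyses.
ignorable_fraction = 1 - functional_fraction

print(f"functional: {functional_fraction:.2%}")
print(f"ignorable:  {ignorable_fraction:.2%}")
```

    Roughly 0.6% of the SNPs fall into the functional classes, which is consistent with the statement that more than 99% of the detected variation could potentially be ignored.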

  20. Sequence-based separation of single-stranded DNA using nucleotides in capillary electrophoresis: focus on phosphate.

    Science.gov (United States)

    Zhang, Xueru; McGown, Linda B

    2013-06-01

    DNA analysis has widespread applicability in biology, medicine, biotechnology, and forensics. DNA separation by length is readily achieved using sieving gels in electrophoresis. Separation by sequence is less simple, generally requiring adequate differences in native or induced conformation or differences in thermal or chemical stability of the strands that are hybridized prior to measurement. We previously demonstrated separation of four single-stranded DNA 76-mers that differ by only a few A-G substitutions based solely on sequence using guanosine-5'-monophosphate (GMP) in the running buffer. We attributed separation to the unique self-assembly of GMP to form higher order structures. Here, we examine an expanded set of 76-mers designed to probe the mechanism of the separation and effects of experimental conditions. We were surprised to find that other ribonucleotides achieved separation similar to that of GMP, and that some separation was achieved using sodium phosphate instead of GMP. Potassium phosphate achieved almost as good separations as the ribonucleotides. This suggests that the separation medium provides a physicochemical environment for the DNA that affects strand migration in a sequence-selective manner. Further investigation is needed to determine whether the mechanism involves specific interactions between the phosphates and the DNA strands or is a result of other properties of the separation medium. Phosphate generally has been avoided in DNA separations by capillary gel electrophoresis because its high ionic strength exacerbates Joule heating. Our results suggest that phosphate compounds should be examined for separation of DNA based on sequence. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Nucleotide Metabolism

    DEFF Research Database (Denmark)

    Martinussen, Jan; Willemoës, M.; Kilstrup, Mogens

    2011-01-01

    Metabolic pathways are connected through their utilization of nucleotides as supplier of energy, allosteric effectors, and their role in activation of intermediates. Therefore, any attempt to exploit a given living organism in a biotechnological process will have an impact on nucleotide metabolis...

  2. Absence of zero-temperature transmission rate of a double-chain tight-binding model for DNA with random sequence of nucleotides in thermodynamic limit

    International Nuclear Information System (INIS)

    Xiong Gang; Wang, X.R.

    2005-01-01

    The zero-temperature transmission rate spectrum of a double-chain tight-binding model for real DNA is calculated. It is shown that a band of extended-like states exists only for finite chain length with strong inter-chain coupling, while the whole spectrum tends to zero in the thermodynamic limit, regardless of the strength of inter-chain coupling. It is also shown that a more faithful model for real DNA with periodic sugar-phosphate chains in backbone structures can be mapped into the above simple double-chain tight-binding model. Combined with the above results, the transmission rate of real DNA with a long random sequence of nucleotides is expected to be poor.

  3. The complete nucleotide sequences of the 5 genetically distinct plastid genomes of Oenothera, subsection Oenothera: II. A microevolutionary view using bioinformatics and formal genetic data.

    Science.gov (United States)

    Greiner, Stephan; Wang, Xi; Herrmann, Reinhold G; Rauwolf, Uwe; Mayer, Klaus; Haberer, Georg; Meurer, Jörg

    2008-09-01

    A unique combination of genetic features and a rich stock of information make the flowering plant genus Oenothera an appealing model to explore the molecular basis of speciation processes including nucleus-organelle coevolution. From representative species, we have recently reported complete nucleotide sequences of the 5 basic and genetically distinguishable plastid chromosomes of subsection Oenothera (I-V). In nature, Oenothera plastid genomes are associated with 6 distinct, either homozygous or heterozygous, diploid nuclear genotypes of the 3 basic genomes A, B, or C. Artificially produced plastome-genome combinations that do not occur naturally often display interspecific plastome-genome incompatibility (PGI). In this study, we compare formal genetic data available from all 30 plastome-genome combinations with sequence differences between the plastomes to uncover potential determinants for interspecific PGI. Consistent with an active role in speciation, a remarkable number of genes have high Ka/Ks ratios. Different from the Solanacean cybrid model Atropa/tobacco, RNA editing seems not to be relevant for PGIs in Oenothera. However, predominantly sequence polymorphisms in intergenic segments are proposed as possible sources for PGI. A single locus, the bidirectional promoter region between psbB and clpP, is suggested to contribute to compartmental PGI in the interspecific AB hybrid containing plastome I (AB-I), consistent with its perturbed photosystem II activity.

  4. Nucleotide and deduced amino acid sequence of the envelope gene of the Vasilchenko strain of TBE virus; comparison with other flaviviruses.

    Science.gov (United States)

    Gritsun, T S; Frolova, T V; Pogodina, V V; Lashkevich, V A; Venugopal, K; Gould, E A

    1993-02-01

    A strain of tick-borne encephalitis virus known as Vasilchenko (Vs) exhibits relatively low virulence characteristics in monkeys, Syrian hamsters and humans. The gene encoding the envelope glycoprotein of this virus was cloned and sequenced. Alignment of the sequence with those of other known tick-borne flaviviruses and identification of the recognised amino acid genetic marker EHLPTA confirmed its identity as a member of the TBE complex. However, Vs virus was distinguishable from eastern and western tick-borne serotypes by the presence of the sequence AQQ at amino acid positions 232-234 and also by the presence of other specific amino acid substitutions which may be genetic markers for these viruses and could determine their pathogenetic characteristics. When compared with other tick-borne flaviviruses, Vs virus had 12 unique amino acid substitutions including an additional potential glycosylation site at position (315-317). The Vs virus strain shared closest nucleotide and amino acid homology (84.5% and 95.5% respectively) with western and far eastern strains of tick-borne encephalitis virus. Comparison with the far eastern serotype of tick-borne encephalitis virus, by cross-immunoelectrophoresis of Vs virions and PAGE analysis of the extracted virion proteins, revealed differences in surface charge and virus stability that may account for the different virulence characteristics of Vs virus. These results support and enlarge upon previous data obtained from molecular and serological analysis.

  5. The complete nucleotide sequence of the genome of Barley yellow dwarf virus-RMV reveals it to be a new Polerovirus distantly related to other yellow dwarf viruses.

    Science.gov (United States)

    Krueger, Elizabeth N; Beckett, Randy J; Gray, Stewart M; Miller, W Allen

    2013-01-01

    The yellow dwarf viruses (YDVs) of the Luteoviridae family represent the most widespread group of cereal viruses worldwide. They include the Barley yellow dwarf viruses (BYDVs) of genus Luteovirus, the Cereal yellow dwarf viruses (CYDVs) and Wheat yellow dwarf virus (WYDV) of genus Polerovirus. All of these viruses are obligately aphid transmitted and phloem-limited. The first described YDVs (initially all called BYDV) were classified by their most efficient vector. One of these viruses, BYDV-RMV, is transmitted most efficiently by the corn leaf aphid, Rhopalosiphum maidis. Here we report the complete 5612 nucleotide sequence of the genomic RNA of a Montana isolate of BYDV-RMV (isolate RMV MTFE87, Genbank accession no. KC921392). The sequence revealed that BYDV-RMV is a polerovirus, but it is quite distantly related to the CYDVs or WYDV, which are very closely related to each other. Nor is BYDV-RMV closely related to any other particular polerovirus. Depending on the gene that is compared, different poleroviruses (none of them a YDV) share the most sequence similarity to BYDV-RMV. Because of its distant relationship to other YDVs, and because it commonly infects maize via its vector, R. maidis, we propose that BYDV-RMV be renamed Maize yellow dwarf virus-RMV (MYDV-RMV).

  6. The complete nucleotide sequence of the genome of Barley yellow dwarf virus-RMV reveals it to be a new Polerovirus distantly related to other yellow dwarf viruses

    Directory of Open Access Journals (Sweden)

    Elizabeth N. Krueger

    2013-07-01

    The yellow dwarf viruses (YDVs) of the Luteoviridae family represent the most widespread group of cereal viruses worldwide. They include the Barley yellow dwarf viruses (BYDVs) of genus Luteovirus, the Cereal yellow dwarf viruses (CYDVs) and Wheat yellow dwarf virus (WYDV) of genus Polerovirus. All of these viruses are obligately aphid transmitted and phloem-limited. The first described YDVs (initially all called BYDV) were classified by their most efficient vector. One of these viruses, BYDV-RMV, is transmitted most efficiently by the corn leaf aphid, Rhopalosiphum maidis. Here we report the complete 5612 nucleotide sequence of the genomic RNA of a Montana isolate of BYDV-RMV (isolate RMV MTFE87, Genbank accession no. KC921392). The sequence revealed that BYDV-RMV is a polerovirus, but it is quite distantly related to the CYDVs or WYDV, which are very closely related to each other. Nor is BYDV-RMV closely related to any other particular polerovirus. Depending on the gene that is compared, different poleroviruses (none of them a YDV) share the most sequence similarity to BYDV-RMV. Because of its distant relationship to other YDVs, and because it commonly infects maize via its vector, R. maidis, we propose that BYDV-RMV be renamed Maize yellow dwarf virus-RMV (MYDV-RMV).

  7. The influence of selection on the evolutionary distance estimated from the base changes observed between homologous nucleotide sequences.

    Science.gov (United States)

    Otsuka, J; Kawai, Y; Sugaya, N

    2001-11-21

    In most studies of molecular evolution, the nucleotide base at a site is assumed to change with the apparent rate under functional constraint, and the comparison of base changes between homologous genes is thought to yield the evolutionary distance corresponding to the site-average change rate multiplied by the divergence time. However, this view is not sufficiently successful in estimating the divergence time of species, but mostly results in the construction of tree topology without a time-scale. In the present paper, this problem is investigated theoretically by considering that observed base changes are the results of comparing the survivals through selection of mutated bases. In the case of weak selection, the time course of base changes due to mutation and selection can be obtained analytically, leading to a theoretical equation showing how the selection has influence on the evolutionary distance estimated from the enumeration of base changes. This result provides a new method for estimating the divergence time more accurately from the observed base changes by evaluating both the strength of selection and the mutation rate. The validity of this method is verified by analysing the base changes observed at the third codon positions of amino acid residues with four-fold codon degeneracy in the protein genes of mammalian mitochondria; i.e. the ratios of estimated divergence times are fairly well consistent with a series of fossil records of mammals. Throughout this analysis, it is also suggested that the mutation rates in mitochondrial genomes are almost the same in different lineages of mammals and that the lineage-specific base-change rates indicated previously are due to the selection probably arising from the preference of transfer RNAs to codons.
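
    For context, the conventional estimate that the abstract above argues against can be sketched as follows. This is the standard Jukes-Cantor (JC69) correction, which assumes equal base frequencies and no selection; it is not the authors' selection-aware method, and the function names are illustrative.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)

def jc69_distance(p):
    """Jukes-Cantor distance (expected substitutions per site) from proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)

# Hypothetical aligned fragments, for illustration only.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, jc69_distance(p))
```

    Under this model the distance equals the substitution rate multiplied by the divergence time, so a time estimate requires an independently known rate. The abstract's point is that selection distorts this simple relationship.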

  8. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist and their significance can be checked by means of the Monte Carlo test. The test needs good quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences, letting the user set both their length and their percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it possible to easily set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of sequences generated by the software, by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
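
    The core idea described above, drawing a random sequence of a chosen length and percentage composition, can be sketched in a few lines. This is not RANDNA's code: it uses Python's built-in generator rather than the Borland Delphi 6 one, and the function name and parameters are assumptions made for illustration.

```python
import math
import random

def random_sequence(length, composition, seed=None):
    """Draw `length` bases independently, with probabilities given as percentages.

    `composition` maps each base to its percentage, e.g. {"A": 30, "C": 20, ...}.
    The realized composition of a single sequence will fluctuate around these
    targets, because each base is sampled independently.
    """
    if not math.isclose(sum(composition.values()), 100.0):
        raise ValueError("composition percentages must sum to 100")
    rng = random.Random(seed)  # a seed makes the output reproducible
    bases = list(composition)
    weights = [composition[base] for base in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))

# Hypothetical AT-rich composition, for illustration only.
print(random_sequence(60, {"A": 30, "C": 20, "G": 20, "T": 30}, seed=1))
```

    Passing "U" instead of "T" in the composition gives an RNA sequence. For a Monte Carlo test, many such sequences would be generated and scored to build the background distribution.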

  9. Isolation and characterization of human glycophorin A cDNA clones by a synthetic oligonucleotide approach: nucleotide sequence and mRNA structure

    International Nuclear Information System (INIS)

    Siebert, P.D.; Fukuda, M.

    1986-01-01

    In an effort to understand the relationships among and the regulation of human glycophorins, the authors have isolated and characterized several glycophorin A-specific cDNA clones obtained from a human erythroleukemic K562 cell cDNA library. This was accomplished by using mixed synthetic oligonucleotides, corresponding to various regions of the known amino acid sequence, to prime the synthesis of the cDNA as well as to screen the cDNA library. They also used synthetic oligonucleotides to sequence the largest of the glycophorin cDNAs. The nucleotide sequence obtained suggests the presence of a potential leader peptide, consistent with the membrane localization of this glycoprotein. Examination of the structure of glycophorin mRNA by blot hybridization revealed the existence of several electrophoretically distinct mRNAs numbering three or four, depending on the size of the glycophorin cDNA used as a hybridization probe. The smaller cDNA hybridized to three mRNAs of approximately 2.8, 1.7, and 1.0 kilobases. In contrast, the larger cDNA hybridized to an additional mRNA of approximately 0.6 kilobases. Further examination of the relationships between these multiple mRNAs by blot hybridization was conducted with the use of exact-sequence oligonucleotide probes constructed from various regions of the cDNA representing portions of the amino acid sequence of glycophorin A with or without known homology with glycophorin B. In total, the results obtained are consistent with the hypothesis that the three larger mRNAs represent glycophorin A gene transcripts and that the smallest (0.6 kilobase) mRNA may be specific for glycophorin B

  10. Complete nucleotide sequence of the Coturnix chinensis (blue-breasted quail) mitochondrial genome and a phylogenetic analysis with related species.

    Science.gov (United States)

    Nishibori, M; Tsudzuki, M; Hayashi, T; Yamamoto, Y; Yasue, H

    2002-01-01

    Coturnix chinensis (blue-breasted quail) has been classically grouped in Galliformes Phasianidae Coturnix, based on morphologic features and biochemical evidence. Since the blue-breasted quail has the smallest body size among the species of Galliformes, in addition to a short generation time and an excellent reproductive performance, it is a possible model fowl for breeding and physiological studies of the Coturnix japonica (Japanese quail) and Gallus gallus domesticus (chicken), which are classified in the same family as blue-breasted quail. However, since its phylogenetic position in the family Phasianidae has not been determined conclusively, the sequence of the entire blue-breasted quail mitochondria (mt) genome was obtained to provide genetic information for phylogenetic analysis in the present study. The blue-breasted quail mtDNA was found to be a circular DNA of 16,687 base pairs (bp) with the same genomic structure as the mtDNAs of Japanese quail and chicken, though it is smaller than Japanese quail and chicken mtDNAs by 10 bp and 88 bp, respectively. The sequence identity of all mitochondrial genes, including those for 12S and 16S ribosomal RNAs, between blue-breasted quail and Japanese quail ranged from 84.5% to 93.5%; between blue-breasted quail and chicken, sequence identity ranged from 78.0% to 89.6%. In order to obtain information on the phylogenetic position of blue-breasted quail in Galliformes Phasianidae, the 2,184 bp sequence comprising NADH dehydrogenase subunit 2 and cytochrome b genes available for eight species in Galliformes [Japanese quail, chicken, Gallus varius (green junglefowl), Bambusicola thoracica (Chinese bamboo partridge), Pavo cristatus (Indian peafowl), Perdix perdix (gray partridge), Phasianus colchicus (ring-neck pheasant), and Tympanchus phasianellus (sharp-tailed grouse)] together with that of Aythya americana (redhead) were examined using a maximum likelihood (ML) method. 
The ML analyses on the first/second codon positions

  11. TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine.

    Science.gov (United States)

    Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun

    2016-10-01

    As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. The identification of m6A sites in RNAs is important for both basic biomedical research and practical drug development. In this study, we designed a computational method, called TargetM6A, to rapidly and accurately target m6A sites solely from the primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a much more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that the proposed TargetM6A method outperformed the existing methods for predicting m6A sites and remarkably improved the prediction performances, with MCC = 0.526 and AUC = 0.818. We also provided a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
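
    Two quantities named in the abstract above can be illustrated briefly: a nucleotide composition (NC) feature and the Matthews correlation coefficient (MCC) used to report performance. The sketch assumes NC means the four mononucleotide frequencies, which is a common definition but is not confirmed by the abstract; the function names and example inputs are hypothetical and this is not the TargetM6A implementation.

```python
import math

def nucleotide_composition(rna, alphabet="ACGU"):
    """Frequency of each base in an RNA sequence (assumed form of the NC feature)."""
    if not rna:
        raise ValueError("sequence must be non-empty")
    return [rna.count(base) / len(rna) for base in alphabet]

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical sequence and counts, for illustration only.
print(nucleotide_composition("GGACU"))
print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

    MCC ranges from -1 to 1 and stays informative when the positive and negative classes are unbalanced, which is one reason it is often reported alongside AUC.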

  12. Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution.

    Science.gov (United States)

    Pausch, Hubert; Emmerling, Reiner; Gredler-Grandl, Birgit; Fries, Ruedi; Daetwyler, Hans D; Goddard, Michael E

    2017-11-09

    Genotyping and whole-genome sequencing data have been generated for hundreds of thousands of cattle. International consortia used these data to compile imputation reference panels that facilitate the imputation of sequence variant genotypes for animals that have been genotyped using dense microarrays. Association studies with imputed sequence variant genotypes allow for the characterization of quantitative trait loci (QTL) at nucleotide resolution particularly when individuals from several breeds are included in the mapping populations. We imputed genotypes for 28 million sequence variants in 17,229 cattle of the Braunvieh, Fleckvieh and Holstein breeds in order to compile large mapping populations that provide high power to identify QTL for milk production traits. Association tests between imputed sequence variant genotypes and fat and protein percentages in milk uncovered between six and thirteen QTL (P < 1e-8) per breed. Eight of the detected QTL were significant in more than one breed. We combined the results across breeds using meta-analysis and identified a total of 25 QTL including six that were not significant in the within-breed association studies. Two missense mutations in the ABCG2 (p.Y581S, rs43702337, P = 4.3e-34) and GHR (p.F279Y, rs385640152, P = 1.6e-74) genes were the top variants at QTL on chromosomes 6 and 20. Another known causal missense mutation in the DGAT1 gene (p.A232K, rs109326954, P = 8.4e-1436) was the second top variant at a QTL on chromosome 14 but its allelic substitution effects were inconsistent across breeds. It turned out that the conflicting allelic substitution effects resulted from flaws in the imputed genotypes due to the use of a multi-breed reference population for genotype imputation. Many QTL for milk production traits segregate across breeds and across-breed meta-analysis has greater power to detect such QTL than within-breed association testing. Association testing between imputed sequence variant genotypes and

  13. Nucleotide sequence of Phaseolus vulgaris L. alcohol dehydrogenase encoding cDNA and three-dimensional structure prediction of the deduced protein.

    Science.gov (United States)

    Amelia, Kassim; Khor, Chin Yin; Shah, Farida Habib; Bhore, Subhash J

    2015-01-01

    Common beans (Phaseolus vulgaris L.) are widely consumed as a source of proteins and natural products. However, its yield needs to be increased. In line with the agenda of Phaseomics (an international consortium), work of expressed sequence tags (ESTs) generation from bean pods was initiated. Altogether, 5972 ESTs have been isolated. Alcohol dehydrogenase (AD) encoding gene cDNA was a noticeable transcript among the generated ESTs. This AD is an important enzyme; therefore, to understand more about it this study was undertaken. The objective of this study was to elucidate P. vulgaris L. AD (PvAD) gene cDNA sequence and to predict the three-dimensional (3D) structure of deduced protein. Positive and negative strands of the PvAD cDNA clone were sequenced using M13 forward and M13 reverse primers to elucidate the nucleotide sequence. Deduced PvAD cDNA and protein sequences were analyzed for their basic features using online bioinformatics tools. Sequence comparison was carried out using bl2seq program, and tree-view program was used to construct a phylogenetic tree. The secondary structures and 3D structure of PvAD protein were predicted by using the PHYRE automatic fold recognition server. The sequencing results analysis showed that PvAD cDNA is 1294 bp in length. Its open reading frame encodes a protein that contains 371 amino acids. Deduced protein sequence analysis showed the presence of putative substrate binding, catalytic Zn binding, and NAD binding sites. Results indicate that the predicted 3D structure of PvAD protein is analogous to the experimentally determined crystal structure of S-nitrosoglutathione reductase from an Arabidopsis species. The 1294 bp long PvAD cDNA encodes a 371 amino acid long protein that contains conserved domains required for biological functions of AD. The predicted deduced PvAD protein's 3D structure reflects the analogy with the crystal structure of Arabidopsis thaliana S-nitrosoglutathione reductase.
Further study is required

  14. Single-nucleotide variant in multiple copies of a deleted in azoospermia (DAZ) sequence - a human Y chromosome quantitative polymorphism.

    Science.gov (United States)

    Szmulewicz, Martin N; Ruiz, Luis M; Reategui, Erika P; Hussini, Saeed; Herrera, Rene J

    2002-01-01

    The evolution of the deleted in azoospermia (DAZ) gene family supports prevalent theories on the origin and development of sex chromosomes and sexual dimorphism. The ancestral DAZL gene in human chromosome 3 is known to be involved in germline development of both males and females. The available phylogenetic data suggest that some time after the divergence of the New World and Old World monkey lineages, the DAZL gene, which is found in all mammals, was copied to the Y chromosome of an ancestor to the Old World monkeys, but not New World monkeys. In modern man, the Y-linked DAZ gene complex is located on the distal part of the q arm. It is thought that after being copied to the Y chromosome, and after the divergence of the human and great ape lineages, the DAZ gene in the former underwent internal rearrangements. This included tandem duplications as well as a T > C transition altering an MboI restriction enzyme site in a duplicated sequence. In this study, we report on the ratios of MboI-/MboI+ variant sequences in individuals from seven worldwide human populations (Basque, Benin, Egypt, Formosa, Kungurtug, Oman and Rwanda) in the DAZ complex. The ratio of PCR MboI- and MboI+ amplicons can be used to characterize individuals and populations. Our results show a nonrandom distribution of MboI-/MboI+ sequence ratios in all populations examined, as well as significant differences in ratios between populations when compared pairwise. The multiple ratios imply that there has been more than one recent reorganization event at this locus. Considering the dynamic nature of this locus and its involvement in male fertility, we investigated the extent and distribution of this polymorphism. Copyright 2002 S. Karger AG, Basel

  15. Nucleotide sequence of a human cDNA encoding a ras-related protein (rap1B)

    Energy Technology Data Exchange (ETDEWEB)

    Pizon, V; Lerosey, I; Chardin, P; Tavitian, A [INSERM, Paris (France)]

    1988-08-11

    The authors have previously characterized two human ras-related genes rap1 and rap2. Using the rap1 clone as probe they isolated and sequenced a new rap cDNA encoding the 184aa rap1B protein. The rap1B protein is 95% identical to rap1 and shares several properties with the ras protein suggesting that it could bind GTP/GDP and have a membrane location. As for rap1, the structural characteristics of rap1B suggest that the rap and ras proteins might interact on the same effector.

  16. PCR Assays for Identification of Coccidioides posadasii Based on the Nucleotide Sequence of the Antigen 2/Proline-Rich Antigen

    Science.gov (United States)

    Bialek, Ralf; Kern, Jan; Herrmann, Tanja; Tijerina, Rolando; Ceceñas, Luis; Reischl, Udo; González, Gloria M.

    2004-01-01

    A conventional nested PCR and a real-time LightCycler PCR assay for detection of Coccidioides posadasii DNA were designed and tested in 120 clinical strains. These had been isolated from 114 patients within 10 years in Monterrey, Nuevo Leon, Mexico, known to be endemic for coccidioidomycosis. The gene encoding the specific antigen 2/proline-rich antigen (Ag2/PRA) was used as a target. All strains were correctly identified, whereas DNA from related members of the family Onygenaceae remained negative. Melting curve analysis by LightCycler and sequencing of the 526-bp product of the first PCR demonstrated either 100% identity to the GenBank sequence of the Silveira strain, now known to be C. posadasii (accession number AF013256), or a single silent mutation at position 1228. Length determination of two microsatellite-containing loci (GAC and 621) identified all 120 isolates as C. posadasii. Specific DNA was amplified by conventional nested PCR from three microscopically spherule-positive paraffin-embedded tissue samples, whereas 20 human tissue samples positive for other dimorphic fungi remained negative. Additionally, the safety of each step of a modified commercially available DNA extraction procedure was evaluated by using 10 strains. At least three steps of the protocol were demonstrated to sufficiently kill arthroconidia. This safe procedure is applicable to cultures and to clinical specimens. PMID:14766853

  17. Characterisation of purified parvalbumin from five fish species and nucleotide sequencing of this major allergen from Pacific pilchard, Sardinops sagax.

    Science.gov (United States)

    Beale, Janine E; Jeebhay, Mohamed F; Lopata, Andreas L

    2009-09-01

    IgE-mediated allergic reaction to seafood is a common cause of food allergy including anaphylactic reactions. Parvalbumin, the major fish allergen, has been shown to display IgE cross-reactivity among fish species consumed predominantly in Europe and the Far East. However, cross-reactivity studies of parvalbumin from fish species widely consumed in the Southern hemisphere are limited, as are data relating to immunological and molecular characterisation. In this study, antigenic cross-reactivity and the presence of oligomers and isomers of parvalbumin from five highly consumed fish species in Southern Africa were assessed by immunoblotting using purified parvalbumin and crude fish extracts. Pilchard (Sardinops sagax) parvalbumin was found to display the strongest IgE reactivity among 10 fish-allergic consumers. The cDNA sequence of the beta-form of pilchard parvalbumin was determined and designated Sar sa 1.0101 (accession number FM177701 EMBL/GenBank/DDBJ databases). Oligomeric forms of parvalbumin were observed in all fish species using a monoclonal anti-parvalbumin antibody and subjects' sera. Isoforms varied between approximately 10-13 kDa. A highly cross-reactive allergenic isoform of parvalbumin was identified and sequenced, providing a successful primary step towards the generation of a recombinant form that could be used for diagnostic and potential therapeutic use in allergic individuals.

  18. Human uroporphyrinogen III synthase: Molecular cloning, nucleotide sequence, and expression of a full-length cDNA

    International Nuclear Information System (INIS)

    Tsai, Shihfeng; Bishop, D.F.; Desnick, R.J.

    1988-01-01

    Uroporphyrinogen III synthase, the fourth enzyme in the heme biosynthetic pathway, is responsible for conversion of the linear tetrapyrrole, hydroxymethylbilane, to the cyclic tetrapyrrole, uroporphyrinogen III. The deficient activity of URO-synthase is the enzymatic defect in the autosomal recessive disorder congenital erythropoietic porphyria. To facilitate the isolation of a full-length cDNA for human URO-synthase, the human erythrocyte enzyme was purified to homogeneity and 81 nonoverlapping amino acids were determined by microsequencing the N terminus and four tryptic peptides. Two synthetic oligonucleotide mixtures were used to screen 1.2 x 10^6 recombinants from a human adult liver cDNA library. Eight clones were positive with both oligonucleotide mixtures. Of these, dideoxy sequencing of the 1.3 kilobase insert from clone pUROS-2 revealed 5' and 3' untranslated sequences of 196 and 284 base pairs, respectively, and an open reading frame of 798 base pairs encoding a protein of 265 amino acids with a predicted molecular mass of 28,607 Da. The isolation and expression of this full-length cDNA for human URO-synthase should facilitate studies of the structure, organization, and chromosomal localization of this heme biosynthetic gene as well as the characterization of the molecular lesions causing congenital erythropoietic porphyria.
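
    The lengths reported in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 798-bp open reading frame includes the terminal stop codon, which is an assumption that happens to be consistent with the reported 265 residues.

```python
def orf_summary(utr5_bp, orf_bp, utr3_bp):
    """Return (codons, amino_acids, cdna_bp) for an ORF assumed to end in a stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    amino_acids = codons - 1  # assumption: the last codon is a stop and encodes no residue
    cdna_bp = utr5_bp + orf_bp + utr3_bp
    return codons, amino_acids, cdna_bp


# Figures taken from the abstract: 196-bp 5' UTR, 798-bp ORF, 284-bp 3' UTR.
codons, residues, total_bp = orf_summary(196, 798, 284)
print(codons, residues, total_bp)  # 266 265 1278
```

    Under that assumption, the 1,278 bp total agrees with the "1.3 kilobase insert" quoted for clone pUROS-2.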

  19. Temporal motifs in time-dependent networks

    International Nuclear Information System (INIS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-01-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological–temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network

  20. Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

    Science.gov (United States)

    Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

    2013-03-15

    The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter

  1. Purification, enzymatic characterization, and nucleotide sequence of a high-isoelectric-point alpha-glucosidase from barley malt

    DEFF Research Database (Denmark)

    Frandsen, T P; Lok, F; Mirgorodskaya, E

    2000-01-01

    in the transition state complex. Mass spectrometry of tryptic fragments assigned the 92-kD protein to a barley cDNA (GenBank accession no. U22450) that appears to encode an alpha-glucosidase. A corresponding sequence (HvAgl97; GenBank accession no. AF118226) was isolated from a genomic phage library using a c......High-isoelectric-point (pI) alpha-glucosidase was purified 7,300-fold from an extract of barley (Hordeum vulgare) malt by ammonium sulfate fractionation, ion-exchange, and butyl-Sepharose chromatography. The enzyme had high activity toward maltose (k(cat) = 25 s(-1)), with an optimum at pH 4...

  2. Complete nucleotide sequence of the Oenothera elata plastid chromosome, representing plastome I of the five distinguishable euoenothera plastomes.

    Science.gov (United States)

    Hupfer, H; Swiatek, M; Hornung, S; Herrmann, R G; Maier, R M; Chiu, W L; Sears, B

    2000-05-01

    We describe the 159,443-bp [corrected] sequence of the plastid chromosome of Oenothera elata (evening primrose). The Oe. elata plastid chromosome represents type I of the five genetically distinguishable basic plastomes found in the subsection Euoenothera. The genus Oenothera provides an ideal system in which to address fundamental questions regarding the functional integration of the compartmentalised genetic system characteristic of the eukaryotic cell. Its highly developed taxonomy and genetics, together with a favourable combination of features in its genetic structure (interspecific fertility, stable heterozygous progeny, biparental transmission of organelles, and the phenomenon of complex heterozygosity), allow facile exchanges of nuclei, plastids and mitochondria, as well as individual chromosome pairs, between species. The resulting hybrids or cybrids are usually viable and fertile, but can display various forms of developmental disturbance.

  3. COMPLETE NUCLEOTIDE SEQUENCE OF SPHEROIDIN GENES OF CALLIPTAMUS ITALICUS ENTOMOPOXVIRUS (CIEPV) AND GOMPHOCERUS SIBIRICUS ENTOMOPOXVIRUS (GSEPV)

    Institute of Scientific and Technical Information of China (English)

    Yong-dan Li; Li-ying Wang; Xi-wu Gao; Chao-yang Zhao; Zhao-feng Tian

    2004-01-01

    The spheroidin genes of Calliptamus italicus entomopoxvirus (CiEPV) and Gomphocerus sibiricus entomopoxvirus (GsEPV) were obtained by PCR, and the fragments were cloned, sequenced and analyzed. The CiEPV and GsEPV spheroidin genes respectively harbored ORFs of 2,922 bp and 2,967 bp that were capable of coding polypeptides of 109.2 and 111.1 kDa. Computer analysis indicated that CiEPV and GsEPV spheroidins shared less than 20% amino acid identity with lepidopteran AmEPV and coleopteran AcEPV spheroidins, but more than 80% amino acid identity with orthopteran OaEPV, MsEPV and AaEPV spheroidins. The CiEPV and GsEPV spheroidins respectively contained 19 and 21 cysteine residues that were particularly abundant at the C-termini, as is the case with those of the other orthopteran EPV spheroidins. The numbers and locations of the cysteine residues of the spheroidins were most similar to those of the spheroidins of EPVs that are virulent on the same insect orders. The promoter regions of the two spheroidin genes were highly conserved (99%) among the orthopteran EPVs and also contained the typical, very A+T-rich sequence and the TAAATG signal mediating transcription of poxvirus late genes. We also sequenced an incomplete ORF downstream of the spheroidin gene of CiEPV and GsEPV. The ORF was in the opposite direction to the spheroidin gene and was homologous to the putative MSV072 protein of MsEPV.

  4. Species delimitation of common reef corals in the genus Pocillopora using nucleotide sequence phylogenies, population genetics and symbiosis ecology.

    Science.gov (United States)

    Pinzón, Jorge H; LaJeunesse, Todd C

    2011-01-01

    Stony corals in the genus Pocillopora are among the most common and widely distributed of Indo-Pacific corals and, as such, are often the subject of physiological and ecological research. In the far Tropical Eastern Pacific (TEP), they are major constituents of shallow coral communities, exhibiting considerable variability in colony shape and branch morphology and marked differences in response to thermal stress. Numerous intermediates occur between morphospecies that may relate to extensive hybridization. The diversity of the Pocillopora genus in the TEP was analysed genetically using nuclear ribosomal (ITS2) and mitochondrial (ORF) sequences, and population genetic markers (seven microsatellite loci). The resident dinoflagellate endosymbiont (Symbiodinium sp.) in each sample was also characterized using sequences of the internal transcribed spacer 2 (ITS2) rDNA and the noncoding region of the chloroplast psbA minicircle. From these analyses, three symbiotically distinct, reproductively isolated, nonhybridizing, evolutionarily divergent animal lineages were identified. Designated types 1, 2 and 3, these groupings were incongruent with traditional morphospecies classification. Type 1 was abundant and widespread throughout the TEP; type 2 was restricted to the Clipperton Atoll; and type 3 was found only in Panama and the Galapagos Islands. Each type harboured a different Symbiodinium 'species lineage' in Clade C, and only type 1 associated with the 'stress-tolerant' Symbiodinium glynni (D1). The accurate delineation of species and implementation of a proper taxonomy may profoundly improve our assessment of Pocillopora's reproductive biology, biogeographic distributions, and resilience to climate warming, information that must be considered when planning for the conservation of reef corals. © 2010 Blackwell Publishing Ltd.

  5. The mitochondrial genome sequence of the ciliate Paramecium caudatum reveals a shift in nucleotide composition and codon usage within the genus Paramecium

    Directory of Open Access Journals (Sweden)

    Berendonk Thomas U

    2011-05-01

    Background: Despite the fact that the organization of the ciliate mitochondrial genome is exceptional, only a few ciliate mitochondrial genomes have been sequenced to date. All ciliate mitochondrial genomes are linear. They are 40 kb to 47 kb long and contain some 50 tightly packed genes without introns. Earlier studies documented that the mitochondrial guanine + cytosine contents are very different between Paramecium tetraurelia and all studied Tetrahymena species. This raises the question of whether the high mitochondrial G+C content observed in P. tetraurelia is a characteristic property of Paramecium mtDNA, or whether it is an exception among the ciliate mitochondrial genomes known so far. To address this question, we determined the mitochondrial genome sequence of Paramecium caudatum and compared the gene content and sequence properties to the closely related P. tetraurelia. Results: The guanine + cytosine content of the P. caudatum mitochondrial genome was significantly lower than that of P. tetraurelia (22.4% vs. 41.2%). This difference in the mitochondrial nucleotide composition was accompanied by significantly different codon usage patterns in the two species, i.e. within P. caudatum A/T-ending codons clearly dominated, whereas for P. tetraurelia the synonymous codons were more balanced with a higher number of G/C-ending codons. Further analyses indicated that the nucleotide composition of most members of the genus Paramecium resembles that of P. caudatum and that the shift observed in P. tetraurelia is restricted to the P. aurelia species complex. Conclusions: Surprisingly, the codon usage bias in the P. caudatum mitochondrial genome, exemplified by the effective number of codons, is more similar to the distantly related T. pyriformis and other single-celled eukaryotes such as Chlamydomonas, than to the closely related P. tetraurelia. These differences in base composition and codon usage bias were, however, not reflected in the amino
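
    The two quantities compared in this abstract, overall G+C content and the preference for A/T- versus G/C-ending codons, can be illustrated with a short sketch. This is not the authors' analysis: the function names and the toy sequences are hypothetical, and the coding sequence is assumed to be in frame with no ambiguity codes that matter.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_fraction(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    # Take every third base, starting at index 2, from complete codons only.
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    return gc_fraction("".join(thirds))


# Toy sequences (hypothetical, for illustration only).
print(gc_fraction("ATGC"))        # 0.5
print(gc3_fraction("ATGAAATTT"))  # third positions are G, A, T
```

    A genome dominated by A/T-ending codons, as reported for P. caudatum, would show a low third-position value in this kind of calculation.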

  6. 16S-23S rDNA intergenic spacer region polymorphism of Lactococcus garvieae, Lactococcus raffinolactis and Lactococcus lactis as revealed by PCR and nucleotide sequence analysis.

    Science.gov (United States)

    Blaiotta, Giuseppe; Pepe, Olimpia; Mauriello, Gianluigi; Villani, Francesco; Andolfi, Rosamaria; Moschetti, Giancarlo

    2002-12-01

    The intergenic spacer region (ISR) between the 16S and 23S rRNA genes was tested as a tool for differentiating lactococci commonly isolated in a dairy environment. 17 reference strains, representing 11 different species belonging to the genera Lactococcus, Streptococcus, Lactobacillus, Enterococcus and Leuconostoc, and 127 wild streptococcal strains isolated during the whole fermentation process of "Fior di Latte" cheese were analyzed. After 16S-23S rDNA ISR amplification by PCR, species or genus-specific patterns were obtained for most of the reference strains tested. Moreover, results obtained after nucleotide analysis show that the 16S-23S rDNA ISR sequences vary greatly, in size and sequence, among Lactococcus garvieae, Lactococcus raffinolactis, Lactococcus lactis as well as other streptococci from dairy environments. Because of the high degree of inter-specific polymorphism observed, 16S-23S rDNA ISR can be considered a good potential target for selecting species-specific molecular assays, such as PCR primer or probes, for a rapid and extremely reliable differentiation of dairy lactococcal isolates.

  7. Genome-Wide Single-Nucleotide Polymorphisms Discovery and High-Density Genetic Map Construction in Cauliflower Using Specific-Locus Amplified Fragment Sequencing

    Science.gov (United States)

    Zhao, Zhenqing; Gu, Honghui; Sheng, Xiaoguang; Yu, Huifang; Wang, Jiansheng; Huang, Long; Wang, Dan

    2016-01-01

    Molecular markers and genetic maps play an important role in plant genomics and breeding studies. Cauliflower is an important and distinctive vegetable; however, very few molecular resources have been reported for this species. In this study, a novel, specific-locus amplified fragment (SLAF) sequencing strategy was employed for large-scale single nucleotide polymorphism (SNP) discovery and high-density genetic map construction in a double-haploid, segregating population of cauliflower. A total of 12.47 Gb raw data containing 77.92 M pair-end reads were obtained after processing and 6815 polymorphic SLAFs between the two parents were detected. The average sequencing depths reached 52.66-fold for the female parent and 49.35-fold for the male parent. Subsequently, these polymorphic SLAFs were used to genotype the population and further filtered based on several criteria to construct a genetic linkage map of cauliflower. Finally, 1776 high-quality SLAF markers, including 2741 SNPs, constituted the linkage map with average data integrity of 95.68%. The final map spanned a total genetic length of 890.01 cM with an average marker interval of 0.50 cM, and covered 364.9 Mb of the reference genome. The markers and genetic map developed in this study could provide an important foundation not only for comparative genomics studies within Brassica oleracea species but also for quantitative trait loci identification and molecular breeding of cauliflower. PMID:27047515

  8. Genetic relatedness among indigenous rice varieties in the Eastern Himalayan region based on nucleotide sequences of the Waxy gene.

    Science.gov (United States)

    Choudhury, Baharul I; Khan, Mohammed L; Dayanandan, Selvadurai

    2014-12-29

    Indigenous rice varieties in the Eastern Himalayan region of Northeast India are traditionally classified into sali, boro and jum ecotypes based on geographical locality and the season of cultivation. In this study, we used DNA sequence data from the Waxy (Wx) gene to infer the genetic relatedness among indigenous rice varieties in Northeast India and to assess the genetic distinctiveness of ecotypes. The results of all three analyses (Bayesian, Maximum Parsimony and Neighbor Joining) were congruent and revealed two genetically distinct clusters of rice varieties in the region. The large group comprised several varieties of sali and boro ecotypes, and all agronomically improved varieties. The small group consisted of only traditionally cultivated indigenous rice varieties, which included one boro, few sali and all jum varieties. The fixation index analysis revealed a very low level of differentiation between sali and boro (F(ST) = 0.005), moderate differentiation between sali and jum (F(ST) = 0.108) and high differentiation between jum and boro (F(ST) = 0.230) ecotypes. The genetic relatedness analyses revealed that sali, boro and jum ecotypes are genetically heterogeneous, and the current classification based on cultivation type is not congruent with the genetic background of rice varieties. Indigenous rice varieties chosen from genetically distinct clusters could be used in breeding programs to improve genetic gain through heterosis, while maintaining high genetic diversity.

  9. Characterization of the transcriptome, nucleotide sequence polymorphism, and natural selection in the desert adapted mouse Peromyscus eremicus

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2014-10-01

    As a direct result of intense heat and aridity, deserts are thought to be among the most harsh of environments, particularly for their mammalian inhabitants. Given that osmoregulation can be challenging for these animals, with failure resulting in death, strong selection should be observed on genes related to the maintenance of water and solute balance. One such animal, Peromyscus eremicus, is native to the desert regions of the southwest United States and may live its entire life without oral fluid intake. As a first step toward understanding the genetics that underlie this phenotype, we present a characterization of the P. eremicus transcriptome. We assay four tissues (kidney, liver, brain, testes) from a single individual and supplement this with population level renal transcriptome sequencing from 15 additional animals. We identified a set of transcripts undergoing both purifying and balancing selection based on estimates of Tajima’s D. In addition, we used the branch-site test to identify a transcript (Slc2a9, likely related to desert osmoregulation) undergoing enhanced selection in P. eremicus relative to a set of related non-desert rodents.
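
    For readers unfamiliar with Tajima's D, the statistic mentioned above compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a harmonic-sum constant. The sketch below follows the standard textbook formula and is not the authors' pipeline; it assumes aligned, equal-length, gap-free sequences, and the function name and toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length, gap-free sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least 4 sequences are needed for a non-zero variance term")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) alignment columns.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants of the standard variance approximation.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))


# Toy alignments (hypothetical): singletons only, then two balanced haplotypes.
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))  # negative: excess of rare variants
print(tajimas_d(["AA", "AA", "TT", "TT"]))          # positive: intermediate-frequency variants
```

    Negative values are commonly read as consistent with purifying selection or population expansion, and positive values as consistent with balancing selection or population contraction, which is the interpretation the abstract relies on.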

  10. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    OpenAIRE

    Bolton, Michael J; Garry, Robert F

    2011-01-01

    Background: The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results: Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infectio...

  11. Detection and copy number estimation of the transgenic nucleotide sequences in an unknown GM event of Oryza sativa

    Directory of Open Access Journals (Sweden)

    Ali M. Sajjad

    2016-12-01

    The present study was designed to establish a qualitative detection method based on conventional and real time PCR assay to screen the commonly grown rice varieties for the presence of the cry1Ac gene. The detection of genetically modified rice in the screening process would necessitate accurate assay development and precise qualitative PCR tests complying with established procedures for the detection and characterization of transgenes in food grains. Such assay would not only enable the monitoring of transgene flow in local agricultural environment but also the characterization of different plant species produced with this transgene and its regulatory components. Thus, a reliable and quick screening assay was established for the qualitative detection of the transgene along with the promoter and selectable marker gene in genetically modified rice. By conventional PCR, a fragment of 215 bp was amplified with gene specific primers of cry1Ac. Primers for other transgenes such as gna and bar were also employed; however, no amplification was detected. The presence of the p35s, sps, and nptII genes was confirmed by qualitative real-time PCR. The specificity of the respective PCR products was checked through melt peak curve analysis. Sharp and precise melting temperatures indicated the presence of a single kind of PCR product in correspondence to each of the primers used. Moreover, the copy number of cry1Ac was estimated by ∆∆CT method. It is proposed that the primer sets and experimental conditions used in this study will be sufficient to meet the requirements for molecular detection and characterization of the cry1Ac transgene and affiliated sequences in sorting out conventional rice varieties from the ones which are genetically modified. It will also help to monitor the ecological flow of these transgenes and other biosafety factors.
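
    The ∆∆CT calculation named in this abstract can be sketched as follows. This is a generic illustration of the method, not the authors' protocol: it assumes roughly 100% amplification efficiency (a doubling per cycle), and the function name, the choice of reference gene and calibrator, and the Ct values are all hypothetical.

```python
def relative_quantity(ct_target_sample, ct_ref_sample, ct_target_cal, ct_ref_cal):
    """Relative amount of target in a sample versus a calibrator, as 2^-(ddCt)."""
    delta_sample = ct_target_sample - ct_ref_sample  # normalise sample to its reference gene
    delta_cal = ct_target_cal - ct_ref_cal           # normalise calibrator the same way
    dd_ct = delta_sample - delta_cal
    return 2.0 ** (-dd_ct)                           # assumes a doubling per PCR cycle


# Hypothetical Ct values: the target crosses threshold one cycle earlier
# (relative to the reference) in the sample than in the calibrator.
ratio = relative_quantity(24.0, 20.0, 25.0, 20.0)
print(ratio)  # 2.0
```

    If the calibrator is known to carry a given number of transgene copies, multiplying that number by the ratio gives the estimated copy number in the sample, subject to the efficiency assumption stated above.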

  12. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers' genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures allow discriminating successfully a) TrEn from random controls, proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  13. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    Directory of Open Access Journals (Sweden)

    Nedenia Bonvino Stafuzza

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated 10.7 to 16.4-fold genome coverage for each animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis was performed and revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and the metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

  14. Genome-wide association study using high-density single nucleotide polymorphism arrays and whole-genome sequences for clinical mastitis traits in dairy cattle.

    Science.gov (United States)

    Sahana, G; Guldbrandtsen, B; Thomsen, B; Holm, L-E; Panitz, F; Brøndum, R F; Bendixen, C; Lund, M S

    2014-11-01

    Mastitis is a mammary disease that frequently affects dairy cattle. Despite considerable research on the development of effective prevention and treatment strategies, mastitis continues to be a significant issue in bovine veterinary medicine. To identify major genes that affect mastitis in dairy cattle, 6 chromosomal regions on Bos taurus autosome (BTA) 6, 13, 16, 19, and 20 were selected from a genome scan for 9 mastitis phenotypes using imputed high-density single nucleotide polymorphism arrays. Association analyses using sequence-level variants for the 6 targeted regions were carried out to map causal variants using whole-genome sequence data from 3 breeds. The quantitative trait loci (QTL) discovery population comprised 4,992 progeny-tested Holstein bulls, and QTL were confirmed in 4,442 Nordic Red and 1,126 Jersey cattle. The targeted regions were imputed to the sequence level. The highest association signal for clinical mastitis was observed on BTA 6 at 88.97 Mb in Holstein cattle and was confirmed in Nordic Red cattle. The peak association region on BTA 6 contained 2 genes: vitamin D-binding protein precursor (GC) and neuropeptide FF receptor 2 (NPFFR2), which, based on known biological functions, are good candidates for affecting mastitis. However, strong linkage disequilibrium in this region prevented conclusive determination of the causal gene. A different QTL on BTA 6 located at 88.32 Mb in Holstein cattle affected mastitis. In addition, QTL on BTA 13 and 19 were confirmed to segregate in Nordic Red cattle and QTL on BTA 16 and 20 were confirmed in Jersey cattle. Although several candidate genes were identified in these targeted regions, it was not possible to identify a gene or polymorphism as the causal factor for any of these regions. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  15. Overlapping ETS and CRE Motifs (G/CCGGAAGTGACGTCA) Preferentially Bound by GABPα and CREB Proteins

    Science.gov (United States)

    Chatterjee, Raghunath; Zhao, Jianfei; He, Ximiao; Shlyakhtenko, Andrey; Mann, Ishminder; Waterfall, Joshua J.; Meltzer, Paul; Sathyanarayana, B. K.; FitzGerald, Peter C.; Vinson, Charles

    2012-01-01

    Previously, we identified 8-bps long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1-bp to 30-bps (X4-N1-30-X4) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif (C/GCCGGAAGCGGAA) and the ETS⇔CRE motif (C/GCGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif. PMID:23050235
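
    The two kinds of search described in this abstract, counting a fixed motif on both strands and locating split 8-mers of the form X4-N(gap)-X4, can be sketched with regular expressions. This is not the authors' pipeline: the function names and the toy sequence are hypothetical, and only the 11-mer CGGAAGTGACG is taken from the abstract.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_motif(seq, motif):
    """Count overlapping occurrences of a motif on both strands of a sequence."""
    seq, motif = seq.upper(), motif.upper()
    # A set avoids double counting when the motif equals its own reverse complement.
    patterns = {motif, reverse_complement(motif)}
    # A zero-width lookahead lets overlapping matches be counted.
    return sum(len(re.findall(f"(?={re.escape(p)})", seq)) for p in patterns)


def split_8mer_positions(seq, left, right, gap):
    """Start positions of left + N{gap} + right (an X4-N-X4 split 8-mer) on the given strand."""
    pattern = re.compile(f"(?={left.upper()}[ACGT]{{{gap}}}{right.upper()})")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Toy sequence (hypothetical): one forward and one reverse-complement copy of the 11-mer.
toy = "TTCGGAAGTGACGTT" + "AACGTCACTTCCGAA"
print(count_motif(toy, "CGGAAGTGACG"))                # 2
print(split_8mer_positions(toy, "CGGA", "GTGA", 1))   # [2]
```

    Scanning a whole genome this way would be slow; the sketch is meant only to make the X4-N1-30-X4 notation concrete, with the gap length iterated from 1 to 30.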

  16. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  17. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  18. [Application of single nucleotide polymorphism-microarray and target gene sequencing in the study of genetic etiology of children with unexplained intellectual disability or developmental delay].

    Science.gov (United States)

    Gao, Z J; Jiang, Q; Cheng, D Z; Yan, X X; Chen, Q; Xu, K M

    2016-10-02

    Objective: To evaluate the application of single nucleotide polymorphism (SNP)-microarray and target gene sequencing technology in the clinical molecular genetic diagnosis of unexplained intellectual disability (ID) or developmental delay (DD). Method: Patients with ID or DD were recruited in the Department of Neurology, Affiliated Children's Hospital of Capital Institute of Pediatrics between September 2015 and February 2016. The intellectual assessment of the patients was performed using the 0-6-year-old pediatric examination table of neuropsychological development or the Wechsler intelligence scale (>6 years). Patients with a DQ less than 49 or IQ less than 51 were included in this study. The patients were scanned by SNP-array for detection of genomic copy number variations (CNV), and the revealed genomic imbalance was confirmed by quantitative real time-PCR. Candidate gene mutation screening was carried out by target gene sequencing technology. Causal mutations or likely pathogenic variants were verified by polymerase chain reaction and direct sequencing. Result: Fifteen children with ID or DD were enrolled, 9 males and 6 females. The age of these patients ranged from 7 months to 16 years and 9 months. SNP-array revealed that two of the 15 patients had genomic CNV. Both CNV were de novo microdeletions: one involved 11q24.1q25 and the other was located on 21q22.2q22.3. Both microdeletions were shown to be clinically significant, owing to their association with ID, brain DD, unusual faces etc., by querying the Decipher database. Thirteen patients with negative findings in SNP-array were subsequently examined with target gene sequencing technology, genotype-phenotype correlation analysis and genetic analysis. Five patients were diagnosed with monogenic disorder, two were diagnosed with suspected genetic disorder and six were still negative. Conclusion: Sequential use of SNP-array and target gene sequencing technology can significantly increase the molecular genetic etiologic

  19. Localization of Daucus carota NMCP1 to the nuclear periphery: the role of the N-terminal region and an NLS-linked sequence motif, RYNLRR, in the tail domain

    Directory of Open Access Journals (Sweden)

    Yuta Kimura

    2014-02-01

    Recent ultrastructural studies revealed that a structure similar to the vertebrate nuclear lamina exists in the nuclei of higher plants. However, plant genomes lack genes for lamins and intermediate-type filament proteins, and this suggests that plant-specific nuclear coiled-coil proteins make up the lamina-like structure in plants. NMCP1 is a protein, first identified in Daucus carota cells, that localizes exclusively to the nuclear periphery in interphase cells. It has a tripartite structure comprised of head, rod, and tail domains, and includes putative nuclear localization signal (NLS) motifs. We identified the functional NLS of DcNMCP1 (carrot NMCP1) and determined the protein regions required for localizing to the nuclear periphery using EGFP-fused constructs transiently expressed in Apium graveolens epidermal cells. Transcription was driven under a CaMV35S promoter, and the genes were introduced into the epidermal cells by a DNA-coated microprojectile delivery system. Of the NLS motifs, KRRRK and RRHK in the tail domain were highly functional for nuclear localization. Addition of the N-terminal 141 amino acids from DcNMCP1 shifted the localization of a region including these NLSs from the entire nucleus to the nuclear periphery. Using this same construct, the replacement of amino acids in RRHK or its preceding sequence, YNL, with alanine residues abolished localization to the nuclear periphery, while replacement of KRRRK did not affect localization. The sequence R/Q/HYNLRR/H, including YNL and the first part of the sequence of RRHK, is evolutionarily conserved in a subclass of NMCP1 sequences from many plant species. These results show that NMCP1 localizes to the nuclear periphery by a combined action of a sequence composed of R/Q/HYNLRR/H, NLS, and the N-terminal region including the head and a portion of the rod domain, suggesting that more than one binding site is implicated in localization of NMCP1.
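    A degenerate motif such as R/Q/HYNLRR/H can be searched for with a regular expression. The sketch below assumes the slash notation means one of R, Q or H, then Y, N, L, R, then one of R or H; that reading, the function name and the example sequences are assumptions for illustration and are not taken from the paper.

```python
import re

# Assumed reading of "R/Q/HYNLRR/H": [R, Q or H] Y N L R [R or H].
NMCP1_MOTIF = re.compile(r"[RQH]YNLR[RH]")

def find_motif(protein_seq):
    """Return (0-based start, matched text) for each non-overlapping occurrence."""
    return [(m.start(), m.group()) for m in NMCP1_MOTIF.finditer(protein_seq.upper())]

# Made-up peptide strings, not real NMCP1 sequence data.
print(find_motif("MSTKRYNLRRHKAA"))  # [(4, 'RYNLRR')]
print(find_motif("AAQYNLRHGG"))      # [(2, 'QYNLRH')]
print(find_motif("AAAYNLAAA"))       # []
```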

  20. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    Science.gov (United States)

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  1. The MHC motif viewer: a visualization tool for MHC binding motifs

    DEFF Research Database (Denmark)

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole

    2010-01-01

    is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  2. The Role of the Y-Chromosome in the Establishment of Murine Hybrid Dysgenesis and in the Analysis of the Nucleotide Sequence Organization, Genetic Transmission and Evolution of Repeated Sequences.

    Science.gov (United States)

    Nallaseth, Ferez Soli

    The Y-chromosome presents a unique cytogenetic framework for the evolution of nucleotide sequences. Alignment of nine Y-chromosomal fragments in their increasing Y-specific/non Y-specific (male/female) sequence divergence ratios was directly and inversely related to their interspersion on these two respective genomic fractions. Sequence analysis confirmed a direct relationship between divergence ratios and the Alu, LINE-1, Satellite and their derivative oligonucleotide contents. Thus their relocation on the Y-chromosome is followed by sequence divergence rather than the well documented concerted evolution of these non-coding progenitor repeated sequences. Five of the nine Y-chromosomal fragments are non-pseudoautosomal and transcribed into heterogeneous poly(A)+ RNA and thus can be retrotransposed. Evolutionary and computer analysis identified homologous oligonucleotide tracts in several human loci suggesting common and random mechanistic origins. Dysgenic genomes represent the accelerated evolution driving sequence divergence (McClintock, 1984). Sex reversal and sterility characterizing dysgenesis occurs in C57BL/6JY^Pos but not in 129/SvY^Pos derivative strains. High frequency, random, multi-locus deletion products of the feral Y^Pos-chromosome are generated in the germlines of F1(C57BL/6J X 129/SvY^Pos)(male) and C57BL/6JY^Pos(male) but not in 129/SvY^Pos(male). Equal, 10^-1, 10^-2, and 0 copies (relative to males) of Y^Pos-specific deletion products respectively characterize C57BL/6JY^Pos (HC), (LC), (T) and (F) females. The testes determining loci of inactive Y^Pos-chromosomes in C57BL/6JY^Pos HC females are the preferentially deleted/rearranged Y^Pos-sequences. Disruption of regulation of plasma testosterone and hepatic MUP-A mRNA levels, TRD of a 4.7 Kbp EcoR1 fragment suggest disruption of autosomal/X-chromosomal sequences. These data and the highly repeated progenitor (Alu, GATA, LINE-1

  3. Molecular Comparison and Evolutionary Analyses of VP1 Nucleotide Sequences of New African Human Enterovirus 71 Isolates Reveal a Wide Genetic Diversity

    Science.gov (United States)

    Nougairède, Antoine; Joffret, Marie-Line; Deshpande, Jagadish M.; Dubot-Pérès, Audrey; Héraud, Jean-Michel

    2014-01-01

    Most circulating strains of Human enterovirus 71 (EV-A71) have been classified primarily into three genogroups (A to C) on the basis of genetic divergence between the 1D gene, which encodes the VP1 capsid protein. The aim of the present study was to provide further insights into the diversity of the EV-A71 genogroups following the recent description of highly divergent isolates, in particular those from African countries, including Madagascar. We classified recent EV-A71 isolates by a large comparison of 3,346 VP1 nucleotidic sequences collected from GenBank. Analysis of genetic distances and phylogenetic investigations indicated that some recently-reported isolates did not fall into the genogroups A-C and clustered into three additional genogroups, including one Indian genogroup (genogroup D) and 2 African ones (E and F). Our Bayesian phylogenetic analysis provided consistent data showing that the genogroup D isolates share a recent common ancestor with the members of genogroup E, while the isolates of genogroup F evolved from a recent common ancestor shared with the members of the genogroup B. Our results reveal the wide diversity that exists among EV-A71 isolates and suggest that the number of circulating genogroups is probably underestimated, particularly in developing countries where EV-A71 epidemiology has been poorly studied. PMID:24598878

  4. Next-Generation Sequencing Approaches in Genome-Wide Discovery of Single Nucleotide Polymorphism Markers Associated with Pungency and Disease Resistance in Pepper.

    Science.gov (United States)

    Manivannan, Abinaya; Kim, Jin-Hee; Yang, Eun-Young; Ahn, Yul-Kyun; Lee, Eun-Su; Choi, Sena; Kim, Do-Sun

    2018-01-01

    Pepper is an economically important horticultural plant that has been widely used for its pungency and spicy taste in cuisines worldwide, and it has been domesticated since antiquity. To meet the growing demand for pepper with high quality, organoleptic properties, nutraceutical content, and disease tolerance, genomics-assisted breeding techniques can be incorporated to develop novel pepper varieties with desired traits. The application of next-generation sequencing (NGS) approaches has reshaped plant breeding technology, especially in the area of molecular marker-assisted breeding. The availability of genomic information aids in the deeper understanding of several molecular mechanisms behind vital physiological processes. In addition, NGS methods facilitate the genome-wide discovery of DNA-based markers linked to key genes involved in important biological phenomena. Among the molecular markers, single nucleotide polymorphisms (SNPs) offer various benefits in comparison with other existing DNA-based markers. The present review concentrates on the impact of NGS approaches in the discovery of useful SNP markers associated with pungency and disease resistance in pepper. The information provided in this review can be utilized for the betterment of pepper breeding in the future.

  5. Next-Generation Sequencing Approaches in Genome-Wide Discovery of Single Nucleotide Polymorphism Markers Associated with Pungency and Disease Resistance in Pepper

    Directory of Open Access Journals (Sweden)

    Abinaya Manivannan

    2018-01-01

    Pepper is an economically important horticultural plant that has been widely used for its pungency and spicy taste in cuisines worldwide, and it has been domesticated since antiquity. To meet the growing demand for pepper with high quality, organoleptic properties, nutraceutical content, and disease tolerance, genomics-assisted breeding techniques can be incorporated to develop novel pepper varieties with desired traits. The application of next-generation sequencing (NGS) approaches has reshaped plant breeding technology, especially in the area of molecular marker-assisted breeding. The availability of genomic information aids in the deeper understanding of several molecular mechanisms behind vital physiological processes. In addition, NGS methods facilitate the genome-wide discovery of DNA-based markers linked to key genes involved in important biological phenomena. Among the molecular markers, single nucleotide polymorphisms (SNPs) offer various benefits in comparison with other existing DNA-based markers. The present review concentrates on the impact of NGS approaches in the discovery of useful SNP markers associated with pungency and disease resistance in pepper. The information provided in this review can be utilized for the betterment of pepper breeding in the future.

  6. The primary structure of L37--a rat ribosomal protein with a zinc finger-like motif.

    Science.gov (United States)

    Chan, Y L; Paz, V; Olvera, J; Wool, I G

    1993-04-30

    The amino acid sequence of the rat 60S ribosomal subunit protein L37 was deduced from the sequence of nucleotides in a recombinant cDNA. Ribosomal protein L37 has 96 amino acids, the NH2-terminal methionine is removed after translation of the mRNA, and has a molecular weight of 10,939. Ribosomal protein L37 has a single zinc finger-like motif of the C2-C2 type. Hybridization of the cDNA to digests of nuclear DNA suggests that there are 13 or 14 copies of the L37 gene. The mRNA for the protein is about 500 nucleotides in length. Rat L37 is related to Saccharomyces cerevisiae ribosomal protein YL35 and to Caenorhabditis elegans L37. We have identified in the data base a DNA sequence that encodes the chicken homolog of rat L37.

  7. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin; Wong, Hau-San

    2015-01-01

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8 to 10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
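    The difference between mono-nucleotide and di-nucleotide motif models can be sketched as two additive scoring functions over k-mers. This is a minimal illustration under assumed conventions (one weight per base per position, or one weight per adjacent base pair); the weights are made up, the function names are hypothetical, and nothing here reproduces the optimization methods compared in the paper.

```python
from itertools import product

BASES = "ACGT"

def all_kmers(k):
    """Yield every DNA k-mer in lexicographic order (4**k of them)."""
    for letters in product(BASES, repeat=k):
        yield "".join(letters)

def mono_score(kmer, pwm):
    """Mono-nucleotide model: sum one weight per position (4 * k parameters)."""
    return sum(pwm[i][base] for i, base in enumerate(kmer))

def di_score(kmer, dwm):
    """Di-nucleotide model: sum one weight per adjacent pair (16 * (k - 1) parameters)."""
    return sum(dwm[i][kmer[i:i + 2]] for i in range(len(kmer) - 1))

# Toy weights for a 3-position motif (illustrative values only).
pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": -1.0},
    {"A": 0.0, "C": 2.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 1.5, "T": 0.0},
]
dwm = [{a + b: 0.0 for a in BASES for b in BASES} for _ in range(2)]
dwm[0]["AC"] = 1.0
dwm[1]["CG"] = 0.5

best = max(all_kmers(3), key=lambda kmer: mono_score(kmer, pwm))
print(best, mono_score(best, pwm))   # ACG 4.5
print(di_score("ACG", dwm))          # 1.5
print(sum(1 for _ in all_kmers(8)))  # 65536 possible 8-mers
```

    The di-nucleotide model can express dependencies between neighbouring positions that the mono-nucleotide model cannot, at the cost of more parameters.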

  8. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun

    2015-06-11

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8 to 10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.

  9. The kissing-loop motif is a preferred site of 5' leader recombination during replication of SL3-3 murine leukemia viruses in mice

    DEFF Research Database (Denmark)

    Lund, Anders Henrik; Mikkelsen, J G; Schmidt, J

    1999-01-01

    , and the upstream part of the 5' untranslated region, enabled us to map recombination sites, guided by distinct scattered nucleotide differences. In 30 of 44 analyzed sequences, recombination was mapped to a 33-nucleotide similarity window coinciding with the kissing-loop stem-loop motif implicated in dimerization...... of the diploid genome. Interestingly, the recombination pattern preference found in replication-competent viruses from T-cell tumors is very similar to the pattern previously reported for retroviral vectors in cell culture experiments. The data therefore sustain the hypothesis that the kissing loop, presumably...

  10. RegRNA: an integrated web server for identifying regulatory RNA motifs and elements

    OpenAIRE

    Huang, Hsi-Yuan; Chien, Chia-Hung; Jen, Kuan-Hua; Huang, Hsien-Da

    2006-01-01

    Numerous regulatory structural motifs have been identified as playing essential roles in transcriptional and post-transcriptional regulation of gene expression. RegRNA is an integrated web server for identifying the homologs of regulatory RNA motifs and elements against an input mRNA sequence. Both sequence homologs and structural homologs of regulatory RNA motifs can be recognized. The regulatory RNA motifs supported in RegRNA are categorized into several classes: (i) motifs in mRNA 5′-untra...

  11. Clinical and molecular characterization of a cohort of patients with novel nucleotide alterations of the Dystrophin gene detected by direct sequencing

    Directory of Open Access Journals (Sweden)

    Corti Stefania

    2011-03-01

    Background: Duchenne and Becker muscular dystrophies (DMD/BMD) are allelic disorders caused by mutations in the dystrophin gene, which encodes a sarcolemmal protein responsible for muscle integrity. Deletions and duplications account for approximately 75% of mutations in DMD and 85% in BMD. The implementation of techniques allowing complete gene sequencing has focused attention on small point mutations and other mechanisms underlying complex rearrangements. Methods: We selected 47 patients (41 families; 35 DMD, 6 BMD) without deletions and duplications in the DMD gene (excluded by multiplex ligation-dependent probe amplification and multiplex polymerase chain reaction analysis). This cohort was investigated by systematic direct sequence analysis to study sequence variation. We focused our attention on rare mutational events, which were further studied through transcript analysis. Results: We identified 40 different nucleotide alterations in the DMD gene and their clinical correlates; altogether, 16 mutations were novel. DMD probands carried 9 microinsertions/microdeletions, 19 nonsense mutations, and 7 splice-site mutations. BMD patients carried 2 nonsense mutations, 2 splice-site mutations, 1 missense substitution, and 1 single base insertion. The most frequent stop codon was TGA (n = 10 patients), followed by TAG (n = 7) and TAA (n = 4). We also analyzed the molecular mechanisms of five rare mutational events. They are two frame-shifting mutations in the DMD gene 3' end in BMD and three novel splicing defects: IVS42: c.6118-3C>A, which causes a leaky splice-site; c.9560A>G, which determines a cryptic splice-site activation; and c.9564-426 T>G, which creates pseudoexon retention within IVS65. Conclusion: The analysis of our patients' sample, carrying point mutations or complex rearrangements in the DMD gene, contributes to the knowledge on phenotypic correlations in dystrophinopathic patients and can provide a better understanding of pre-mRNA maturation defects

  12. Comparing Enterovirus 71 with Coxsackievirus A16 by analyzing nucleotide sequences and antigenicity of recombinant proteins of VP1s and VP4s

    Directory of Open Access Journals (Sweden)

    Sun Yu

    2011-11-01

    Background: Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) are two major etiological agents of Hand, Foot and Mouth Disease (HFMD). EV71 is associated with severe cases, but CA16 is not. The mechanisms contributing to the different pathogenesis of these two viruses are unknown. VP1 and VP4 are two major structural proteins of these viruses and deserve close attention. Results: The sequences of vp1s from 14 EV71 and 14 CA16, and vp4s from 10 EV71 and 1 CA16 isolated in this study during 2007 to 2009 HFMD seasons were analyzed together with the corresponding sequences available in GenBank using DNAStar and MEGA 4.0. Phylogenetic analysis of complete vp1s or vp4s showed that EV71 isolated in Beijing belonged to C4 and CA16 belonged to lineage B2 (lineage C). VP1s and VP4s from 4 strains of viruses expressed in E. coli BL21 cells were used to detect IgM and IgG in human sera by Western Blot. The detection of IgM against VP1s of EV71 and CA16 showed consistent results with current infection, while none of the sera were positive against VP4s of EV71 and CA16. There was a significant difference in the positive rates between EV71 VP1 and CA16 VP1 (χ2 = 5.02, P […]) […] (χ2 = 15.30, P […]) […] (χ2 = 26.47, P […]) […] (χ2 = 16.78, P […]) […] Conclusions: EV71 and CA16 were highly diverse in the nucleotide sequences of vp1s and vp4s. The sera positive rates of VP1 and VP4 of EV71 were lower than those of CA16 respectively, which suggested a lower rate of exposure to EV71 than to CA16 in the Beijing population. Human serum antibodies detected by Western blot using VP1s and VP4s as antigen indicated that the immunological reaction to VP1 and VP4 of both EV71 and CA16 was different.

  13. Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L.

    Science.gov (United States)

    Allegre, Mathilde; Argout, Xavier; Boccara, Michel; Fouet, Olivier; Roguet, Yolande; Bérard, Aurélie; Thévenin, Jean Marc; Chauveau, Aurélie; Rivallan, Ronan; Clement, Didier; Courtois, Brigitte; Gramacho, Karina; Boland-Augé, Anne; Tahi, Mathias; Umaharan, Pathmanathan; Brunel, Dominique; Lanaud, Claire

    2012-01-01

    Theobroma cacao is an economically important tree of several tropical countries. Its genetic improvement is essential to provide protection against major diseases and improve chocolate quality. We discovered and mapped new expressed sequence tag-single nucleotide polymorphism (EST-SNP) and simple sequence repeat (SSR) markers and constructed a high-density genetic map. By screening 149 650 ESTs, 5246 SNPs were detected in silico, of which 1536 corresponded to genes with a putative function, while 851 had a clear polymorphic pattern across a collection of genetic resources. In addition, 409 new SSR markers were detected on the Criollo genome. Lastly, 681 new EST-SNPs and 163 new SSRs were added to the pre-existing 418 co-dominant markers to construct a large consensus genetic map. This high-density map and the set of new genetic markers identified in this study are a milestone in cocoa genomics and for marker-assisted breeding. The data are available at http://tropgenedb.cirad.fr. PMID:22210604

  14. Evolutionary and Structural Perspectives of Plant Cyclic Nucleotide Gated Cation Channels

    Directory of Open Access Journals (Sweden)

    Alice Kira Zelman

    2012-05-01

    Ligand-gated cation channels are a frequent component of signaling cascades in eukaryotes. Eukaryotes contain numerous diverse gene families encoding ion channels, some of which are shared and some of which are unique to particular kingdoms. Among the many different types are cyclic nucleotide-gated channels (CNGCs). CNGCs are cation channels with varying degrees of ion conduction selectivity. They are implicated in numerous signaling pathways and permit diffusion of divalent and monovalent cations, including Ca2+ and K+. CNGCs are present in both plant and animal cells, typically in the plasma membrane; recent studies have also documented their presence in prokaryotes. All eukaryote CNGC polypeptides have a cyclic nucleotide binding domain (CNBD) and a calmodulin binding domain (CaMBD) as well as a 6 transmembrane/1 pore tertiary structure. This review summarizes existing knowledge about the functional domains present in these cation-conducting channels, and considers the evidence indicating that plant and animal CNGCs evolved separately. Additionally, an amino acid motif that is only found in the phosphate binding cassette and hinge regions of plant CNGCs, and is present in all experimentally confirmed CNGCs but no other channels was identified. This CNGC-specific amino acid motif provides an additional diagnostic tool to identify plant CNGCs, and can increase confidence in the annotation of open reading frames in newly sequenced genomes as putative CNGCs. Conversely, the absence of the motif in some plant sequences currently identified as probable CNGCs may suggest that they are misannotated or protein fragments.

  15. Evolutionary and structural perspectives of plant cyclic nucleotide-gated cation channels

    KAUST Repository

    Zelman, Alice K.

    2012-05-29

    Ligand-gated cation channels are a frequent component of signaling cascades in eukaryotes. Eukaryotes contain numerous diverse gene families encoding ion channels, some of which are shared and some of which are unique to particular kingdoms. Among the many different types are cyclic nucleotide-gated channels (CNGCs). CNGCs are cation channels with varying degrees of ion conduction selectivity. They are implicated in numerous signaling pathways and permit diffusion of divalent and monovalent cations, including Ca2+ and K+. CNGCs are present in both plant and animal cells, typically in the plasma membrane; recent studies have also documented their presence in prokaryotes. All eukaryote CNGC polypeptides have a cyclic nucleotide-binding domain and a calmodulin binding domain as well as a six transmembrane/one pore tertiary structure. This review summarizes existing knowledge about the functional domains present in these cation-conducting channels, and considers the evidence indicating that plant and animal CNGCs evolved separately. Additionally, an amino acid motif that is only found in the phosphate binding cassette and hinge regions of plant CNGCs, and is present in all experimentally confirmed CNGCs but no other channels was identified. This CNGC-specific amino acid motif provides an additional diagnostic tool to identify plant CNGCs, and can increase confidence in the annotation of open reading frames in newly sequenced genomes as putative CNGCs. Conversely, the absence of the motif in some plant sequences currently identified as probable CNGCs may suggest that they are misannotated or protein fragments. 2012 Zelman, Dawe, Gehring and Berkowitz.

  16. DNA mutation motifs in the genes associated with inherited diseases.

    Directory of Open Access Journals (Sweden)

    Michal Růžička

    Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected by chance are termed coldspots. Mutation hotspots are usually derived from a mutation spectrum, which reflects a particular population in which the effect of a common ancestor plays a role. To detect coldspots/hotspots unaffected by population bias, we analysed the presence of germline mutations obtained from the HGMD database in the 5-nucleotide segments repeatedly occurring in genes associated with common inherited disorders, in particular, the PAH, LDLR, CFTR, F8, and F9 genes. Statistically significant sequences (mutational motifs) rarely associated with mutations (coldspots) and frequently associated with mutations (hotspots) exhibited characteristic sequence patterns, e.g. coldspots contained a purine tract while hotspots showed alternating purine-pyrimidine bases, often with the presence of a CpG dinucleotide. Using molecular dynamics simulations and free energy calculations, we analysed the global bending properties of two selected coldspots and two hotspots with a G/T mismatch. We observed that the coldspots were inherently more flexible than the hotspots. We assume that this property might be critical for effective mismatch repair as DNA with a mutation recognized by MutSα protein is noticeably bent.
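    The sequence features named in this abstract (5-nucleotide segments, purine tracts, alternating purine-pyrimidine bases, CpG dinucleotides) can be checked with a few small helpers. This is only a sketch of the pattern tests on an uppercase A/C/G/T string; it does not reproduce the statistical analysis of mutation frequencies, and the function names and example sequence are hypothetical.

```python
from collections import Counter

PURINES = set("AG")  # anything else is treated as a pyrimidine (assumes A/C/G/T input)

def count_5mers(seq):
    """Count every overlapping 5-nucleotide segment in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 5] for i in range(len(seq) - 4))

def is_purine_tract(segment):
    """True when every base is a purine (A or G)."""
    return all(base in PURINES for base in segment)

def is_alternating(segment):
    """True when purines and pyrimidines strictly alternate along the segment."""
    kinds = [base in PURINES for base in segment]
    return all(kinds[i] != kinds[i + 1] for i in range(len(kinds) - 1))

def has_cpg(segment):
    """True when the segment contains a CpG dinucleotide (C followed by G)."""
    return "CG" in segment

# Made-up sequence for illustration.
counts = count_5mers("ACGTGACGTG")
print(counts["ACGTG"])                                # 2
print(is_alternating("ACGTG"), has_cpg("ACGTG"))      # True True
print(is_purine_tract("AGGAA"), is_alternating("AGGAA"))  # True False
```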

  17. Complete nucleotide sequence and genome structure of a Japanese isolate of hibiscus latent Fort Pierce virus, a unique tobamovirus that contains an internal poly(A) region in its 3' end.

    Science.gov (United States)

    Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou

    2014-11-01

    In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.

  18. Single Nucleotide Polymorphism

    DEFF Research Database (Denmark)

    Børsting, Claus; Pereira, Vania; Andersen, Jeppe Dyrberg

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are the most frequent DNA sequence variations in the genome. They have been studied extensively in the last decade with various purposes in mind. In this chapter, we will discuss the advantages and disadvantages of using SNPs for human identification...... of SNPs. This will allow acquisition of more information from the sample materials and open up for new possibilities as well as new challenges....

  19. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

    Science.gov (United States)

    Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

    2012-01-01

    Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and for drug discovery.

  20. Sequential immunization with V3 peptides from primary human immunodeficiency virus type 1 produces cross-neutralizing antibodies against primary isolates with a matching narrow-neutralization sequence motif.

    Science.gov (United States)

    Eda, Yasuyuki; Takizawa, Mari; Murakami, Toshio; Maeda, Hiroaki; Kimachi, Kazuhiko; Yonemura, Hiroshi; Koyanagi, Satoshi; Shiosaki, Kouichi; Higuchi, Hirofumi; Makizumi, Keiichi; Nakashima, Toshihiro; Osatomi, Kiyoshi; Tokiyoshi, Sachio; Matsushita, Shuzo; Yamamoto, Naoki; Honda, Mitsuo

    2006-06-01

    An antibody response capable of neutralizing not only homologous but also heterologous forms of the CXCR4-tropic human immunodeficiency virus type 1 (HIV-1) MNp and CCR5-tropic primary isolate HIV-1 JR-CSF was achieved through sequential immunization with a combination of synthetic peptides representing HIV-1 Env V3 sequences from field and laboratory HIV-1 clade B isolates. In contrast, repeated immunization with a single V3 peptide generated antibodies that neutralized only type-specific laboratory-adapted homologous viruses. To determine whether the cross-neutralization response could be attributed to a cross-reactive antibody in the immunized animals, we isolated a monoclonal antibody, C25, which neutralized the heterologous primary viruses of HIV-1 clade B. Furthermore, we generated a humanized monoclonal antibody, KD-247, by transferring the genes of the complementary determining region of C25 into genes of the human V region of the antibody. KD-247 bound with high affinity to the "PGR" motif within the HIV-1 Env V3 tip region, and, among the established reference antibodies, it most effectively neutralized primary HIV-1 field isolates possessing the matching neutralization sequence motif, suggesting its promise for clinical applications involving passive immunizations. These results demonstrate that sequential immunization with B-cell epitope peptides may contribute to a humoral immune-based HIV vaccine strategy. Indeed, they help lay the groundwork for the development of HIV-1 vaccine strategies that use sequential immunization with biologically relevant peptides to overcome difficulties associated with otherwise poorly immunogenic epitopes.

  1. Determination of the complete nucleotide sequence of a lupine potyvirus isolate from the Czech Republic reveals that it belongs to a new member of the genus Potyvirus

    Czech Academy of Sciences Publication Activity Database

    Sarkisova, Tatiana; Petrzik, Karel

    2011-01-01

    Vol. 156, No. 1 (2011), pp. 167-169 ISSN 0304-8608 R&D Projects: GA MZe QH71145 Institutional research plan: CEZ:AV0Z50510513 Keywords: plants * virus * motif Subject RIV: EE - Microbiology, Virology Impact factor: 2.111, year: 2011

  2. MotifMark: Finding Regulatory Motifs in DNA Sequences

    OpenAIRE

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L.; Wang, May D.

    2017-01-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity be...

  3. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  4. DNA sequence polymorphisms within the bovine guanine nucleotide-binding protein Gs subunit alpha (Gsα)-encoding (GNAS) genomic imprinting domain are associated with performance traits

    Directory of Open Access Journals (Sweden)

    Mullen Michael P

    2011-01-01

    Background: Genes which are epigenetically regulated via genomic imprinting can be potential targets for artificial selection during animal breeding. Indeed, imprinted loci have been shown to underlie some important quantitative traits in domestic mammals, most notably muscle mass and fat deposition. In this candidate gene study, we have identified novel associations between six validated single nucleotide polymorphisms (SNPs) spanning a 97.6 kb region within the bovine guanine nucleotide-binding protein Gs subunit alpha gene (GNAS) domain on bovine chromosome 13 and genetic merit for a range of performance traits in 848 progeny-tested Holstein-Friesian sires. The mammalian GNAS domain consists of a number of reciprocally-imprinted, alternatively-spliced genes which can play a major role in growth, development and disease in mice and humans. Based on the current annotation of the bovine GNAS domain, four of the SNPs analysed (rs43101491, rs43101493, rs43101485 and rs43101486) were located upstream of the GNAS gene, while one SNP (rs41694646) was located in the second intron of the GNAS gene. The final SNP (rs41694656) was located in the first exon of transcripts encoding the putative bovine neuroendocrine-specific protein NESP55, resulting in an aspartic acid-to-asparagine amino acid substitution at amino acid position 192. Results: SNP genotype-phenotype association analyses indicate that the single intronic GNAS SNP (rs41694646) is associated (P ≤ 0.05) with a range of performance traits including milk yield, milk protein yield, the content of fat and protein in milk, culled cow carcass weight and progeny carcass conformation, measures of animal body size, direct calving difficulty (i.e. difficulty in calving due to the size of the calf) and gestation length. Association (P ≤ 0.01) with direct calving difficulty (i.e. due to calf size) and maternal calving difficulty (i.e. due to the maternal pelvic width size) was also observed at the rs

  5. Recoding method that removes inhibitory sequences and improves HIV gene expression

    Energy Technology Data Exchange (ETDEWEB)

    Rabadan, Raul; Krasnitz, Michael; Robins, Harlan; Witten, Daniela; Levine, Arnold

    2016-08-23

    The invention relates to inhibitory nucleotide signal sequences or "INS" sequences in the genomes of lentiviruses. In particular the invention relates to the AGG motif present in all viral genomes. The AGG motif may have an inhibitory effect on a virus, for example by reducing the levels of, or maintaining low steady-state levels of, viral RNAs in host cells, and inducing and/or maintaining viral latency. In one aspect, the invention provides vaccines that contain, or are produced from, viral nucleic acids in which the AGG sequences have been mutated. In another aspect, the invention provides methods and compositions for affecting the function of the AGG motif, and methods for identifying other INS sequences in viral genomes.

  6. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    Science.gov (United States)

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of the complete viral genomes for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and both follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  7. Deciphering functional glycosaminoglycan motifs in development.

    Science.gov (United States)

    Townley, Robert A; Bülow, Hannes E

    2018-03-23

    Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby responsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Import of desired nucleic acid sequences using addressing motif of mitochondrial ribosomal 5S-rRNA for fluorescent in vivo hybridization of mitochondrial DNA and RNA.

    Science.gov (United States)

    Zelenka, Jaroslav; Alán, Lukáš; Jabůrek, Martin; Ježek, Petr

    2014-04-01

    Based on the matrix-addressing sequence of mitochondrial ribosomal 5S-rRNA (termed MAM), which is naturally imported into mitochondria, we have constructed an import system for in vivo targeting of mitochondrial DNA (mtDNA) or mt-mRNA, in order to provide fluorescence hybridization of the desired sequences. Thus DNA oligonucleotides were constructed, containing the 5'-flanked T7 RNA polymerase promoter. After in vitro transcription and fluorescent labeling with Alexa Fluor® 488 or 647 dye, we obtained the fluorescent "L-ND5 probe" containing MAM and exemplar cargo, i.e., annealing sequence to a short portion of ND5 mRNA and to the light-strand mtDNA complementary to the heavy strand nd5 mt gene (5'-end 21 base pair sequence). For mitochondrial in vivo fluorescent hybridization, HepG2 cells were treated with dequalinium micelles containing the fluorescent probes, bringing the probes proximally to the mitochondrial outer membrane and to the natural import system. A verification of import into the mitochondrial matrix of cultured HepG2 cells was provided by confocal microscopy colocalizations. Transfections using lipofectamine or probes without the 5S-rRNA addressing MAM sequence or with MAM only were ineffective. Alternatively, the same DNA oligonucleotides with a 5'-CACC overhang (substituting the T7 promoter) were transcribed from the tetracycline-inducible pENTRH1/TO vector in human embryonic kidney T-REx®-293 cells, while mitochondrial matrix localization after import of the resulting unlabeled RNA was detected by PCR. The MAM-containing probe was then enriched by three orders of magnitude over the natural ND5 mRNA in the mitochondrial matrix. In conclusion, we present a proof-of-principle for mitochondrial in vivo hybridization and mitochondrial nucleic acid import.

  9. BayesMD: flexible biological modeling for motif discovery

    DEFF Research Database (Denmark)

    Tang, Man-Hung Eric; Krogh, Anders; Winther, Ole

    2008-01-01

    We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as prior over nucleotide probabilities in binding sites. It is trained on trans......

  10. Design of character-based DNA barcode motif for species identification: A computational approach and its validation in fishes.

    Science.gov (United States)

    Chakraborty, Mohua; Dhar, Bishal; Ghosh, Sankar Kumar

    2017-11-01

    DNA barcodes are generally interpreted using distance-based and character-based methods. The former uses clustering of comparable groups, based on the relative genetic distance, while the latter is based on the presence or absence of discrete nucleotide substitutions. The distance-based approach has a limitation in defining a universal species boundary across the taxa as the rate of mtDNA evolution is not constant throughout the taxa. However, the character-based approach more accurately defines this using a unique set of nucleotide characters. The character-based analysis of the full-length barcode has some inherent limitations, like sequencing of the full-length barcode, use of a sparse-data matrix and lack of a uniform diagnostic position for each group. A short continuous stretch of a fragment can be used to resolve the limitations. Here, we observe that a 154-bp fragment from the transversion-rich domain of 1367 COI barcode sequences can successfully delimit species in the three most diverse orders of freshwater fishes. This fragment is used to design species-specific barcode motifs for 109 species by the character-based method, which successfully identifies the correct species using a pattern-matching program. The motifs also correctly identify geographically isolated populations of the Cypriniformes species. Further, this region is validated as a species-specific mini-barcode for freshwater fishes by successful PCR amplification and sequencing of the motif (154 bp) using the designed primers. We anticipate that use of such motifs will enhance the diagnostic power of DNA barcodes, and the mini-barcode approach will greatly benefit the field-based system of rapid species identification. © 2017 John Wiley & Sons Ltd.

  11. Sequence-based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families.

    Directory of Open Access Journals (Sweden)

    Janine Maimanakos

    2016-08-01

    Arylmalonate-Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods, 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica's prototype appeared to be limited to the classes of Alpha-, Beta- and Gammaproteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the TTT family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originating from soil, and AmdP from Polymorphum gilvum, found by a database search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes.

  12. The arabidopsis cyclic nucleotide interactome

    KAUST Repository

    Donaldson, Lara Elizabeth

    2016-05-11

    Background: Cyclic nucleotides have been shown to play important signaling roles in many physiological processes in plants including photosynthesis and defence. Despite this, little is known about cyclic nucleotide-dependent signaling mechanisms in plants since the downstream target proteins remain unknown. This is largely due to the fact that bioinformatics searches fail to identify plant homologs of protein kinases and phosphodiesterases that are the main targets of cyclic nucleotides in animals. Methods: An affinity purification technique was used to identify cyclic nucleotide binding proteins in Arabidopsis thaliana. The identified proteins were subjected to a computational analysis that included a sequence, transcriptional co-expression and functional annotation analysis in order to assess their potential role in plant cyclic nucleotide signaling. Results: A total of twelve cyclic nucleotide binding proteins were identified experimentally including key enzymes in the Calvin cycle and photorespiration pathway. Importantly, eight of the twelve proteins were shown to contain putative cyclic nucleotide binding domains. Moreover, the identified proteins are post-translationally modified by nitric oxide, transcriptionally co-expressed and annotated to function in hydrogen peroxide signaling and the defence response. The activity of one of these proteins, GLYCOLATE OXIDASE 1, a photorespiratory enzyme that produces hydrogen peroxide in response to Pseudomonas, was shown to be repressed by a combination of cGMP and nitric oxide treatment. Conclusions: We propose that the identified proteins function together as points of cross-talk between cyclic nucleotide, nitric oxide and reactive oxygen species signaling during the defence response.

  13. Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

    KAUST Repository

    Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T.

    2018-01-01

    and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter

  14. Armadillo motifs involved in vesicular transport.

    Directory of Open Access Journals (Sweden)

    Harald Striegl

    Armadillo (ARM) repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM-motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show a lack of evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

  15. Thermal Stability of Modified i-Motif Oligonucleotides with Naphthalimide Intercalating Nucleic Acids

    DEFF Research Database (Denmark)

    El-Sayed, Ahmed Ali; Pedersen, Erik B.; Khaireldin, Nahid Y.

    2016-01-01

    In continuation of our investigation of characteristics and thermodynamic properties of the i-motif 5′-d[(CCCTAA)3CCCT] upon insertion of intercalating nucleotides into the cytosine-rich oligonucleotide, this article evaluates the stabilities of i-motif oligonucleotides upon insertion of naphthalimide (1H-benzo[de]isoquinoline-1,3(2H)-dione) as the intercalating nucleic acid. The stabilities of i-motif structures with inserted naphthalimide intercalating nucleotides were studied using UV melting temperatures (Tm) and circular dichroism spectra at different pH values and conditions (crowding...

  16. An efficient identification strategy of clonal tea cultivars using long-core motif SSR markers.

    Science.gov (United States)

    Wang, Rang Jian; Gao, Xiang Feng; Kong, Xiang Rui; Yang, Jun

    2016-01-01

    Microsatellites, or simple sequence repeats (SSRs), especially those with long-core motifs (tri-, tetra-, penta-, and hexa-nucleotide), represent an excellent tool for DNA fingerprinting. SSRs with long-core motifs are preferred since neighboring alleles are more easily separated and identified from each other, which renders the interpretation of electropherograms and the true alleles more reliable. In the present work, with the purpose of characterizing a set of core SSR markers with long-core motifs for reliable fingerprinting of clonal cultivars of tea (Camellia sinensis), we analyzed 66 elite clonal tea cultivars in China with 33 initially-chosen long-core motif SSR markers covering all the 15 linkage groups of the tea plant genome. A set of 6 SSR markers were conclusively selected as core SSR markers after further selection. The polymorphic information content (PIC) of the core SSR markers was >0.5, with ≤5 alleles in each marker containing 10 or fewer genotypes. Phylogenetic analysis revealed that the core SSR markers were not strongly correlated with the trait 'cultivar processing-property'. The combined probability of identity (PID) between two random cultivars for the whole set of 6 SSR markers was estimated to be 2.22 × 10⁻⁵, which was quite low and confirmed the usefulness of the proposed SSR markers for fingerprinting analyses in Camellia sinensis. Moreover, for the sake of quickly discriminating the clonal tea cultivars, a cultivar identification diagram (CID) was subsequently established using these core markers, which fully reflected the identification process and provided immediate information about which SSR markers were needed to identify a cultivar chosen among the tested ones. The results suggested that the long-core motif SSR markers used in the investigation contributed to the accurate and efficient identification of the clonal tea cultivars and enabled the protection of intellectual property.
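
    The combined probability of identity mentioned above can be illustrated with a minimal sketch. It assumes one common per-locus formulation under random mating (sum of p_i^4 over alleles plus the sum of (2 p_i p_j)^2 over allele pairs) and independence between loci; the abstract does not state which estimator the authors used, and the allele frequencies below are hypothetical.

```python
from itertools import combinations
from math import prod


def probability_of_identity(allele_freqs):
    """Per-locus PID from allele frequencies (assumed to sum to 1).

    Assumed formula: chance that two random individuals share a genotype,
    i.e. both homozygous for the same allele or both the same heterozygote.
    """
    homozygous = sum(p ** 4 for p in allele_freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygous + heterozygous


def combined_pid(loci):
    """Multiply per-locus PIDs, assuming the loci are independent."""
    return prod(probability_of_identity(freqs) for freqs in loci)


# Hypothetical allele frequencies for two markers (not data from the study).
example_loci = [
    [0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]

if __name__ == "__main__":
    print(combined_pid(example_loci))
```

    Each additional informative marker multiplies the combined value by a factor below 1, which is why a small core set can already reach a very low combined PID.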

  17. Probing structural changes of self assembled i-motif DNA

    KAUST Repository

    Lee, Iljoon; Patil, Sachin; Fhayli, Karim; Alsaiari, Shahad K.; Khashab, Niveen M.

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change.

  18. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
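
    As a rough illustration of the mutual-information feature described in this abstract, the sketch below computes the mutual information (in bits) between two binary events, "sequence matches the motif" and "sequence is annotated with the GO term", from a 2x2 table of sequence counts. The function name and the count layout are assumptions made for this example; the paper's exact feature definition is not given in the abstract.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """Mutual information (bits) of a motif/GO-term 2x2 contingency table.

    n11: sequences with motif and GO term    n10: motif only
    n01: GO term only                         n00: neither
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("contingency table is empty")
    motif_yes, motif_no = n11 + n10, n01 + n00
    term_yes, term_no = n11 + n01, n10 + n00
    cells = [
        (n11, motif_yes, term_yes),
        (n10, motif_yes, term_no),
        (n01, motif_no, term_yes),
        (n00, motif_no, term_no),
    ]
    mi = 0.0
    for joint, row, col in cells:
        if joint:  # empty cells contribute nothing (0 * log 0 := 0)
            mi += (joint / total) * log2(joint * total / (row * col))
    return mi


if __name__ == "__main__":
    # Hypothetical counts: motif and term always co-occur, then never associate.
    print(mutual_information(50, 0, 0, 50))
    print(mutual_information(25, 25, 25, 25))
```

    A value near zero means that matching the motif says little about carrying the GO term; such a score could then be one input among several to a classifier such as the logistic regression the abstract mentions.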

  19. Quantitative statistical analysis of cis-regulatory sequences in ABA/VP1- and CBF/DREB1-regulated genes of Arabidopsis.

    Science.gov (United States)

    Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R

    2005-09-01

    We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
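
    The general idea of finding oligo sequences overrepresented in a set of coregulated promoters can be sketched as follows. This is not the MotifFinder program or its statistic, which the abstract does not specify; it is a simple ratio of the fraction of promoters containing each k-mer in a foreground set versus a background set, with a pseudocount, and all sequences and parameter names are hypothetical.

```python
from collections import Counter


def sequences_containing_kmer(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(foreground, background, k=5, min_ratio=3.0, pseudocount=1.0):
    """Return (k-mer, ratio) pairs enriched in foreground, highest ratio first."""
    fg = sequences_containing_kmer(foreground, k)
    bg = sequences_containing_kmer(background, k)
    results = []
    for kmer, fg_n in fg.items():
        fg_frac = (fg_n + pseudocount) / (len(foreground) + pseudocount)
        bg_frac = (bg[kmer] + pseudocount) / (len(background) + pseudocount)
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            results.append((kmer, ratio))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Toy promoter fragments (made up); the foreground shares an ACGTG element.
foreground = ["TTACGTGGCA", "GGACGTGTTT", "CCACGTGAAA"]
background = ["TTTTTTTTTT", "GGGGGGGGGG", "CCCCCCCCCC"]

if __name__ == "__main__":
    for kmer, ratio in enriched_kmers(foreground, background):
        print(kmer, ratio)
```

    A real analysis would replace the ratio threshold with a significance test and a correction for the number of k-mers examined.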

  20. Gene Isolation Using Degenerate Primers Targeting Protein Motif: A Laboratory Exercise

    Science.gov (United States)

    Yeo, Brandon Pei Hui; Foong, Lian Chee; Tam, Sheh May; Lee, Vivian; Hwang, Siaw San

    2018-01-01

    Structures and functions of protein motifs are widely included in many biology-based course syllabi. However, little emphasis is placed on linking this knowledge to applications in biotechnology to enhance the learning experience. Here, the conserved motifs of nucleotide binding site-leucine rich repeats (NBS-LRR) proteins, successfully used for the…
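
    As a rough illustration of what a degenerate primer targeting a protein motif looks like computationally, the sketch below expands IUPAC ambiguity codes into a regular expression and counts how many distinct oligonucleotides the primer represents. The primer shown is hypothetical (it back-translates a Gly-Val-Gly-Lys-Thr stretch) and is not taken from the exercise described above; the function names are likewise assumptions.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer):
    """Translate a degenerate primer into an equivalent regular expression."""
    return "".join(
        b if len(IUPAC[b]) == 1 else f"[{IUPAC[b]}]" for b in primer.upper()
    )

def degeneracy(primer):
    """Number of distinct sequences covered by the degenerate primer."""
    n = 1
    for b in primer.upper():
        n *= len(IUPAC[b])
    return n

primer = "GGNGTNGGNAARAC"  # hypothetical primer, for illustration only
pattern = primer_to_regex(primer)
match = re.search(pattern, "TTGGAGTCGGTAAGACTT")  # toy template strand
print(pattern, degeneracy(primer), match.start() if match else None)
```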

  1. Multiple POU-binding motifs, recognized by tissue-specific nuclear factors, are important for Dll1 gene expression in neural stem cells

    International Nuclear Information System (INIS)

    Nakayama, Kohzo; Nagase, Kazuko; Tokutake, Yuriko; Koh, Chang-Sung; Hiratochi, Masahiro; Ohkawara, Takeshi; Nakayama, Noriko

    2004-01-01

    We cloned the 5'-flanking region of the mouse homolog of the Delta gene (Dll1) and demonstrated that the sequence between nucleotide position -514 and -484 in the 5'-flanking region of Dll1 played a critical role in the regulation of its tissue-specific expression in neural stem cells (NSCs). Further, we showed that multiple POU-binding motifs, located within this short sequence of 30 bp, were essential for transcriptional activation of Dll1 and also that multiple tissue-specific nuclear factors recognized these POU-binding motifs in various combinations through differentiation of NSCs. Thus, POU-binding factors may play an important role in Dll1 expression in developing NSCs.

  2. Protein Chaperones Q8ZP25_SALTY from Salmonella Typhimurium and HYAE_ECOLI from Escherichia coli Exhibit Thioredoxin-like Structures Despite Lack of Canonical Thioredoxin Active Site Sequence Motif

    Energy Technology Data Exchange (ETDEWEB)

    Parish, D.; Benach, J; Liu, G; Singarapu, K; Xiao, R; Acton, T; Hunt, J; Montelione, G; Szyperski, T; et. al.

    2008-01-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  3. Protein chaperones Q8ZP25_SALTY from Salmonella typhimurium and HYAE_ECOLI from Escherichia coli exhibit thioredoxin-like structures despite lack of canonical thioredoxin active site sequence motif.

    Science.gov (United States)

    Parish, David; Benach, Jordi; Liu, Goahua; Singarapu, Kiran Kumar; Xiao, Rong; Acton, Thomas; Su, Min; Bansal, Sonal; Prestegard, James H; Hunt, John; Montelione, Gaetano T; Szyperski, Thomas

    2008-12-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.
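
    Checking a protein sequence for the canonical Cys-X-X-Cys motif mentioned above is a simple pattern search. The sketch below is only an illustration: the two peptide strings are made-up toy sequences (not Q8ZP25_SALTY or HYAE_ECOLI), and the function name is an assumption.

```python
import re

# Lookahead so that overlapping occurrences (e.g. CAACAAC) are all reported.
CXXC = re.compile(r"(?=(C..C))")

def find_cxxc(protein):
    """Return (1-based position, matched residues) for every C-X-X-C occurrence."""
    return [(m.start() + 1, m.group(1)) for m in CXXC.finditer(protein.upper())]

with_motif = "MSDKIWCGPCKMIA"     # toy peptide containing a CGPC stretch
without_motif = "MSDKIWSGPAKMIA"  # toy peptide lacking both cysteines

print(find_cxxc(with_motif))
print(find_cxxc(without_motif))
```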

  4. Nucleotide sequences of the cDNAs encoding the V-regions of H- and L-chains of a human monoclonal antibody with broad reactivity to malignant tumor cells

    Energy Technology Data Exchange (ETDEWEB)

    Kishimoto, Toshimitsu; Okajima, Hideki; Okumoto, Takeki [Yoshitomi Pharmaceutical Industries, Ltd., Saitama (Japan)]; Taniguchi, Masaru [Chiba Univ. (Japan)]

    1989-06-12

    The human monoclonal antibody secreted from 4G12 hybridoma cells has broad reactivity to malignant tumor cells, especially for lung squamous cell carcinomas, and recognizes a new tumor-associated and differentiation antigen. The antigen detected by 4G12 is a glycoprotein with MW 195,000 and MW 65,000 under nonreducing and reducing conditions, respectively. Screening of a 4G12 λgt10 cDNA library with constant region probes for human immunoglobulin yielded full length clones for H- and L-chains. Nucleotide sequences revealed that subtypes of the variable regions were VHIII and λ1, respectively.

  5. Motif decomposition of the phosphotyrosine proteome reveals a new N-terminal binding motif for SHIP2

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Hanke, S.; Hinsby, A. M.

    2008-01-01

    set of 481 unique phosphotyrosine (Tyr(P)) peptides by sequence similarity to known ligands of the Src homology 2 (SH2) and the phosphotyrosine binding (PTB) domains. From 20 clusters we extracted 16 known and four new interaction motifs. Using quantitative mass spectrometry we pulled down Tyr......(P)-specific binding partners for peptides corresponding to the extracted motifs. We confirmed numerous previously known interaction motifs and found 15 new interactions mediated by phosphosites not previously known to bind SH2 or PTB. Remarkably, a novel hydrophobic N-terminal motif ((L/V/I)(L/V/I)pY) was identified...

  6. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    . Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif....... A special viewing feature, MHC fight, allows for display of the specificity of two different MHC molecules side by side. We show how the web server can be used to discover and display surprising similarities as well as differences between MHC molecules within and between different species. The MHC motif...

  7. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    Directory of Open Access Journals (Sweden)

    Bolton Michael J

    2011-11-01

    Background: The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results: Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infection of erythrocytes and DBP binding to the Duffy Antigen Receptor for Chemokines (DARC). A peptide including the HBM of PvDBP had similar affinity for heparin as RANTES and V3 loop peptides, and could be specifically inhibited from heparin binding by the same polyanions that inhibit DBP binding to DARC. However, some V3 peptides can competitively inhibit RANTES binding to heparin, but not the PvDBP HBM peptide. Three other members of the DBP family have an HBM sequence that is necessary for erythrocyte binding; however, only the protein which binds to DARC, the P. knowlesi alpha protein, is inhibited by heparin from binding to erythrocytes. Heparitinase digestion does not affect the binding of DBP to erythrocytes. Conclusion: The HBMs of DBPs that bind to DARC have similar heparin binding affinities as some V3 loop peptides and chemokines, are responsible for specific sulfated polysaccharide inhibition of parasite binding and invasion of red blood cells, and are more likely to bind to negative charges on the receptor than cell surface glycosaminoglycans.

  8. Methods and statistics for combining motif match scores.

    Science.gov (United States)

    Bailey, T L; Gribskov, M

    1998-01-01

    Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
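
    The third method, the product of score p-values, can be sketched as follows. The sketch assumes the individual p-values are independent and uniformly distributed under the null hypothesis, in which case the probability that a product of n such values is at most p equals p times the sum over i from 0 to n-1 of (-ln p)^i / i!. The function name is an assumption, and this is not the MAST source code.

```python
import math

def product_pvalue(pvalues):
    """Significance of the product of n independent, uniform p-values.

    Uses P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!, which follows
    because -ln(product) is Gamma(n, 1) distributed under the null.
    """
    n = len(pvalues)
    prod = math.prod(pvalues)
    if prod <= 0.0:
        return 0.0
    x = -math.log(prod)
    return prod * sum(x ** i / math.factorial(i) for i in range(n))

# Hypothetical per-motif match p-values for one sequence.
print(product_pvalue([0.01, 0.2, 0.05]))
```

    With a single motif the combined value reduces to the original p-value, which is a quick sanity check on the formula.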

  9. Nucleotide sequence of a cDNA coding for the amino-terminal region of human prepro-α1(III) collagen

    Energy Technology Data Exchange (ETDEWEB)

    Toman, P D; Ricca, G A [Rorer Biotechnology, Inc., Springfield, VA (USA)]; de Crombrugghe, B [National Institutes of Health, Bethesda, MD (USA)]

    1988-07-25

    Type III collagen is synthesized in a variety of tissues as a precursor macromolecule containing a leader sequence, an N-propeptide, an N-telopeptide, the triple helical region, a C-telopeptide, and a C-propeptide. To further characterize the human type III collagen precursor, a human placental cDNA library was constructed in λgt11 using an oligonucleotide derived from a partial cDNA sequence corresponding to the carboxy-terminal part of the α1(III) collagen. A cDNA was identified which contains the leader sequence, the N-propeptide and N-telopeptide regions. The DNA sequence of these regions is presented here. The triple helical, C-telopeptide and C-propeptide amino acid sequence for human type III collagen has been determined previously. A comparison of the human amino acid sequence with mouse, chicken, and calf sequences shows 81%, 81%, and 92% similarity, respectively. At the DNA level, the sequence similarity between human and mouse or chicken type III collagen sequences in this area is 82% and 77%, respectively.

  10. [Personal motif in art].

    Science.gov (United States)

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and, if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif, which the viewer can identify through symbols, at times easily and at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication with the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or in self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  11. Direct AUC optimization of regulatory motifs.

    Science.gov (United States)

    Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

    2017-07-15

    The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the many computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the large and growing body of high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
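
    The AUC criterion itself is easy to state in code: it is the fraction of (bound, unbound) sequence pairs in which the bound sequence receives the higher motif score, with ties counted as one half. The sketch below computes only this criterion; it does not implement the coordinate-wise optimization of CDAUC, and the function name and the scores are assumptions.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the normalized Mann-Whitney statistic."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical motif scores for bound (positive) and unbound (negative) sequences.
print(auc([0.9, 0.4], [0.5, 0.1]))
```

    Because the value changes only when a positive score crosses a negative one, the criterion is piece-wise constant in any single motif parameter, which is the property the abstract refers to.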

  12. Partial nucleotide sequences, and routine typing by polymerase chain reaction-restriction fragment length polymorphism, of the brown trout (Salmo trutta) lactate dehydrogenase, LDH-C1*90 and *100 alleles.

    Science.gov (United States)

    McMeel, O M; Hoey, E M; Ferguson, A

    2001-01-01

    The cDNA nucleotide sequences of the lactate dehydrogenase alleles LDH-C1*90 and *100 of brown trout (Salmo trutta) were found to differ at position 308 where an A is present in the *100 allele but a G is present in the *90 allele. This base substitution results in an amino acid change from aspartic acid at position 82 in the LDH-C1 100 allozyme to a glycine in the 90 allozyme. Since aspartic acid has a net negative charge whilst glycine is uncharged, this is consistent with the electrophoretic observation that the LDH-C1 100 allozyme has a more anodal mobility relative to the LDH-C1 90 allozyme. Based on alignment of the cDNA sequence with the mouse genomic sequence, a local primer set was designed, incorporating the variable position, and was found to give very good amplification with brown trout genomic DNA. Sequencing of this fragment confirmed the difference in both homozygous and heterozygous individuals. Digestion of the polymerase chain reaction products with BslI, a restriction enzyme specific for the site difference, gave one, two and three fragments for the two homozygotes and the heterozygote, respectively, following electrophoretic separation. This provides a DNA-based means of routine screening of the highly informative LDH-C1* polymorphism in brown trout population genetic studies. Primer sets presented could be used to sequence cDNA of other LDH* genes of brown trout and other species.

  13. A 19-nucleotide insertion in the leader sequence of avian leukosis virus subgroup J contributes to its replication in vitro but is not related to its pathogenicity in vivo.

    Directory of Open Access Journals (Sweden)

    Xiaolin Ji

    Subgroup J avian leukosis virus (ALV-J) was first isolated from meat-type chickens that had developed myeloid leukosis, and since 2008 ALV-J infections in chickens have become widespread in China. A comparison of the sequence of ALV-J epidemic isolates with HPRS-103, the ALV-J prototype virus, revealed several distinct features, one of which is a 19-nucleotide (nt) insertion in the leader sequence. To determine the role of the 19-nt insertion in ALV-J pathogenicity, a pair of viruses was constructed and rescued. The first virus was an ALV-J Chinese isolate (designated rSD1009) containing the 19-nt insertion in its leader sequence. The second virus was a clone in which the 19-nt sequence had been deleted from the leader sequence (designated rSD1009Δ19). Compared with rSD1009Δ19, rSD1009 displayed a moderate growth advantage in vitro. However, no differences were demonstrated in either viral replication or oncogenicity between the two rescued viruses in chickens. These results indicated that the 19-nt insertion contributed to ALV-J replication in vitro but was not related to its pathogenicity in vivo.

  14. Evolutionary history of Phakopsora pachyrhizi (the Asian soybean rust) in Brazil based on nucleotide sequences of the internal transcribed spacer region of the nuclear ribosomal DNA

    Directory of Open Access Journals (Sweden)

    Maíra C. M. Freire

    2008-01-01

    Phakopsora pachyrhizi has dispersed globally and brought severe economic losses to soybean growers. The fungus has been established in Brazil since 2002 and is found nationwide. To gather information on the temporal and spatial patterns of genetic variation in P. pachyrhizi, we sequenced the nuclear internal transcribed spacer regions (ITS1 and ITS2). Total genomic DNA was extracted using either lyophilized urediniospores or lesions removed from infected leaves sampled from 26 soybean fields in Brazil and one field in South Africa. Cloning prior to sequencing was necessary because direct sequencing of PCR amplicons gave partially unreadable electrophoretograms with peak displacements suggestive of multiple sequences with length polymorphism. Sequences were determined from four clones per field. ITS sequences from African or Asian isolates available from GenBank were included in the analyses. Independent sequence alignments of the ITS1 and ITS2 datasets identified 27 and 19 ribotypes, respectively. Molecular phylogeographic analyses revealed that ribotypes of widespread distribution in Brazil displayed characteristics of ancestrality and were shared with Africa and Asia, while ribotypes of rare occurrence in Brazil were indigenous. The results suggest P. pachyrhizi found in Brazil as originating from multiple, independent long-distance dispersal events.

  15. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-01

    LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  16. DMINDA: an integrated web server for DNA motif identification and analyses.

    Science.gov (United States)

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-07-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
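
    Function (ii), scanning for instances of a query motif in genomic sequence, is commonly done with a position weight matrix. The sketch below shows that general technique under stated assumptions; it is not DMINDA's implementation. The binding sites, the uniform background of 0.25, the pseudocount, the threshold, and the function names are all hypothetical, and the input is assumed to contain only uppercase A, C, G and T.

```python
import math

def log_odds_pwm(sites, pseudo=0.5, background=0.25):
    """Build a log2-odds position weight matrix from aligned sites."""
    total = len(sites) + 4 * pseudo
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudo) / total) / background)
            for b in "ACGT"
        })
    return pwm

def scan(seq, pwm, threshold):
    """Return (0-based position, matched word) for windows scoring >= threshold."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(w))
        if score >= threshold:
            hits.append((i, seq[i:i + w]))
    return hits

# Toy aligned sites and toy sequence, for illustration only.
sites = ["TTGACA", "TTGACT", "TTGATA", "TTTACA"]
pwm = log_odds_pwm(sites)
hits = scan("GGCTTGACAGGTTTACAC", pwm, threshold=5.0)
print(hits)
```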

  17. Peptide and nucleotide sequences of rat CD4 (W3/25) antigen: evidence for derivation from a structure with four immunoglobulin-related domains

    International Nuclear Information System (INIS)

    Clark, S.J.; Jefferies, W.A.; Barclay, A.N.; Gagnon, J.; Williams, A.F.

    1987-01-01

    The rat W3/25 antigen was the first marker antigen of helper T lymphocytes to be identified. Subsequently, the human OKT4 antigen (now called CD4) was described, and cell distribution and functional data suggested that W3/25 and OKT4 antigens were homologous. This is now confirmed by the matching of peptide sequences from W3/25 antigen with sequence predicted from rat cDNA clones detected by cross-hybridization with a cDNA probe for human CD4. Analysis of the two sequences suggests an evolutionary origin from a structure with four immunoglobulin-related domains, although only domain 1 at the NH2 terminus meets the standard criteria for an immunoglobulin-related sequence. CD4 domains 2 and 4 contain disulfide bonds but seem like truncated immunoglobulin domains, whereas domain 3 may have a pattern of β-strands like an immunoglobulin variable domain, but without the disulfide bond.

  18. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

    Science.gov (United States)

    Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

    2015-06-01

    Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. DNA motif alignment by evolving a population of Markov chains.

    Science.gov (United States)

    Bi, Chengpeng

    2009-01-30

    Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. Markov chain Monte Carlo (MCMC) methods such as Gibbs motif samplers have been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima, as does expectation maximization (EM). Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. A new algorithm design enabling such information exchange would therefore be worthwhile. This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, dubbed the IMC motif algorithm, is also devised to compare with its PMC counterpart. Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.

  20. NUCLEOTIDES IN INFANT FEEDING

    Directory of Open Access Journals (Sweden)

    L.G. Mamonova

    2007-01-01

    The article reviews the use in infant feeding of nucleotides, metabolites that play a key role in many biological processes. The author provides data on the nucleotide content of human milk at different stages of lactation and analyzes foreign experience in feeding newborns with nucleotide-containing milk formulas. The article also compares the nucleotide content of the adapted formulas available on the domestic market. Key words: children, feeding, nucleotides.

  1. Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections

    Directory of Open Access Journals (Sweden)

    Saliha Hammoumi

    2016-09-01

    Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus are relatively abundant in the literature, still little is known about its genomic diversity and about the molecular mechanisms that lead to such a high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×10^7. The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3.

  2. Nucleotide sequence of the hexA gene for DNA mismatch repair in Streptococcus pneumoniae and homology of hexA to mutS of Escherichia coli and Salmonella typhimurium

    International Nuclear Information System (INIS)

    Priebe, S.D.; Hadi, S.M.; Greenberg, B.; Lacks, S.A.

    1988-01-01

    The Hex system of heteroduplex DNA base mismatch repair operates in Streptococcus pneumoniae after transformation and replication to correct donor and nascent DNA strands, respectively. A functionally similar system, called Mut, operates in Escherichia coli and Salmonella typhimurium. The nucleotide sequence of a 3.8-kilobase segment from the S. pneumoniae chromosome that includes the 2.7-kilobase hexA gene was determined. Chromosomal DNA used as donor to measure Hex phenotype was irradiated with UV light. An open reading frame that could encode a 17-kilodalton polypeptide (OrfC) was located just upstream of the gene encoding a polypeptide of 95 kilodaltons corresponding to HexA. Shine-Dalgarno sequences and putative promoters were identified upstream of each protein start site. Insertion mutations showed that only HexA functioned in mismatch repair and that the promoter for hexA transcription was located within the OrfC-coding region. The HexA polypeptide contains a consensus sequence for ATP- or GTP-binding sites in proteins. Comparison of the entire HexA protein sequence to that of MutS of S. typhimurium showed the proteins to be homologous, inasmuch as 36% of their amino acid residues were identical. This homology indicates that the Hex and Mut systems of mismatch repair evolved from an ancestor common to the gram-positive streptococci and the gram-negative enterobacteria. It is the first direct evidence linking the two systems.

  3. Complete nucleotide sequence and organization of the mitogenome of the silk moth Caligula boisduvalii (Lepidoptera: Saturniidae) and comparison with other lepidopteran insects.

    Science.gov (United States)

    Hong, Mee Yeon; Lee, Eun Mee; Jo, Yong Hun; Park, Hae Chul; Kim, Seong Ryul; Hwang, Jae Sam; Jin, Byung Rae; Kang, Pil Don; Kim, Ki-Gyoung; Han, Yeon Soo; Kim, Iksoo

    2008-04-30

    The 15,360-bp long complete mitogenome of Caligula boisduvalii possesses a gene arrangement and content identical to other completely sequenced lepidopteran mitogenomes, but different from the common arrangement found in most insect orders, as the result of the movement of tRNA(Met) to a position 5'-upstream of tRNA(Ile). The 330-bp A+T-rich region is apparently capable of forming a stem-and-loop structure, which harbors the conserved flanking sequences at both ends. Dissimilar to what has been seen in other sequenced lepidopteran insects, the initiation codon for C. boisduvalii COI appears to be TTG, which is a rare, but apparently possible initiation codon. The ATP8, ATP6, ND4L, and ND6 genes, which neighbor another PCG at their 3' end, all harbored potential sequences for the formation of a hairpin structure. This is suggestive of the importance of such structures for the precise cleavage of the mRNA of mature PCGs. Phylogenetic analyses of available sequenced species of Bombycoidea, Pyraloidea, and Tortricoidea supported the morphology-based current hypothesis that Bombycoidea and Pyraloidea are monophyletic (Obtectomera). As previously suggested, Bombycidae (Bombyx mori and B. mandarina) and Saturniidae (Antheraea pernyi and C. boisduvalii) formed a reciprocal monophyletic group.

  4. Motif enrichment tool.

    Science.gov (United States)

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
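
    The abstract does not say which statistic MET uses to call a motif "overrepresented". As a minimal sketch, assuming a one-sided hypergeometric test (a common generic choice, not necessarily MET's), the enrichment of a motif in a gene set can be scored as follows. The function name and its parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(genome_size, genome_hits, set_size, set_hits):
    """One-sided hypergeometric p-value for motif enrichment.

    genome_size: number of genes in the background
    genome_hits: background genes whose regulatory region contains the motif
    set_size:    number of genes in the user's gene set
    set_hits:    gene-set members whose regulatory region contains the motif

    Returns P(X >= set_hits) when set_size genes are drawn without
    replacement from the background.
    """
    upper = min(set_size, genome_hits)
    total = comb(genome_size, set_size)
    tail = sum(
        comb(genome_hits, k) * comb(genome_size - genome_hits, set_size - k)
        for k in range(set_hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers (illustrative only): 200 of 2,000 background genes carry
    # the motif, and 15 of the 50 genes of interest do.
    print(hypergeom_enrichment_pvalue(2000, 200, 50, 15))
```

    Exact integer binomials keep the sketch dependency-free; for genome-scale counts a library routine working in log space would be faster.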

  5. RNA motif search with data-driven element ordering.

    Science.gov (United States)

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .

  6. The soybean-Phytophthora resistance locus Rps1-k encompasses coiled coil-nucleotide binding-leucine rich repeat-like genes and repetitive sequences

    Directory of Open Access Journals (Sweden)

    Bhattacharyya Madan K

    2008-03-01

    Background: A series of Rps (resistance to Phytophthora sojae) genes have been protecting soybean from the root and stem rot disease caused by the Oomycete pathogen, Phytophthora sojae. Five Rps genes were mapped to the Rps1 locus located near the 28 cM map position on molecular linkage group N of the composite genetic soybean map. Among these five genes, Rps1-k was introgressed from the cultivar, Kingwa. Rps1-k has been providing stable and broad-spectrum Phytophthora resistance in the major soybean-producing regions of the United States. Rps1-k has been mapped and isolated. More than one functional Rps1-k gene was identified from the Rps1-k locus. The clustering feature at the Rps1-k locus might have facilitated the expansion of Rps1-k gene numbers and the generation of new recognition specificities. The Rps1-k region was sequenced to understand the possible evolutionary steps that shaped the generation of Phytophthora resistance genes in soybean. Results: Here the analyses of sequences of three overlapping BAC clones containing the 184,111 bp Rps1-k region are reported. A shotgun sequencing strategy was applied in sequencing the BAC contig. Sequence analysis predicted a few full-length genes including two Rps1-k genes, Rps1-k-1 and Rps1-k-2. Previously reported Rps1-k-3 from this genomic region 1 was evolved through intramolecular recombination between Rps1-k-1 and Rps1-k-2 in Escherichia coli. The majority of the predicted genes are truncated and therefore most likely they are nonfunctional. A member of a highly abundant retroelement, SIRE1, was identified from the Rps1-k region. The Rps1-k region is primarily composed of repetitive sequences. Sixteen simple repeat and 63 tandem repeat sequences were identified from the locus. Conclusion: These data indicate that the Rps1 locus is located in a gene-poor region. The abundance of repetitive sequences in the Rps1-k region suggested that the location of this locus is in or near a

  7. Using Markov chains of nucleotide sequences as a possible precursor to predict functional roles of human genome: a case study on inactive chromatin regions.

    Science.gov (United States)

    Lee, K-E; Lee, E-J; Park, H-S

    2016-08-30

    Recent advances in computational epigenetics have provided new opportunities to evaluate n-gram probabilistic language models. In this paper, we describe a systematic genome-wide approach for predicting functional roles in inactive chromatin regions by using a sequence-based Markovian chromatin map of the human genome. We demonstrate that Markov chains of sequences can be used as a precursor to predict functional roles in heterochromatin regions and provide an example comparing two publicly available chromatin annotations of large-scale epigenomics projects: ENCODE project consortium and Roadmap Epigenomics consortium.
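
    The abstract gives no implementation details, so the following is only a generic sketch of what "Markov chains of nucleotide sequences" means in practice: estimating transition probabilities between k-mer contexts and the next base, then scoring a sequence under the fitted chain. The function names, the pseudocount smoothing and the uniform fallback for unseen contexts are assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict
from math import log


def fit_markov_chain(sequence, order=1, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from one sequence."""
    counts = defaultdict(lambda: defaultdict(float))
    valid = set(alphabet)
    for i in range(len(sequence) - order):
        window = sequence[i:i + order + 1]
        if not set(window) <= valid:
            continue  # skip windows containing ambiguous bases such as N
        counts[window[:order]][window[order]] += 1.0
    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values()) + pseudocount * len(alphabet)
        model[context] = {
            base: (next_counts.get(base, 0.0) + pseudocount) / total
            for base in alphabet
        }
    return model


def log_likelihood(sequence, model, order=1, alphabet="ACGT"):
    """Log-probability of the transitions in `sequence` under `model`.

    Contexts never seen during fitting fall back to a uniform distribution.
    """
    uniform = 1.0 / len(alphabet)
    total = 0.0
    for i in range(len(sequence) - order):
        context = sequence[i:i + order]
        nxt = sequence[i + order]
        total += log(model.get(context, {}).get(nxt, uniform))
    return total


if __name__ == "__main__":
    model = fit_markov_chain("ACACACAC")
    print(model["A"])
    print(log_likelihood("ACAC", model))
```

    Comparing the log-likelihood of a region under chains fitted to different annotation classes is one simple way such a model could be turned into a classifier.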

  8. The complete nucleotide sequence and environmental distribution of the cryptic, conjugative, broad-host-range plasmid pIPO2 isolated from bacteria of the wheat rhizosphere

    NARCIS (Netherlands)

    Tauch, A.; Schneiker, S.; Selbitschka, W.; Pühler, A.; Overbeek, van L.S.; Smalla, K.; Thomas, C.M.; Bailey, M.J.; Forney, L.J.; Weightman, A.; Ceglowski, P.; Pembroke, T.; Tietze, E.; Schröder, G.; Lanka, E.; Elsas, van J.D.

    2002-01-01

    The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily

  9. Length and nucleotide sequence polymorphism at the trnL and trnF non-coding regions of chloroplast genomes among Saccharum and Erianthus species

    Science.gov (United States)

    The aneupolyploidy genome of sugarcane (Saccharum hybrids spp.) and lack of a classical genetic linkage map make genetics research most difficult for sugarcane. Whole genome sequencing and genetic characterization of sugarcane and related taxa are far behind other crops. In this study, universal PCR...

  10. NUCLEOTIDE SEQUENCING AND TRANSCRIPTIONAL MAPPING OF THE GENES ENCODING BIPHENYL DIOXYGENASE, A MULTICOMPONENT POLYCHLORINATED-BIPHENYL-DEGRADING ENZYME IN PSEUDOMONAS STRAIN LB400

    Science.gov (United States)

    The DNA region encoding biphenyl dioxygenase, the first enzyme in the biphenyl-polychlorinated biphenyl degradation pathway of Pseudomonas species strain LB400, was sequenced. Six open reading frames were identified, four of which are homologous to the components of toluene dioxy...

  11. CHARACTERIZATION AND NUCLEOTIDE SEQUENCE DETERMINATION OF A REPEAT ELEMENT ISOLATED FROM A 2,4,5-T DEGRADING STRAIN OF PSEUDOMONAS CEPACIA

    Science.gov (United States)

    Pseudomonas cepacia strain AC1100, capable of growth on 2,4,5-trichlorophenoxyacetic acid (2,4,5-T), was mutated to the 2,4,5-T− strain PT88 by a ColE1 :: Tn5 chromosomal insertion. Using cloned DNA from the region flanking the insertion, a 1477-bp sequence (designated RS1100) wa...

  12. Population genetic structure in farm and feral American mink (Neovison vison) inferred from RAD sequencing-generated single nucleotide polymorphisms

    DEFF Research Database (Denmark)

    Thirstrup, Janne Pia; Ruiz-Gonzalez, Aritz; Pujolar, José Martin

    2015-01-01

    Feral American mink populations (Neovison vison), derived from mink farms, are widespread in Europe. In this study we investigated genetic diversity and genetic differentiation between feral and farm mink using a panel of genetic markers (194 SNP) generated from RAD sequencing data. Sampling incl...

  13. Nucleotide sequence and phylogeny of the tet (L) tetracycline resistance determinant encoded by the plasmid pSTE1 from Staphylococcus hyicus

    DEFF Research Database (Denmark)

    Schwarz, S.; Cardoso, M.; Wegener, Henrik Caspar

    1992-01-01

    O from Streptococcus mutans were performed. An alignment of Tet amino acid sequence revealed the presence of 30 conserved amino acids among these Tet variants. On the basis of the alignment, a phylogenetic tree was constructed. It demonstrated large evolutionary distances between the Tet M and Tet O...

  14. The KYxxL motif in Rad17 protein is essential for the interaction with the 9–1–1 complex

    Energy Technology Data Exchange (ETDEWEB)

    Fukumoto, Yasunori, E-mail: fukumoto@faculty.chiba-u.jp [Laboratory of Molecular Cell Biology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 260-8675 (Japan); Ikeuchi, Masayoshi; Nakayama, Yuji [Department of Biochemistry & Molecular Biology, Kyoto Pharmaceutical University, Kyoto 607-8414 (Japan); Yamaguchi, Naoto, E-mail: nyama@faculty.chiba-u.jp [Laboratory of Molecular Cell Biology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 260-8675 (Japan)

    2016-09-02

    The ATR-dependent DNA damage checkpoint is the major DNA damage checkpoint against UV irradiation and DNA replication stress. The Rad17–RFC and Rad9–Rad1–Hus1 (9–1–1) complexes interact with each other to contribute to ATR signaling; however, the precise regulatory mechanism of the interaction has not been established. Here, we identified a conserved sequence motif, KYxxL, in the AAA+ domain of Rad17 protein, and demonstrated that this motif is essential for the interaction with the 9–1–1 complex. We also show that UV-induced Rad17 phosphorylation is increased in the Rad17 KYxxL mutants. These data indicate that the interaction with the 9–1–1 complex is not required for Rad17 protein to be an efficient substrate for the UV-induced phosphorylation. Our data also raise the possibility that the 9–1–1 complex plays a negative regulatory role in the Rad17 phosphorylation. We also show that the nucleotide-binding activity of Rad17 is required for its nuclear localization. - Highlights: • We have identified a conserved KYxxL motif in Rad17 protein. • The KYxxL motif is crucial for the interaction with the 9–1–1 complex. • The KYxxL motif is dispensable or inhibitory for UV-induced Rad17 phosphorylation. • Nucleotide binding of Rad17 is required for its nuclear localization.

  15. The complete nucleotide sequence and environmental distribution of the cryptic, conjugative, broad-host-range plasmid pIPO2 isolated from bacteria of the wheat rhizosphere

    OpenAIRE

    Tauch, A.; Schneiker, S.; Selbitschka, W.; Pühler, A.; Overbeek, van L.S.; Smalla, K.; Thomas, C.M.; Bailey, M.J.; Forney, L.J.; Weightman, A.; Ceglowski, P.; Pembroke, T.; Tietze, E.; Schröder, G.; Lanka, E.

    2002-01-01

    The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approac...

  16. Complete nucleotide sequence of Sida golden mosaic Florida virus and phylogenetic relationships with other begomoviruses infecting malvaceous weeds in the Caribbean.

    Science.gov (United States)

    Fiallo-Olivé, Elvira; Martínez-Zubiaur, Yamila; Moriones, Enrique; Navas-Castillo, Jesús

    2010-09-01

    The complete genome sequence of two isolates of the bipartite begomovirus (genus Begomovirus, family Geminiviridae) Sida golden mosaic Florida virus (SiGMFV) is presented. We propose that both isolates, found infecting Malvastrum coromandelianum (family Malvaceae) in Cuba, belong to a new strain of SiGMFV. Phylogenetic analysis showed that SiGMFV DNA-A is located in a monophyletic cluster that includes begomoviruses infecting malvaceous weeds from the Caribbean.

  17. Analysis of complete nucleotide sequences of Angolan hepatitis B virus isolates reveals the existence of a separate lineage within genotype E.

    Directory of Open Access Journals (Sweden)

    Barbara V Lago

    Hepatitis B virus genotype E (HBV/E) is highly prevalent in Western Africa. In this work, 30 HBV/E isolates from HBsAg positive Angolans (staff and visitors of a private hospital in Luanda) were genetically characterized: 16 of them were completely sequenced and the pre-S/S sequences of the remaining 14 were determined. A high proportion (12/30, 40%) of subjects tested positive for both HBsAg and anti-HBs markers. Deduced amino acid sequences revealed the existence of specific substitutions and deletions in the B- and T-cell epitopes of the surface antigen (pre-S1- and pre-S2 regions) of the virus isolates derived from 8/12 individuals with concurrent HBsAg/anti-HBs. Phylogenetic analysis performed with 231 HBV/E full-length sequences, including 16 from this study, showed that all isolates from Angola, Namibia and the Democratic Republic of Congo (n = 28) clustered in a separate lineage, divergent from the HBV/E isolates from nine other African countries, namely Cameroon, Central African Republic, Côte d'Ivoire, Ghana, Guinea, Madagascar, Niger, Nigeria and Sudan, with a Bayesian posterior probability of 1. Five specific mutations, namely small S protein T57I, polymerase Q177H, G245W and M612L, and X protein V30L, were observed in 79-96% of the isolates of the separate lineage, compared to a frequency of 0-12% among the other HBV/E African isolates.

  18. Crystal Structures of the Scaffolding Protein LGN Reveal the General Mechanism by Which GoLoco Binding Motifs Inhibit the Release of GDP from Gαi

    Science.gov (United States)

    Jia, Min; Li, Jianchao; Zhu, Jinwei; Wen, Wenyu; Zhang, Mingjie; Wang, Wenning

    2012-01-01

    GoLoco (GL) motif-containing proteins regulate G protein signaling by binding to Gα subunit and acting as guanine nucleotide dissociation inhibitors. GLs of LGN are also known to bind the GDP form of Gαi/o during asymmetric cell division. Here, we show that the C-terminal GL domain of LGN binds four molecules of Gαi·GDP. The crystal structures of Gαi·GDP in complex with LGN GL3 and GL4, respectively, reveal distinct GL/Gαi interaction features when compared with the only high resolution structure known with GL/Gαi interaction between RGS14 and Gαi1. Only a few residues C-terminal to the conserved GL sequence are required for LGN GLs to bind to Gαi·GDP. A highly conserved “double Arg finger” sequence (RΨ(D/E)(D/E)QR) is responsible for LGN GL to bind to GDP bound to Gαi. Together with the sequence alignment, we suggest that the LGN GL/Gαi interaction represents a general binding mode between GL motifs and Gαi. We also show that LGN GLs are potent guanine nucleotide dissociation inhibitors. PMID:22952234

  19. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Background: The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results: Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of the motifs. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions: We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very
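
    To make the (l, d)-motif problem concrete, here is a naive exhaustive baseline, assuming the usual definition: report every string of length l that occurs in each input sequence with at most d mismatches. This is not the speedup technique of the paper, only an illustration of why exact search is exponential (it tries all 4^l candidates). The function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some l-mer of `sequence` is within Hamming distance d."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def planted_motif_search(sequences, l, d, alphabet="ACGT"):
    """Exhaustive exact PMS: enumerate all |alphabet|**l candidates."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    seqs = ["AAAC", "CAAA", "GAAAT"]
    print(planted_motif_search(seqs, l=3, d=0))
```

    The running time grows as 4^l times the total input length, which is why practical exact algorithms prune the candidate space instead of enumerating it.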

  20. Highly scalable Ab initio genomic motif identification

    KAUST Repository

    Marchand, Benoit; Bajic, Vladimir B.; Kaushik, Dinesh

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
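
    As a quick arithmetic check of the scaling figure, assuming the standard definition of relative parallel efficiency (the abstract does not define it), efficiency = speedup / (cores / base_cores). The helper name below is hypothetical.

```python
def relative_speedup(efficiency, base_cores, cores):
    """Speedup over the base run implied by a relative parallel efficiency.

    Assumes efficiency = speedup / (cores / base_cores).
    """
    return efficiency * cores / base_cores


if __name__ == "__main__":
    # 94% efficiency on 65,536 cores relative to 256 cores
    print(relative_speedup(0.94, 256, 65536))
```

    Under that definition, the 65,536-core run is roughly 240 times faster than the 256-core run, against an ideal factor of 256.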

  1. The identification of functional motifs in temporal gene expression analysis

    Directory of Open Access Journals (Sweden)

    Michael G. Surette

    2005-01-01

    The identification of transcription factor binding sites is essential to the understanding of the regulation of gene expression and the reconstruction of genetic regulatory networks. The in silico identification of cis-regulatory motifs is challenging due to sequence variability and lack of sufficient data to generate consensus motifs that are of quantitative or even qualitative predictive value. To determine functional motifs in gene expression, we propose a strategy to adopt false discovery rate (FDR) and estimate motif effects to evaluate combinatorial analysis of motif candidates and temporal gene expression data. The method decreases the number of predicted motifs, which can then be confirmed by genetic analysis. To assess the method we used simulated motif/expression data to evaluate parameters. We applied this approach to experimental data for a group of iron responsive genes in Salmonella typhimurium 14028S. The method identified known and potentially new ferric-uptake regulator (Fur) binding sites. In addition, we identified uncharacterized functional motif candidates that correlated with specific patterns of expression. SAS code for the simulation and analysis of gene expression data is available from the first author upon request.
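
    The abstract does not state which FDR procedure was used, and the authors' code is in SAS. As a hedged illustration of how FDR control prunes a list of candidate motifs, the sketch below applies the Benjamini-Hochberg step-up procedure (an assumption) to per-motif p-values. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    with p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # One made-up p-value per candidate motif
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

    Only candidates that survive the cutoff would then be carried forward to genetic confirmation.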

  2. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  3. Maximum likelihood and Bayesian analyses of a combined nucleotide sequence dataset for genetic characterization of a novel pestivirus, SVA/cont-08.

    Science.gov (United States)

    Liu, Lihong; Xia, Hongyan; Baule, Claudia; Belák, Sándor

    2009-01-01

    Bovine viral diarrhoea virus 1 (BVDV-1) and Bovine viral diarrhoea virus 2 (BVDV-2) are two recognised bovine pestivirus species of the genus Pestivirus. Recently, a pestivirus, termed SVA/cont-08, was detected in a batch of contaminated foetal calf serum originating from South America. Comparative sequence analysis showed that the SVA/cont-08 virus shares 15-28% higher sequence identity to pestivirus D32/00_'HoBi' than to members of BVDV-1 and BVDV-2. In order to reveal the phylogenetic relationship of SVA/cont-08 with other pestiviruses, a molecular dataset of 30 pestiviruses and 1,896 characters, comprising the 5'UTR, N(pro) and E2 gene regions, was analysed by two methods: maximum likelihood and Bayesian approach. An identical, well-supported tree topology was observed, where four pestiviruses (SVA/cont-08, D32/00_'HoBi', CH-KaHo/cont, and Th/04_KhonKaen) formed a monophyletic clade that is closely related to the BVDV-1 and BVDV-2 clades. The strategy applied in this study is useful for classifying novel pestiviruses in the future.

  4. Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    Science.gov (United States)

    Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

    2011-01-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences identifies nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875

  5. Validation of skeletal muscle cis-regulatory module predictions reveals nucleotide composition bias in functional enhancers.

    Directory of Open Access Journals (Sweden)

    Andrew T Kwon

    2011-12-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences identifies nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions.

  6. Nucleotide sequence of a cDNA coding for the barley seed protein CMa: an inhibitor of insect α-amylase

    DEFF Research Database (Denmark)

    Rasmussen, Søren Kjærsgård; Johansson, A.

    1992-01-01

    The primary structure of the insect alpha-amylase inhibitor CMa of barley seeds was deduced from a full-length cDNA clone pc43F6. Analysis of RNA from barley endosperm shows high levels 15 and 20 days after flowering. The cDNA predicts an amino acid sequence of 119 residues preceded by a signal...... peptide of 25 amino acids. Ala and Leu account for 55% of the signal peptide. CMa is 60-85% identical with alpha-amylase inhibitors of wheat, but shows less than 50% identity to trypsin inhibitors of barley and wheat. The 10 Cys residues are located in identical positions compared to the cereal inhibitor...

  7. CMD: A Database to Store the Bonding States of Cysteine Motifs with Secondary Structures

    Directory of Open Access Journals (Sweden)

    Hamed Bostan

    2012-01-01

    Computational approaches to the disulphide bonding state and its connectivity pattern prediction are based on various descriptors. One descriptor is the amino acid sequence motifs flanking the cysteine residue motifs. Despite the existence of disulphide bonding information in many databases and applications, there is no complete reference and motif query available at the moment. Cysteine motif database (CMD) is the first online resource that stores all cysteine residues, their flanking motifs with their secondary structure, and propensity values assignment derived from the laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data, annotated with secondary structure assignment, propensity value assignment, and frequency of occurrence and coefficiency of their bonding status. Removal of redundancies generated 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries are based on the protein ID, FASTA sequence, sequence motif, and secondary structure individually or in batch format using the provided APIs that allow remote users to query our database via third party software and/or high throughput screening/querying. The CMD offers extensive information about the bonded, free cysteine residues, and their motifs that allows in-depth characterization of the sequence motif composition.

  8. Insights into the motif preference of APOBEC3 enzymes.

    Directory of Open Access Journals (Sweden)

    Diako Ebrahimi

    We used a multivariate data analysis approach to identify motifs associated with HIV hypermutation by different APOBEC3 enzymes. The analysis showed that APOBEC3G targets G mainly within GG, TG, TGG, GGG, TGGG and also GGGT. The G nucleotides flanked by a C at the 3' end (in +1 and +2 positions) were indicated as disfavoured targets by APOBEC3G. The G nucleotides within GGGG were found to be targeted at a frequency much less than what is expected. We found that the infrequent G-to-A mutation within GGGG is not limited to the inaccessibility, to APOBEC3, of poly Gs in the central and 3' polypurine tracts (PPTs), which remain double stranded during the HIV reverse transcription. GGGG motifs outside the PPTs were also disfavoured. The motifs GGAG and GAGG were also found to be disfavoured targets for APOBEC3. The motif-dependent mutation of G within the HIV genome by members of the APOBEC3 family other than APOBEC3G was limited to GA→AA changes. The results did not show evidence of other types of context dependent G-to-A changes in the HIV genome.
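
    The study used a multivariate analysis that the abstract does not detail. A much simpler first step, shown here purely as an illustration, is to tally the reference dinucleotide (the mutated G plus its +1 base) at each G-to-A difference between an aligned reference and a hypermutated sequence. The function name and the gap-free equal-length alignment are assumptions.

```python
from collections import Counter


def g_to_a_contexts(reference, mutated):
    """Count G-to-A changes by reference dinucleotide (G plus the +1 base).

    Both sequences must be aligned, gap-free and of equal length. A G at
    the final position is ignored because it has no +1 neighbour.
    """
    if len(reference) != len(mutated):
        raise ValueError("sequences must have equal length")
    counts = Counter()
    for i in range(len(reference) - 1):
        if reference[i] == "G" and mutated[i] == "A":
            counts[reference[i:i + 2]] += 1
    return counts


if __name__ == "__main__":
    ref = "TGGACGCTGA"
    mut = "TAGACGCTAA"
    print(g_to_a_contexts(ref, mut))
```

    Raw counts like these would still need to be normalised by how often each context occurs in the reference before any preference could be claimed.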

  9. Sequence requirement of the ade6-4095 meiotic recombination hotspot in Schizosaccharomyces pombe.

    Science.gov (United States)

    Foulis, Steven J; Fowler, Kyle R; Steiner, Walter W

    2018-02-01

    Homologous recombination occurs at a greatly elevated frequency in meiosis compared to mitosis and is initiated by programmed double-strand DNA breaks (DSBs). DSBs do not occur at uniform frequency throughout the genome in most organisms, but occur preferentially at a limited number of sites referred to as hotspots. The locations of hotspots have been determined at nucleotide-level resolution in both the budding and fission yeasts, and while several patterns have emerged regarding preferred locations for DSB hotspots, it remains unclear why particular sites experience DSBs at much higher frequency than other sites with seemingly similar properties. Short sequence motifs, which are often sites for binding of transcription factors, are known to be responsible for a number of hotspots. In this study we identified the minimum sequence required for activity of one such motif identified in a screen of random sequences capable of producing recombination hotspots. The experimentally determined sequence, GGTCTRGACC, closely matches the previously inferred sequence. Full hotspot activity requires an effective sequence length of 9.5 bp, whereas moderate activity requires an effective sequence length of approximately 8.2 bp and shows significant association with DSB hotspots. In combination with our previous work, this result is consistent with a large number of different sequence motifs capable of producing recombination hotspots, and supports a model in which hotspots can be rapidly regenerated by mutation as they are lost through recombination.
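
    The motif GGTCTRGACC uses the IUPAC ambiguity code R (A or G). A minimal sketch of how such a degenerate motif can be located in a sequence is to translate it into a regular expression. The function name is hypothetical, and only the forward strand is scanned.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(sequence, motif):
    """Return 0-based start positions of all (overlapping) motif matches."""
    body = "".join(IUPAC[c] for c in motif.upper())
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    seq = "AAGGTCTAGACCTTGGTCTGGACC"
    print(find_motif(seq, "GGTCTRGACC"))
```

    To cover both strands, the same function can be run on the reverse complement of the sequence.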

  10. Codon based co-occurrence network motifs in human mitochondria

    Directory of Open Access Journals (Sweden)

    Pramod Shinde

    2017-10-01

    The nucleotide polymorphism in human mitochondrial genome (mtDNA) tolled by codon position bias plays an indispensable role in human population dispersion and expansion. Herein, we constructed genome-wide nucleotide co-occurrence networks using a massive dataset consisting of five different geographical regions and around 3000 samples for each region. We developed a powerful network model to describe complex mitochondrial evolutionary patterns between codon and non-codon positions. It was interesting to report a different evolution of Asian genomes than those of the rest which is divulged by network motifs. We found evidence that mtDNA undergoes substantial amounts of adaptive evolution, a finding which was supported by a number of previous studies. The dominance of higher order motifs indicated the importance of long-range nucleotide co-occurrence in genomic diversity. Most notably, codon motifs apparently underpinned the preferences among codon positions for co-evolution which is probably highly biased during the origin of the genetic code. Our analyses manifested that codon position co-evolution is very well conserved across human sub-populations and independently maintained within human sub-populations implying the selective role of evolutionary processes on codon position co-evolution. Ergo, this study provided a framework to investigate cooperative genomic interactions which are critical in underlying complex mitochondrial evolution.

  11. Isolation and characterization of human glycophorin A cDNAs using a synthetic oligonucleotide approach: nucleotide sequence, mRNA structure and regulation by 12-O-tetradecanoylphorbol 13-acetate (TPA)

    International Nuclear Information System (INIS)

    Siebert, P.D.; Fukuda, M.

    1986-01-01

    The authors have previously shown that treatment of human erythroleukemic K562 cells with the tumor-promoting phorbol ester, TPA, results in a diminished expression of glycophorin A at the level of protein biosynthesis and in vitro mRNA translation activity. To further examine the structure, relationships and expression of human glycophorins they have successfully isolated and sequenced several glycophorin A specific cDNA clones derived from K562 cells, by making extensive use of mixed and exact synthetic oligonucleotides as primers and radioactively labeled probes. The nucleotide sequence obtained from the largest glycophorin A cDNA suggests the presence of a hydrophobic leader-like peptide of at least 19 amino acids. Northern gel analysis using both whole cDNA-plasmid and synthetic oligonucleotide probes revealed the existence of multiple mRNAs, three of which they believe to be glycophorin A-specific, whereas a fourth and smaller mRNA appears to be glycophorin B-specific. Furthermore, the abundance of all four glycophorin mRNAs was found to be extensively reduced following treatment of K562 cells with TPA, suggesting coordinate regulation, possibly at the level of gene transcription.

  12. Nucleotide sequence of pOLA52: a conjugative IncX1 plasmid from Escherichia coli which enables biofilm formation and multidrug efflux

    DEFF Research Database (Denmark)

    Norman, Anders; Hansen, Lars H.; She, Qunxin

    2008-01-01

    . The plasmid was also classified as IncX1 with incompatibility testing. The conjugal transfer and plasmid maintenance regions of pOLA52 therefore seem to represent IncX1 orthologues of the well-characterized IncX2 plasmid R6K. Sequence homology searches in GenBank also suggested a considerably higher...... of type 3 fimbriae (mrkABCDF). The plasmid was found to be 51,602 bp long with 68 putative genes. About half of the plasmid constituted a conserved IncX1-type backbone with predicted regions for conjugation, replication and partitioning, as well as a toxin/antitoxin (TA) plasmid addiction system...... prevalence of IncX1 group plasmids than IncX2. The 21 kb 'genetic load' region of pOLA52 was shown to consist of a mosaic, among other things a fragmented Tn3 transposon encoding ampicillin resistance. Most notably the oqxAB and mrkABCDF cassettes were contained within two composite transposons (Tn6010...

  13. Phylogeny reconstruction and hybrid analysis of populus (Salicaceae) based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments.

    Science.gov (United States)

    Wang, Zhaoshan; Du, Shuhui; Dayanandan, Selvadurai; Wang, Dongsheng; Zeng, Yanfei; Zhang, Jianguo

    2014-01-01

    Populus (Salicaceae) is one of the most economically and ecologically important genera of forest trees. The complex reticulate evolution and lack of highly variable orthologous single-copy DNA markers have posed difficulties in resolving the phylogeny of this genus. Based on a large data set of nuclear and plastid DNA sequences, we reconstructed robust phylogeny of Populus using parsimony, maximum likelihood and Bayesian inference methods. The resulting phylogenetic trees showed better resolution at both inter- and intra-sectional level than previous studies. The results revealed that (1) the plastid-based phylogenetic tree resulted in two main clades, suggesting an early divergence of the maternal progenitors of Populus; (2) three advanced sections (Populus, Aigeiros and Tacamahaca) are of hybrid origin; (3) species of the section Tacamahaca could be divided into two major groups based on plastid and nuclear DNA data, suggesting a polyphyletic nature of the section; and (4) many species proved to be of hybrid origin based on the incongruence between plastid and nuclear DNA trees. Reticulate evolution may have played a significant role in the evolution history of Populus by facilitating rapid adaptive radiations into different environments.

  14. Phylogeny reconstruction and hybrid analysis of populus (Salicaceae) based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments.

    Directory of Open Access Journals (Sweden)

    Zhaoshan Wang

    Full Text Available Populus (Salicaceae) is one of the most economically and ecologically important genera of forest trees. The complex reticulate evolution and lack of highly variable orthologous single-copy DNA markers have posed difficulties in resolving the phylogeny of this genus. Based on a large data set of nuclear and plastid DNA sequences, we reconstructed robust phylogeny of Populus using parsimony, maximum likelihood and Bayesian inference methods. The resulting phylogenetic trees showed better resolution at both inter- and intra-sectional level than previous studies. The results revealed that (1) the plastid-based phylogenetic tree resulted in two main clades, suggesting an early divergence of the maternal progenitors of Populus; (2) three advanced sections (Populus, Aigeiros and Tacamahaca) are of hybrid origin; (3) species of the section Tacamahaca could be divided into two major groups based on plastid and nuclear DNA data, suggesting a polyphyletic nature of the section; and (4) many species proved to be of hybrid origin based on the incongruence between plastid and nuclear DNA trees. Reticulate evolution may have played a significant role in the evolution history of Populus by facilitating rapid adaptive radiations into different environments.

  15. Characterization of the Complete Nucleotide Sequences of IncA/C2 Plasmids Carrying In809-Like Integrons from Enterobacteriaceae Isolates of Wildlife Origin.

    Science.gov (United States)

    Papagiannitsis, Costas C; Kutilova, Iva; Medvecky, Matej; Hrabak, Jaroslav; Dolejska, Monika

    2017-09-01

    A total of 18 Enterobacteriaceae (17 from gulls and 1 from a clinical sample) collected from Australia, carrying IncA/C plasmids with the IMP-encoding In809-like integrons, were studied. Seven plasmids, being representatives of different origins, plasmid sizes, replicon combinations, and resistance genes, were completely sequenced. Plasmid pEc158, identified in a clinical Escherichia coli ST752 isolate, showed extensive similarity to type 2 IncA/C2 plasmids. pEc158 carried none of the blaCMY-2-like region or ARI-B and ARI-A regions, while it contained a hybrid transposon structure. The six remaining plasmids, which were of wildlife origin, were highly similar to each other and probably were fusion derivatives of type 1 and type 2 A/C2 plasmids. The latter plasmids contained an ARI-B region and hybrid transposon structures. In all plasmids, hybrid transposon structures containing In809-like integrons were inserted 3,434 bp downstream of the rhs2 start codon. In all cases, the one outermost 38-bp inverted repeat (IR) of the transposon was associated with the Tn1696 tnp module, while the other outermost 38-bp IR of the transposon was associated with either a Tn6317-like module or a Tn21 mer module. However, the internal structure of the transposon and the resistance genes were different in each plasmid. These findings indicated that, for the specific periods of time and settings, different IncA/C2 plasmid types carrying In809-like elements circulated among isolates of wildlife and clinical origins. Additionally, they provided the basis for speculations regarding the reshuffling of IncA/C2 plasmids with In809-like integrons and confirmed the rapid evolution of IncA/C2 plasmid lineages. Copyright © 2017 American Society for Microbiology.

  16. Comparison of nucleotide sequences of recent and previous lineages of peste-des-petits-ruminants viruses of sheep and goats in Nigeria

    Directory of Open Access Journals (Sweden)

    Samuel Mantip

    2016-08-01

    Full Text Available Peste-des-petits-ruminants virus (PPRV) is a highly contagious, fatal and economically important viral disease of small ruminants that is still endemic and militates against the production of sheep and goats in endemic areas of the world. The aim of this study was to describe the viral strains within the country. This was carried out by collecting tissue and swab samples from sheep and goats in various agro-ecological zones of Nigeria. The phylogeny of archived PPRV strains or isolates and those circulating and causing recent outbreaks was determined by sequencing of the nucleoprotein (N) gene. Twenty tissue and swab samples from apparently healthy and sick sheep and goats were collected randomly from 18 states, namely 3 states in each of the 6 agro-ecological zones visited. A total of 360 samples were collected. A total of 35 samples of 360 (9.7%) tested positive by reverse transcriptase–polymerase chain reaction, of which 25 were from oculo-nasal swabs and 10 were from tissue samples. Neighbour-joining phylogenetic analysis using Phylogenetic Analysis Using Parsimony (PAUP) identified four different lineages, that is, lineages I, II, III and IV. Interestingly, the Nigerian strains described in this study grouped in two separate major lineages, that is, lineages II and IV. Strains from Sokoto, Oyo, Plateau and Ondo states grouped according to the historical distribution of PPRV together with the Nigerian 75/1 strain of lineage II, while other strains from Sokoto, Oyo, Plateau, Akwa-Ibom, Adamawa, Kaduna, Lagos, Bauchi, Niger and Kano states grouped together with the East African and Asian strains of lineage IV. This finding confirms that both lineage II and IV strains of PPRV are circulating in Nigeria. Previously, only strains of lineage II were found to be present in the country.

  17. POWRS: position-sensitive motif discovery.

    Directory of Open Access Journals (Sweden)

    Ian W Davis

    Full Text Available Transcription factors and the short, often degenerate DNA sequences they recognize are central regulators of gene expression, but their regulatory code is challenging to dissect experimentally. Thus, computational approaches have long been used to identify putative regulatory elements from the patterns in promoter sequences. Here we present a new algorithm "POWRS" (POsition-sensitive WoRd Set) for identifying regulatory sequence motifs, specifically developed to address two common shortcomings of existing algorithms. First, POWRS uses the position-specific enrichment of regulatory elements near transcription start sites to significantly increase sensitivity, while providing new information about the preferred localization of those elements. Second, POWRS forgoes position weight matrices for a discrete motif representation that appears more resistant to over-generalization. We apply this algorithm to discover sequences related to constitutive, high-level gene expression in the model plant Arabidopsis thaliana, and then experimentally validate the importance of those elements by systematically mutating two endogenous promoters and measuring the effect on gene expression levels. This provides a foundation for future efforts to rationally engineer gene expression in plants, a problem of great importance in developing biotech crop varieties. BSD-licensed Python code is available at http://grassrootsbio.com/papers/powrs/.

  18. Effective Feature Selection for Classification of Promoter Sequences.

    Directory of Open Access Journals (Sweden)

    Kouser K

    Full Text Available Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and the ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (a feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  19. Mitochondrial DNA analysis reveals a low nucleotide diversity of ...

    African Journals Online (AJOL)

    2009-06-17

    Jun 17, 2009 ... gene sequences of C. japonica in China to assess nucleotide sequence diversity (GenBank ... provide a scientific basis for the regional control of forestry .... population (AB015869) was downloaded from GenBank database.

  20. Deep Sequence Analysis of Non-Small Cell Lung Cancer: Integrated Analysis of Gene Expression, Alternative Splicing, and Single Nucleotide Variations in Lung Adenocarcinomas with and without Oncogenic KRAS Mutations

    International Nuclear Information System (INIS)

    Kalari, Krishna R.; Rossell, David; Necela, Brian M.; Asmann, Yan W.; Nair, Asha

    2012-01-01

    KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS) was performed. Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternate splicing isoforms and single nucleotide variants, were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternate splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes exhibiting two or more connections from the lung adenocarcinoma network were used to carry out integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27-gene mutant KRAS-specific subnetwork was extracted based on gene–gene connections from the integrated network, and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to therapeutics targeted toward these pathways. In addition, our analysis indicates novel, previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that targeted PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for treatment of mutant KRAS lung tumors. Our study is the first to integrate genomic features from RNA-Seq data from NSCLC and to define a first draft genomic landscape model that is unique to tumors with oncogenic KRAS mutations.

  1. Metamotifs - a generative model for building families of nucleotide position weight matrices

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2010-06-01

    Full Text Available Background: Development of high-throughput methods for measuring DNA interactions of transcription factors together with computational advances in short motif inference algorithms is expanding our understanding of transcription factor binding site motifs. The consequential growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends has the potential to help in measuring the similarity of novel computational motif predictions to previously known data and sensitively detecting regulatory motifs similar to previously known ones from novel sequence. Results: We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input position weight matrix (PWM) motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain. Conclusions: We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity to infer motifs. Metamotifs were also successfully applied to a motif classification problem where sequence motif features were used to predict the family of
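
    For readers unfamiliar with the position weight matrices (PWMs) that the metamotif record builds on, the following Python sketch shows a plain log-odds PWM and how it scores a sequence window. It does not implement metamotifs or NestedMICA; the count matrix, the uniform background and the pseudocount are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def counts_to_pwm(counts, pseudocount=1.0):
    """Convert per-position counts into log2-odds weights against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / BACKGROUND[base])
            for base in "ACGT"
        })
    return pwm

def score(pwm, site):
    """Sum the per-position weights for a site of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, site))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

pwm = counts_to_pwm(COUNTS)
print(best_hit(pwm, "TTACGTGG"))  # best window starts at index 2 (ACGT)
```

    The pseudocount keeps unseen bases from producing a weight of minus infinity, which is the usual reason for adding one.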

  2. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin

    2015-01-01

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human cells. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.
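
    The abstract defines motif pairing multiplicity as the number of motifs paired with a given motif. A minimal Python sketch of that count is shown below, assuming the coupled pairs are available as a list of (motif, motif) tuples. The motif identifiers are hypothetical, and the treatment of repeated and reversed pairs as a single partnership is an assumption of this sketch rather than something the abstract specifies.

```python
from collections import defaultdict

def pairing_multiplicity(pairs):
    """Count the distinct partner motifs of each motif in a list of coupled pairs."""
    partners = defaultdict(set)
    for a, b in pairs:
        # Pairing is treated as symmetric; sets ignore duplicate observations.
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(others) for motif, others in partners.items()}

# Hypothetical motif identifiers; ("M2", "M1") repeats ("M1", "M2") in reverse.
pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M3"), ("M1", "M4"), ("M2", "M1")]
print(pairing_multiplicity(pairs))  # {'M1': 3, 'M2': 2, 'M3': 2, 'M4': 1}
```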

  3. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human cells. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.

  4. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  5. Cytogenetic Diversity of Simple Sequences Repeats in Morphotypes of Brassica rapa ssp. chinensis.

    Science.gov (United States)

    Zheng, Jin-Shuang; Sun, Cheng-Zhen; Zhang, Shu-Ning; Hou, Xi-Lin; Bonnema, Guusje

    2016-01-01

    A significant fraction of the nuclear DNA of all eukaryotes is comprised of simple sequence repeats (SSRs). Although these sequences are widely used for studying genetic variation, linkage mapping and evolution, little attention has been paid to the chromosomal distribution and cytogenetic diversity of these sequences. In this paper, we report the characterization of the distribution of mono-, di-, and tri-nucleotide SSRs in Brassica rapa ssp. chinensis. Fluorescence in situ hybridization was used to characterize the cytogenetic diversity of SSRs among morphotypes of B. rapa ssp. chinensis. The proportion of different SSR motifs varied among morphotypes of B. rapa ssp. chinensis, with tri-nucleotide SSRs being more prevalent in the genome of B. rapa ssp. chinensis. We determined the chromosomal locations of mono-, di-, and tri-nucleotide repeat loci. The results showed that the chromosomal distribution of SSRs in the different morphotypes is non-random and motif-dependent, and allowed us to characterize the relative variability in terms of SSR numbers and similar chromosomal distributions in centromeric/peri-centromeric heterochromatin. The differences between SSR repeats with respect to abundance and distribution indicate that SSRs are a driving force in the genomic evolution of B. rapa species. Our results provide a comprehensive view of the SSR sequence distribution and evolution for comparison among morphotypes of B. rapa ssp. chinensis.
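
    The study above locates SSRs cytogenetically by fluorescence in situ hybridization. As a purely illustrative computational counterpart, the Python sketch below finds mono-, di- and tri-nucleotide repeats in a sequence string with a backreference regex. The function name, the minimum repeat threshold and the toy sequence are assumptions made for this example.

```python
import re

def find_ssrs(sequence, min_repeats=4):
    """Return (start, unit, copies) for tandem repeats with a unit of 1 to 3 bases."""
    # The lazy {1,3}? tries the shortest unit first, so a run such as AAAAAA is
    # reported as a mononucleotide repeat rather than as copies of AA or AAA.
    pattern = re.compile(r"([ACGT]{1,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(sequence.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Hypothetical toy sequence with one dinucleotide and one trinucleotide repeat.
toy = "GGACACACACTCAGCAGCAGCAGT"
print(find_ssrs(toy))  # [(2, 'AC', 4), (11, 'CAG', 4)]
```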

  6. Nucleotide Selectivity in Abiotic RNA Polymerization Reactions

    Science.gov (United States)

    Coari, Kristin M.; Martin, Rebecca C.; Jain, Kopal; McGown, Linda B.

    2017-09-01

    In order to establish an RNA world on early Earth, the nucleotides must form polymers through chemical rather than biochemical reactions. The polymerization products must be long enough to perform catalytic functions, including self-replication, and to preserve genetic information. These functions depend not only on the length of the polymers, but also on their sequences. To date, studies of abiotic RNA polymerization generally have focused on routes to polymerization of a single nucleotide and lengths of the homopolymer products. Less work has been done on the selectivity of the reaction toward incorporation of some nucleotides over others in nucleotide mixtures. Such information is an essential step toward understanding the chemical evolution of RNA. To address this question, in the present work RNA polymerization reactions were performed in the presence of montmorillonite clay catalyst. The nucleotides included the monophosphates of adenosine, cytosine, guanosine, uridine and inosine. Experiments included reactions of mixtures of an imidazole-activated nucleotide (ImpX) with one or more unactivated nucleotides (XMP), of two or more ImpX, and of XMP that were activated in situ in the polymerization reaction itself. The reaction products were analyzed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to identify the lengths and nucleotide compositions of the polymerization products. The results show that the extent of polymerization, the degree of heteropolymerization vs. homopolymerization, and the composition of the polymeric products all vary among the different nucleotides and depend upon which nucleotides and how many different nucleotides are present in the mixture.

  7. Nucleotide Selectivity in Abiotic RNA Polymerization Reactions.

    Science.gov (United States)

    Coari, Kristin M; Martin, Rebecca C; Jain, Kopal; McGown, Linda B

    2017-09-01

    In order to establish an RNA world on early Earth, the nucleotides must form polymers through chemical rather than biochemical reactions. The polymerization products must be long enough to perform catalytic functions, including self-replication, and to preserve genetic information. These functions depend not only on the length of the polymers, but also on their sequences. To date, studies of abiotic RNA polymerization generally have focused on routes to polymerization of a single nucleotide and lengths of the homopolymer products. Less work has been done on the selectivity of the reaction toward incorporation of some nucleotides over others in nucleotide mixtures. Such information is an essential step toward understanding the chemical evolution of RNA. To address this question, in the present work RNA polymerization reactions were performed in the presence of montmorillonite clay catalyst. The nucleotides included the monophosphates of adenosine, cytosine, guanosine, uridine and inosine. Experiments included reactions of mixtures of an imidazole-activated nucleotide (ImpX) with one or more unactivated nucleotides (XMP), of two or more ImpX, and of XMP that were activated in situ in the polymerization reaction itself. The reaction products were analyzed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to identify the lengths and nucleotide compositions of the polymerization products. The results show that the extent of polymerization, the degree of heteropolymerization vs. homopolymerization, and the composition of the polymeric products all vary among the different nucleotides and depend upon which nucleotides and how many different nucleotides are present in the mixture.

  8. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Full Text Available Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is somewhat conserved, but can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stress, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).
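
    The GXGGRRKK linker described above, where X is any amino acid, maps directly onto a regular expression. The Python sketch below searches a protein string for it. The toy sequence is invented for illustration and is not a real dehydrin, and the pattern covers only this linker, not the K-, Y- or S-segment definitions developed in the study.

```python
import re

# X (any amino acid) is written as [A-Z] over one-letter residue codes.
LINKER = re.compile(r"G[A-Z]GGRRKK")

def find_linker(protein):
    """Return (start, matched_text) for each GXGGRRKK occurrence (0-based start)."""
    return [(m.start(), m.group()) for m in LINKER.finditer(protein.upper())]

# Hypothetical toy sequence: a serine run followed by the linker.
toy = "MSSSSSSEDGQGGRRKKGLKEKIKEKLPG"
print(find_linker(toy))  # [(9, 'GQGGRRKK')]
```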

  9. Efficient motif finding algorithms for large-alphabet inputs

    Directory of Open Access Journals (Sweden)

    Pavlovic Vladimir

    2010-10-01

    Full Text Available Background: We consider the problem of identifying motifs, recurring or conserved patterns, in biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results: The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of the alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed a reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobulin families. Conclusions: Our algorithm reduces the computational complexity of current motif finding algorithms and demonstrates strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.
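
    To make the planted motif problem concrete, the Python sketch below is a naive exhaustive baseline: it enumerates every pattern of length k and keeps those that occur in all input strings with at most d mismatches. This is not the algorithm proposed in the record above, nor MITRA, PMSPrune or RISOTTO. The function names and toy sequences are assumptions, and the enumeration grows as the alphabet size to the power k, which is exactly the cost such algorithms aim to avoid.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(pattern, sequence, d):
    """True if some window of the sequence is within d mismatches of the pattern."""
    k = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + k]) <= d
        for i in range(len(sequence) - k + 1)
    )

def planted_motifs(sequences, k, d, alphabet="ACGT"):
    """Exhaustively list all k-mers present in every sequence with <= d mismatches."""
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return [c for c in candidates if all(occurs_within(c, s, d) for s in sequences)]

# Hypothetical toy input: ACGT is planted with at most one mismatch per string.
toy_sequences = ["TTACGTAA", "GGACCTCC", "CAACGAGT"]
print("ACGT" in planted_motifs(toy_sequences, k=4, d=1))  # True
```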

  10. Serological and genetic characterisation of bovine respiratory syncytial virus (BRSV) indicates that Danish isolates belong to the intermediate subgroup: no evidence of a selective effect on the variability of G protein nucleotide sequence by prior cell culture adaption and passages in cell culture

    DEFF Research Database (Denmark)

    Larsen, Lars Erik; Uttenthal, Åse; Arctander, P.

    1998-01-01

    on the nucleotide sequence of the G protein. These findings indicated that the previously established variabilities of the G protein of RS virus isolates were not attributable to mutations induced during the propagation of the virus. The reactivity of the Danish isolates with G protein-specific MAbs were similar......Danish isolates of bovine respiratory syncytial virus (BRSV) were characterised by nucleotide sequencing of the G glycoprotein and by their reactivity with a panel of monoclonal antibodies (MAbs). Among the six Danish isolates, the overall sequence divergence ranged between 0 and 3...... part of the G gene of additional 11 field BRSV viruses, processed directly from lung samples without prior adaption to cell culture growth. revealed sequence variabilities in the range obtained with the propagated virus. In addition, several passages in cell culture and in calves had no major impact...

  11. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Science.gov (United States)

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprise a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for promoters and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’) that could have evolved gradually through nucleotide gain and loss and point mutations. PMID:28117683

  12. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Directory of Open Access Journals (Sweden)

    Graziele Pereira Oliveira

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprise a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for promoters and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’) that could have evolved gradually through nucleotide gain and loss and point mutations.

  13. ANALYSIS OF STABILITY OF TRINUCLEOTIDE TTC MOTIFS IN COMMON FLAX PLANTED IN THE CHERNOBYL AREA

    Directory of Open Access Journals (Sweden)

    Veronika Lancíková

    2015-02-01

    Flax (Linum usitatissimum L.) is one of the oldest domesticated plants; it was cultivated as early as in ancient Egypt and Samaria 10,000 years ago to serve as a source of fiber and oil, whence it later spread around the world. Compared with other plants, the flax genome contains a high number of repetitive sequences, middle repetitive sequences and small repetitive sequences of nucleotides. The aim of the study was to analyze the stability of the existing trinucleotide motifs of microsatellite DNA of the flax genome (genotype Kyivskyi) growing in the Chernobyl conditions. The Chernobyl area is the most extensive “natural” laboratory suitable for the study of radiation effects. Over the last 20 years, researchers have collected important knowledge about the effects of low and high radiation doses on the DNA isolated from plant material growing on the remediated fields near Chernobyl and from plant material from fields contaminated by radioactive cesium 137Cs and strontium 90Sr. Using eight pairs of microsatellite primers, we successfully amplified the samples from the remediated fields. For each primer in the control samples and remediated samples, we detected 1 to 3 fragments per locus, each 120 to 250 base pairs in size. The applied microsatellite primers confirmed the monomorphic condition of microsatellite loci.
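
    The study above assessed microsatellite stability experimentally, by PCR fragment sizes. As a purely computational analogue, the sketch below locates tandem runs of the TTC trinucleotide in a DNA string and reports how many repeat units each run contains; comparing such counts between a control and a sample sequence is one simple way to check whether a locus is monomorphic. The function name, the threshold, and the toy sequence are assumptions made for illustration.

```python
import re


def find_ttc_runs(dna: str, min_repeats: int = 3) -> list[tuple[int, int]]:
    """Return (0-based start, repeat count) for each tandem TTC run."""
    pattern = re.compile(r"(?:TTC){%d,}" % min_repeats)
    return [(m.start(), len(m.group()) // 3)
            for m in pattern.finditer(dna.upper())]


# Toy sequence invented for illustration: a 4-unit run and a 2-unit run.
toy = "AAG" + "TTC" * 4 + "GGA" + "TTC" * 2 + "A"
print(find_ttc_runs(toy))                 # runs of at least 3 units
print(find_ttc_runs(toy, min_repeats=2))  # runs of at least 2 units
```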

  14. Nucleotide sequence, organization and expression of rdgA and rdgB genes that regulate pectin lyase production in the plant pathogenic bacterium Erwinia carotovora subsp. carotovora in response to DNA-damaging agents.

    Science.gov (United States)

    Liu, Y; Chatterjee, A; Chatterjee, A K

    1994-12-01

    In most soft-rotting Erwinia spp., including E. carotovora subsp. carotovora strain 71 (Ecc71), production of the plant cell wall degrading enzyme pectin lyase (Pnl) is activated by DNA-damaging agents such as mitomycin C (MC). Induction of Pnl production in Ecc71 requires a functional recA gene and the rdg locus. DNA sequencing and RNA analyses revealed that the rdg locus contains two regulatory genes, rdgA and rdgB, in separate transcriptional units. There is high homology between RdgA and repressors of lambdoid phages, especially phi 80. RdgB, however, has significant homology with transcriptional activators of Mu phage. Both RdgA and RdgB are also predicted to possess helix-turn-helix motifs. By replacing the rdgB promoter with the IPTG-inducible tac promoter, we have determined that rdgB by itself can activate Pnl production in Escherichia coli. However, deletion analysis of rdg+ DNA indicated that, when driven by their native promoters, functions of both rdgA and rdgB are required for the induction of pnlA expression by MC treatment. While rdgB transcription occurs only after MC treatment, a substantial level of rdgA mRNA is detected in the absence of MC treatment. Moreover, upon induction with MC, a new rdgA mRNA species, initiated from a different start site, is produced at a high level. Thus, the two closely linked rdgA and rdgB genes, required for the regulation of Pnl production, are expressed differently in Ecc71.

  15. [Phylogenetic relationships of the species of Oxytropis DC. subg. Oxytropis and Phacoxytropis (Fabaceae) from Asian Russia inferred from the nucleotide sequence analysis of the intergenic spacers of the chloroplast genome].

    Science.gov (United States)

    Kholina, A B; Kozyrenko, M M; Artyukova, E V; Sandanov, D V; Andrianova, E A

    2016-08-01

    The nucleotide sequence analysis of trnH–psbA, trnL–trnF, and trnS–trnG intergenic spacer regions of chloroplast DNA performed in the representatives of the genus Oxytropis from Asian Russia provided clarification of the phylogenetic relationships of some species and sections in the subgenera Oxytropis and Phacoxytropis and in the genus Oxytropis as a whole. Only the section Mesogaea corresponds to the subgenus Phacoxytropis, while the section Janthina of the same subgenus groups together with the sections of the subgenus Oxytropis. The sections Chrysantha and Ortholoma of the subgenus Oxytropis are not only closely related to each other, but together with the section Mesogaea, they are grouped into the subgenus Phacoxytropis. It seems likely that the sections Chrysantha and Ortholoma should be assigned to the subgenus Phacoxytropis, and the section Janthina should be assigned to the subgenus Oxytropis. The molecular differences were identified between O. coerulea and O. mandshurica from the section Janthina that were indicative of considerable divergence of their chloroplast genomes and the species independence of the taxa. The species independence of O. czukotica belonging to the section Arctobia was also confirmed.

  16. Binding properties of SUMO-interacting motifs (SIMs) in yeast.

    Science.gov (United States)

    Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

    2015-03-01

    Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins are known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are still poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human. Molecular dynamics simulations were performed to investigate the binding preferences for four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode is more dependent on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should help to improve the prediction of SIMs and their binding properties in different organisms and to facilitate the reconstruction of the SUMO interactome.

  17. Retrieval and Representation of Nucleotide Sequence of ...

    African Journals Online (AJOL)

    ABSTRACT: Educational programmes all over the world are facing increasing ... providing biological insights, and proficiency to access and use the vast repository of computational and web- based resources which are the most available information in the world today. ... opening up new frontiers in the past two decades.

  18. Expressed sequence tags (ESTs) and single nucleotide ...

    African Journals Online (AJOL)

    2008-02-19

    Feb 19, 2008 ... polymorphisms (SNPs): Emerging molecular marker tools for ... knowledge in plant biology, breeding and biotechnology. The emergence of many ...... phenotypes: past successes for Mendelian disease, future approaches for ...

  19. Unveiling Mycoplasma hyopneumoniae Promoters: Sequence Definition and Genomic Distribution

    Science.gov (United States)

    Weber, Shana de Souto; Sant'Anna, Fernando Hayashi; Schrank, Irene Silveira

    2012-01-01

    Several Mycoplasma species have had their genomes completely sequenced, including four strains of the swine pathogen Mycoplasma hyopneumoniae. Nevertheless, little is known about the nucleotide sequences that control transcriptional initiation in these microorganisms. Therefore, with the objective of investigating the promoter sequences of M. hyopneumoniae, 23 transcriptional start sites (TSSs) of distinct genes were mapped. A pattern that resembles the σ70 promoter −10 element was found upstream of the TSSs. However, no −35 element was distinguished. Instead, an AT-rich periodic signal was identified. About half of the experimentally defined promoters contained the motif 5′-TRTGn-3′, which was identical to the −16 element usually found in Gram-positive bacteria. The defined promoters were utilized to build position-specific scoring matrices in order to scan putative promoters upstream of all coding sequences (CDSs) in the M. hyopneumoniae genome. Two hundred and one signals were found associated with 169 CDSs. Most of these sequences were located within 100 nucleotides of the start codons. This study has shown that promoter-like sequences in the M. hyopneumoniae genome are more frequent than expected by chance, indicating that most of the sequences detected are probably biologically functional. PMID:22334569
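
    The abstract above mentions building position-specific scoring matrices (PSSMs) from defined promoters and using them to scan upstream regions. The sketch below shows the general technique in its simplest log-odds form; it is not the authors' pipeline. The aligned sites are invented −10-like hexamers, the pseudocount and the uniform 0.25 background are assumptions (an AT-rich genome such as M. hyopneumoniae would call for a composition-specific background), and the function names are hypothetical.

```python
import math


def build_pssm(sites: list[str], alphabet: str = "ACGT",
               pseudocount: float = 1.0,
               background: float = 0.25) -> list[dict[str, float]]:
    """Build a log2-odds matrix, one dict per column, from aligned sites."""
    total = len(sites) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background)
            for base in alphabet
        })
    return pssm


def scan(seq: str, pssm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Score every window of seq; returns (0-based start, score) pairs.

    Assumes seq contains only letters from the PSSM alphabet.
    """
    width = len(pssm)
    return [(i, sum(col[seq[i + j]] for j, col in enumerate(pssm)))
            for i in range(len(seq) - width + 1)]


# Invented aligned sites resembling a -10 element (illustration only).
sites = ["TATAAT", "TAAAAT", "TATACT", "TACAAT"]
pssm = build_pssm(sites)
best = max(scan("GGCGTATAATGCGC", pssm), key=lambda hit: hit[1])
print(best[0])  # start of the best-scoring window
```

    In practice a score threshold, usually calibrated against shuffled or background sequence, decides which windows are reported as putative promoters.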

  20. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ70 subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted b