WorldWideScience

Sample records for repeating sequence motifs

  1. REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.

    Directory of Open Access Journals (Sweden)

    Chong Chu

    Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.

  2. Nuclear Magnetic Resonance Structure of a Novel Globular Domain in RBM10 Containing OCRE, the Octamer Repeat Sequence Motif.

    Science.gov (United States)

    Martin, Bryan T; Serrano, Pedro; Geralt, Michael; Wüthrich, Kurt

    2016-01-01

    The OCtamer REpeat (OCRE) has been annotated as a 42-residue sequence motif with 12 tyrosine residues in the spliceosome trans-regulatory elements RBM5 and RBM10 (RBM [RNA-binding motif]), which are known to regulate alternative splicing of Fas and Bcl-x pre-mRNA transcripts. Nuclear magnetic resonance structure determination showed that the RBM10 OCRE sequence motif is part of a 55-residue globular domain containing 16 aromatic amino acids, which consists of an anti-parallel arrangement of six β strands, with the first five strands containing complete or incomplete Tyr triplets. This OCRE globular domain is a distinctive component of RBM10 and is more widely conserved in RBM10s across the animal kingdom than the ubiquitous RNA recognition components. It is also found in the functionally related RBM5. Thus, it appears that the three-dimensional structure of the globular OCRE domain, rather than the 42-residue OCRE sequence motif alone, confers specificity on RBM10 intermolecular interactions in the spliceosome.

  3. Discovering novel sequence motifs with MEME.

    Science.gov (United States)

    Bailey, Timothy L

    2002-11-01

    This unit illustrates how to use MEME to discover motifs in a group of related nucleotide or peptide sequences. A MEME motif is a sequence pattern that occurs repeatedly in one or more sequences in the input group. MEME can be used to discover novel patterns because it bases its discoveries only on the input sequences, not on any prior knowledge (such as databases of known motifs). The input to MEME is a set of unaligned sequences of the same type (peptide or nucleotide). For each motif it discovers, MEME reports the occurrences (sites), consensus sequence, and the level of conservation (information content) at each position in the pattern. MEME also produces block diagrams showing where all of the discovered motifs occur in the training set sequences. MEME's hypertext (HTML) output also contains buttons that allow for the convenient use of the motifs in other searches.
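
    The per-position information content mentioned in this abstract can be illustrated with a small calculation. The sketch below is not MEME's algorithm: it assumes the sites are already aligned and of equal length, a uniform background over the four nucleotides, and no pseudocounts or small-sample correction. The function names `information_content` and `consensus` are hypothetical.

```python
import math
from collections import Counter


def information_content(sites, alphabet="ACGT"):
    """Per-position information content (bits) of equal-length aligned sites.

    Assumes a uniform background, so each position scores
    log2(len(alphabet)) minus the Shannon entropy of its letter frequencies.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    max_bits = math.log2(len(alphabet))
    result = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        entropy = 0.0
        for letter in alphabet:
            p = counts[letter] / len(sites)
            if p > 0:
                entropy -= p * math.log2(p)
        result.append(max_bits - entropy)
    return result


def consensus(sites):
    """Most frequent letter at each position (ties resolved arbitrarily)."""
    length = len(sites[0])
    return "".join(
        Counter(site[pos] for site in sites).most_common(1)[0][0]
        for pos in range(length)
    )
```

    For the four sites ACGT, ACGA, ACGC, and ACGG, the first three positions are fully conserved (2 bits each) and the last is uniform (0 bits).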

  4. Network motifs in music sequences

    CERN Document Server

    Zanette, Damian H

    2010-01-01

    In this note, I summarize ongoing research on motif distribution in networks built up out of symbolic sequences of Western musical origin. Their motif significance profiles exhibit remarkable consistency over different styles and periods, and define a class that cannot be identified with any of the four "superfamilies" to which most real networks seem to belong. Networks from music sequences possess an unusual abundance of bidirectional connections, due to the inherent reversibility of short musical note patterns. This property contributes to motif significance from both local and large-scale features of musical structure.

  5. Bases of motifs for generating repeated patterns with wild cards.

    Science.gov (United States)

    Pisanti, Nadia; Crochemore, Maxime; Grossi, Roberto; Sagot, Marie-France

    2005-01-01

    Motif inference represents one of the most important areas of research in computational biology, and one of its oldest. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms, or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive evidence in favor of either, and recent work has attempted to integrate the two types into a single model. In this paper, we address the formal issue in relation to motifs as patterns. This is essential to get at a better understanding of motifs in general. In particular, we consider a promising idea that was recently proposed, which attempted to avoid the combinatorial explosion in the number of motifs by means of a generator set for the motifs. Instead of exhibiting a complete list of motifs satisfying some input constraints, what is produced is a basis of such motifs from which all the other ones can be generated. We study the computational cost of determining such a basis of repeated motifs with wild cards in a sequence. We give new upper and lower bounds on such a cost, introducing a notion of basis that is provably contained in (and thus smaller than) previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all bases defined so far grows exponentially with the quorum, that is, with the minimal number of times a motif must appear in a sequence, something unnoticed in previous work. We show that there is no hope to efficiently compute such bases unless the quorum is fixed.
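
    To make the terms "wild card" and "quorum" concrete, here is a minimal sketch that counts where a pattern with wild cards occurs in a sequence and checks it against a quorum. It is a naive scan, not the basis construction studied in the paper; the `.` wild-card symbol and the function names are assumptions chosen for illustration.

```python
def occurrences(pattern, text, wildcard="."):
    """Start positions where pattern matches text.

    The wildcard symbol matches any single character; overlapping
    occurrences are all reported.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p == wildcard or p == c for p, c in zip(pattern, text[i:i + m]))
    ]


def meets_quorum(pattern, text, quorum, wildcard="."):
    """True if the pattern occurs at least `quorum` times in text."""
    return len(occurrences(pattern, text, wildcard)) >= quorum
```

    In the string ABCAXCABC, the pattern A.C occurs at positions 0, 3, and 6, so it satisfies a quorum of 3, whereas the solid pattern ABC occurs only twice.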

  6. Detecting Motifs in System Call Sequences

    CERN Document Server

    Wilson, William O; Aickelin, Uwe

    2010-01-01

    The search for patterns or motifs in data represents an area of key interest to many researchers. In this paper we present the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs which repeat within time series data. The power of the algorithm is derived from its use of a small number of parameters with minimal assumptions. The algorithm searches from a completely neutral perspective that is independent of the data being analysed, and the underlying motifs. In this paper the motif tracking algorithm is applied to the search for patterns within sequences of low level system calls between the Linux kernel and the operating system's user space. The MTA is able to compress data found in large system call data sets to a limited number of motifs which summarise that data. The motifs provide a resource from which a profile of executed processes can be built. The potential for these profiles and new implications for security research are highlighted. A...

  7. Parametric bootstrapping for biological sequence motifs.

    Science.gov (United States)

    O'Neill, Patrick K; Erill, Ivan

    2016-10-06

    Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively. We define two distributions over the space of all motifs of given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient of the motif, a measure of the degree to which information is evenly distributed throughout a motif's positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher informational Gini coefficients (IGC) than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators. Despite the historical centrality of biological sequence motif analysis, this study constitutes to our knowledge the first use of principled null hypotheses for sequence motifs given information content. 
Through their use, we are able to characterize for the first time differences in global motif statistics
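
    The informational Gini coefficient named in this abstract can be sketched as an ordinary Gini coefficient applied to the per-position information content of a motif. Whether the paper uses exactly this normalization is an assumption; the function name `gini` is hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    Returns 0 when all values are equal and approaches (n - 1) / n when
    a single value carries the whole total. Uses the mean absolute
    difference form: sum |x_i - x_j| / (2 * n * sum x).
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * total)
```

    A motif whose four positions each carry 1 bit has a coefficient of 0, while one that concentrates all 4 bits in a single position reaches 0.75, the maximum for four positions.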

  8. MEME: discovering and analyzing DNA and protein sequence motifs.

    Science.gov (United States)

    Bailey, Timothy L; Williams, Nadya; Misleh, Chris; Li, Wilfred W

    2006-07-01

    MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel 'signals' in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. Users can perform MEME searches via the web server hosted by the National Biomedical Computation Resource (http://meme.nbcr.net) and several mirror sites. Through the same web server, users can also access the Motif Alignment and Search Tool to search sequence databases for matches to motifs encoded in several popular formats. By clicking on buttons in the MEME output, users can compare the motifs discovered in their input sequences with databases of known motifs, search sequence databases for matches to the motifs and display the motifs in various formats. This article describes the freely accessible web server and its architecture, and discusses ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance.

  9. Motif Yggdrasil: sampling sequence motifs from a tree mixture model.

    Science.gov (United States)

    Andersson, Samuel A; Lagergren, Jens

    2007-06-01

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, using random trees, the MY sampler still has a performance similar to the original version.

  10. Detecting correlations among functional-sequence motifs

    Science.gov (United States)

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.

  11. Sublinear Time Motif Discovery from Multiple Sequences

    Directory of Open Access Journals (Sweden)

    Yunhui Fu

    2013-10-01

    In this paper, a natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet, Σ. A motif G = g1g2 ... gm is a string of m characters. In each background sequence is implanted a probabilistically-generated approximate copy of G. For a probabilistically-generated approximate copy b1b2 ... bm of G, every character, bi, is probabilistically generated, such that the probability for bi ≠ gi is at most α. We develop two new randomized algorithms and one new deterministic algorithm. They make advancements in the following aspects: (1) The algorithms are much faster than those before. Our algorithms can even run in sublinear time. (2) They can handle any motif pattern. (3) The restriction for the alphabet size is a lower bound of four. This gives them potential applications in practical problems, since gene sequences have an alphabet size of four. (4) All algorithms have rigorous proofs about their performances. The methods developed in this paper have been used in the software implementation. We observed some encouraging results that show improved performance for motif detection compared with other software.
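
    The planted-motif model described in this abstract can be simulated in a few lines. The sketch below is one possible instance of the model, not the authors' software: it assumes a uniform background, an implant position chosen uniformly at random, and independent substitutions with probability exactly α per character (the abstract only requires at most α). The function name `implant_motif` and its parameters are hypothetical.

```python
import random


def implant_motif(k, n, motif, alpha, alphabet="ACGT", seed=None):
    """Generate k random sequences of length n, each holding one noisy motif copy.

    Each motif character is replaced by a different symbol with probability
    alpha. Returns the sequences and the start position of each implant.
    """
    rng = random.Random(seed)
    m = len(motif)
    if m > n:
        raise ValueError("motif is longer than the background sequence")
    sequences, positions = [], []
    for _ in range(k):
        # Uniform random background over the alphabet.
        background = [rng.choice(alphabet) for _ in range(n)]
        # Approximate copy b_1 ... b_m of the motif g_1 ... g_m.
        copy = []
        for g in motif:
            if rng.random() < alpha:
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)
        start = rng.randrange(n - m + 1)
        background[start:start + m] = copy
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions
```

    Data generated this way, with the implant positions kept as ground truth, is the kind of input on which a motif discovery program can be scored.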

  12. Mining of simple sequence repeats in the Genome of Gentianaceae

    Directory of Open Access Journals (Sweden)

    R Sathishkumar

    2011-01-01

    Simple sequence repeats (SSRs) or short tandem repeats are short repeat motifs that show a high level of length polymorphism due to insertion or deletion mutations of one or more repeat types. Here, we present the detection and abundance of microsatellites or SSRs in nucleotide sequences of the Gentianaceae family. A total of 545 SSRs were mined in 4698 nucleotide sequences downloaded from the National Center for Biotechnology Information (NCBI). Among the SSR sequences, the frequency by repeat type was 429 mono-repeats, 99 di-repeats, 15 tri-repeats, and 2 hexa-repeats. Mononucleotide repeats were the most abundant repeat type (about 78%), followed by dinucleotide repeats (18.16%) among the SSR sequences. An attempt was made to design primer pairs for the 545 identified SSRs, but these were found only for 169 sequences.
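
    A simple way to picture this kind of SSR mining is a scan for perfect tandem repeats of 1 to 6 nucleotide units, tallied by unit length. The sketch below is an illustration only, not the tool used in the study; the minimum copy numbers in `MIN_COPIES` are hypothetical thresholds, since the abstract does not state which ones were applied.

```python
from collections import Counter

UNIT_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}
# Hypothetical minimum copy numbers per unit length (not taken from the abstract).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_copies=MIN_COPIES):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, needed in min_copies.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if not set(unit) <= set("ACGT") or not is_primitive(unit):
                i += 1
                continue
            # Extend the run one whole unit at a time.
            j = i + k
            while seq[j:j + k] == unit:
                j += k
            copies = (j - i) // k
            if copies >= needed:
                hits.append((i, unit, copies))
                i = j
            else:
                i += 1
    return sorted(hits)


def summarize(hits):
    """Count hits per repeat class (mono, di, tri, ...)."""
    return Counter(UNIT_NAMES[len(unit)] for _, unit, _ in hits)
```

    Requiring primitive units keeps a long poly-A run from also being counted as an AA or AAA repeat, so each run is assigned to a single class.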

  13. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., on the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate, and combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.

  14. LRRCE: a leucine-rich repeat cysteine capping motif unique to the chordate lineage

    Directory of Open Access Journals (Sweden)

    Bishop Paul N

    2008-12-01

    Background: The small leucine-rich repeat proteins and proteoglycans (SLRPs) form an important family of regulatory molecules that participate in many essential functions. They typically control the correct assembly of collagen fibrils, regulate mineral deposition in bone, and modulate the activity of potent cellular growth factors through many signalling cascades. SLRPs belong to the group of extracellular leucine-rich repeat proteins that are flanked at both ends by disulphide-bonded caps that protect the hydrophobic core of the terminal repeats. A capping motif specific to SLRPs has been recently described in the crystal structures of the core proteins of decorin and biglycan. This motif, designated as LRRCE, differs in both sequence and structure from other, more widespread leucine-rich capping motifs. To investigate if the LRRCE motif is a common structural feature found in other leucine-rich repeat proteins, we have defined characteristic sequence patterns and used them in genome-wide searches. Results: The LRRCE motif is a structural element exclusive to the main group of SLRPs. It appears to have evolved during early chordate evolution and is not found in protein sequences from non-chordate genomes. Our search has expanded the family of SLRPs to include new predicted protein sequences, mainly in fishes but with intriguing putative orthologs in mammals. The chromosomal locations of the newly predicted SLRP genes would support the large-scale genome or gene duplications that are thought to have occurred during vertebrate evolution. From this expanded list we describe a new class of SLRP sequences that could be representative of an ancestral SLRP gene. Conclusion: Given its exclusivity, the LRRCE motif is a useful annotation tool for the identification and classification of new SLRP sequences in genome databases.
The expanded list of members of the SLRP family offers interesting insights into early vertebrate evolution and suggests an

  15. Discovering motifs in ranked lists of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Eran Eden

    2007-03-01

    Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked
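
    Enrichment at the top of a ranked list, without fixing the target/background split in advance, can be scored by taking the smallest hypergeometric tail probability over all cutoffs. The sketch below assumes that a minimum hypergeometric score of this kind is what is meant; the abstract itself does not spell out the statistic, and the function names are hypothetical. The minimum is a score, not a p-value: correcting for the many cutoffs tried is deliberately left out.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n items are drawn from N items of which B are hits."""
    total = comb(N, n)
    favourable = sum(
        comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1)
    )
    return favourable / total


def min_hypergeom(labels):
    """Smallest tail probability over all top-n cutoffs of a ranked 0/1 list.

    labels[i] is 1 if the i-th ranked sequence contains the motif, else 0.
    Returns (score, best_cutoff); the score is uncorrected for multiple cutoffs.
    """
    N, B = len(labels), sum(labels)
    best_p, best_n, b = 1.0, 0, 0
    for n, hit in enumerate(labels, start=1):
        b += hit
        p = hypergeom_tail(N, B, n, b)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n
```

    For a list of ten sequences in which only the top three contain the motif, the best cutoff is 3 and the score is 1/120.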

  16. seeMotif: exploring and visualizing sequence motifs in 3D structures

    OpenAIRE

    2009-01-01

    Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function related signatures of proteins and largely facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying the potential synergetic relationships between them are important next steps. A good way to investigate novel motifs is to utilize the abundant 3D st...

  17. rMotifGen: random motif generator for DNA and protein sequences

    Directory of Open Access Journals (Sweden)

    Hardin C Timothy

    2007-08-01

    Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.

  18. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs, are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  19. seeMotif: exploring and visualizing sequence motifs in 3D structures

    Science.gov (United States)

    Chang, Darby Tien-Hao; Chien, Ting-Ying; Chen, Chien-Yu

    2009-01-01

    Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function related signatures of proteins and largely facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying the potential synergetic relationships between them are important next steps. A good way to investigate novel motifs is to utilize the abundant 3D structures that have also been accumulated at an astounding rate in recent years. This article reports the development of the web service seeMotif, which provides users with an interactive interface for visualizing sequence motifs on protein structures from the Protein Data Bank (PDB). Researchers can quickly see the locations and conformation of multiple motifs among a number of related structures simultaneously. Considering the fact that PDB sequences are usually shorter than those in sequence databases and/or may have missing residues, seeMotif has two complementary approaches for selecting structures and mapping motifs to protein chains in structures. As more and more structures belonging to previously uncharacterized protein families become available, combining sequence and structure information gives good opportunities to facilitate understanding of protein functions in large-scale genome projects. Available at: http://seemotif.csie.ntu.edu.tw, http://seemotif.ee.ncku.edu.tw or http://seemotif.csbb.ntu.edu.tw. PMID:19477961

  20. seeMotif: exploring and visualizing sequence motifs in 3D structures.

    Science.gov (United States)

    Chang, Darby Tien-Hao; Chien, Ting-Ying; Chen, Chien-Yu

    2009-07-01

    Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function related signatures of proteins and largely facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying the potential synergetic relationships between them are important next steps. A good way to investigate novel motifs is to utilize the abundant 3D structures that have also been accumulated at an astounding rate in recent years. This article reports the development of the web service seeMotif, which provides users with an interactive interface for visualizing sequence motifs on protein structures from the Protein Data Bank (PDB). Researchers can quickly see the locations and conformation of multiple motifs among a number of related structures simultaneously. Considering the fact that PDB sequences are usually shorter than those in sequence databases and/or may have missing residues, seeMotif has two complementary approaches for selecting structures and mapping motifs to protein chains in structures. As more and more structures belonging to previously uncharacterized protein families become available, combining sequence and structure information gives good opportunities to facilitate understanding of protein functions in large-scale genome projects. Available at: http://seemotif.csie.ntu.edu.tw, http://seemotif.ee.ncku.edu.tw or http://seemotif.csbb.ntu.edu.tw.

  1. Probabilistic models for semisupervised discriminative motif discovery in DNA sequences.

    Science.gov (United States)

    Kim, Jong Kyoung; Choi, Seungjin

    2011-01-01

    Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On one hand, discriminative methods increase the sensitivity and specificity of motif discovery, compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model which enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained between the purely generative and the purely discriminative extremes, and that semisupervised learning improves performance when labeled sequences are limited.

  2. Assembly of supramolecular DNA complexes containing both G-quadruplexes and i-motifs by enhancing the G-repeat-bearing capacity of i-motifs

    Science.gov (United States)

    Cao, Yanwei; Gao, Shang; Yan, Yuting; Bruist, Michael F.; Wang, Bing; Guo, Xinhua

    2017-01-01

    The single-step assembly of supramolecular complexes containing both i-motifs and G-quadruplexes (G4s) is demonstrated. This can be achieved because the formation of four-stranded i-motifs appears to be little affected by certain terminal residues: a five-cytosine tetrameric i-motif can bear ten-base flanking residues. However, things become complex when different lengths of guanine-repeats are added at the 3′ or 5′ ends of the cytosine-repeats. Here, a series of oligomers d(XGiXC5X) and d(XC5XGiX) (X = A, T or none; i < 5) are designed to study the impact of G-repeats on the formation of tetrameric i-motifs. Our data demonstrate that tetramolecular i-motif structure can tolerate specific flanking G-repeats. Assemblies of these oligonucleotides are polymorphic, but may be controlled by solution pH and counter ion species. Importantly, we find that the sequences d(TGiAC5) can form the tetrameric i-motif in large quantities. This leads to the design of two oligonucleotides d(TG4AC7) and d(TGBrGGBrGAC7) that self-assemble to form quadruplex supramolecules under certain conditions. d(TG4AC7) forms supramolecules under acidic conditions in the presence of K+ that are mainly V-shaped or ring-like containing parallel G4s and antiparallel i-motifs. d(TGBrGGBrGAC7) forms long linear quadruplex wires under acidic conditions in the presence of Na+ that consist of both antiparallel G4s and i-motifs. PMID:27899568

  3. A discriminative approach for unsupervised clustering of DNA sequence motifs.

    Directory of Open Access Journals (Sweden)

    Philip Stegmaier

    Full Text Available Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention in recent years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim at has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.

  4. Identification of protein superfamily from structure- based sequence motif

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Using the evolutionarily distant protein tyrosine phosphatase (PTP) Ⅰ and Ⅱ superfamilies as an example, a structure-based sequence motif has been defined by structural comparison, structure-based sequence alignment and analysis of residue substitution patterns in the conserved sequence regions they share. With this structure-based PTP sequence motif, phosphatases Ⅰ and Ⅱ can be correctly identified together in the SWISS-PROT and TrEMBL databases. The results show that the correct identification rates are over 98%. This is the first time that PTP Ⅰ and Ⅱ have been identified together with this motif.

  5. Bases of motifs for generating repeated patterns with wild cards

    OpenAIRE

    Pisanti, Nadia; Crochemore, Maxime; Grossi, Roberto; Sagot, Marie-France

    2005-01-01

    Motif inference represents one of the most important areas of research in computational biology, and one of its oldest ones. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms, or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive e...

  6. Identification of sequence motifs significantly associated with antisense activity

    Directory of Open Access Journals (Sweden)

    Peek Andrew S

    2007-06-01

    Full Text Available Abstract Background Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. Results We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic

  7. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    , selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes...... and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms...

  8. Dissection of thousands of cell type-specific enhancers identifies dinucleotide repeat motifs as general enhancer features.

    Science.gov (United States)

    Yáñez-Cuna, J Omar; Arnold, Cosmas D; Stampfel, Gerald; Boryń, Lukasz M; Gerlach, Daniel; Rath, Martina; Stark, Alexander

    2014-07-01

    Gene expression is determined by genomic elements called enhancers, which contain short motifs bound by different transcription factors (TFs). However, how enhancer sequences and TF motifs relate to enhancer activity is unknown, and general sequence requirements for enhancers or comprehensive sets of important enhancer sequence elements have remained elusive. Here, we computationally dissect thousands of functional enhancer sequences from three different Drosophila cell lines. We find that the enhancers display distinct cis-regulatory sequence signatures, which are predictive of the enhancers' cell type-specific or broad activities. These signatures contain transcription factor motifs and a novel class of enhancer sequence elements, dinucleotide repeat motifs (DRMs). DRMs are highly enriched in enhancers, particularly in enhancers that are broadly active across different cell types. We experimentally validate the importance of the identified TF motifs and DRMs for enhancer function and show that they can be sufficient to create an active enhancer de novo from a nonfunctional sequence. The function of DRMs as a novel class of general enhancer features that are also enriched in human regulatory regions might explain their implication in several diseases and provides important insights into gene regulation.

  9. Simple sequence repeats in mycobacterial genomes

    Indian Academy of Sciences (India)

    Vattipally B Sreenu; Pankaj Kumar; Javaregowda Nagaraju; Hampapathalu A Nagarajaram

    2007-01-01

    Simple sequence repeats (SSRs) or microsatellites are the repetitive nucleotide sequences of motifs of length 1–6 bp. They are scattered throughout the genomes of all the known organisms ranging from viruses to eukaryotes. Microsatellites undergo mutations in the form of insertions and deletions (INDELS) of their repeat units with some bias towards insertions that lead to microsatellite tract expansion. Although prokaryotic genomes derive some plasticity due to microsatellite mutations they have in-built mechanisms to arrest undue expansions of microsatellites and one such mechanism is constituted by post-replicative DNA repair enzymes MutL, MutH and MutS. The mycobacterial genomes lack these enzymes and as a null hypothesis one could expect these genomes to harbour many long tracts. It is therefore interesting to analyse the mycobacterial genomes for distribution and abundance of microsatellites tracts and to look for potentially polymorphic microsatellites. Available mycobacterial genomes, Mycobacterium avium, M. leprae, M. bovis and the two strains of M. tuberculosis (CDC1551 and H37Rv) were analysed for frequencies and abundance of SSRs. Our analysis revealed that the SSRs are distributed throughout the mycobacterial genomes at an average of 220–230 SSR tracts per kb. All the mycobacterial genomes contain few regions that are conspicuously denser or poorer in microsatellites compared to their expected genome averages. The genomes distinctly show scarcity of long microsatellites despite the absence of a post-replicative DNA repair system. Such severe scarcity of long microsatellites could arise as a result of strong selection pressures operating against long and unstable sequences although influence of GC-content and role of point mutations in arresting microsatellite expansions can not be ruled out. Nonetheless, the long tracts occasionally found in coding as well as non-coding regions may account for limited genome plasticity in these genomes.

  10. WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches.

    Science.gov (United States)

    Romer, Katherine A; Kayombya, Guy-Richard; Fraenkel, Ernest

    2007-07-01

    WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs.

  11. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes

    OpenAIRE

    Kumar, Pankaj; Chaitanya, Pasumarthy S.; Nagarajaram, Hampapathalu A

    2010-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are the tandem repeats of nucleotide motifs of the sizes 1–6 bp and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin phase variations and antigenic variations seen in s...

  12. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome

    Directory of Open Access Journals (Sweden)

    Santosh K. Tiwari

    2011-01-01

    Full Text Available The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods in time. Several phylogenetic studies involving mitochondrial DNA and Y chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, certain controversy exists, due to the support given to the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate pre-historic colonization of India by anatomically modern humans, using conserved stretches of five amino acid (EPIYA) sequences in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs), in 32 H. pylori strains isolated from subjects with several forms of gastric diseases, was also explored. High resolution sequence analysis of the above described genes was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0) software for EPIYA. An MJ-Network was constructed for obtaining TPM haplotypes by using NETWORK (version 4.5) software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. No specific association of haplotypes with the outcome of disease was revealed through additional network analysis of TPMs.

  13. Exploitation of peptide motif sequences and their use in nanobiotechnology.

    Science.gov (United States)

    Shiba, Kiyotaka

    2010-08-01

    Short amino acid sequences extracted from natural proteins or created using in vitro evolution systems are sometimes associated with particular biological functions. These peptides, called peptide motifs, can serve as functional units for the creation of various tools for nanobiotechnology. In particular, peptide motifs that have the ability to specifically recognize the surfaces of solid materials and to mineralize certain inorganic materials have been linking biological science to material science. Here, I review how these peptide motifs have been isolated from natural proteins or created using in vitro evolution systems, and how they have been used in the nanobiotechnology field.

  14. Sequence Length Limits for Controlling False Positives in Discovering Nucleotide Sequence Motifs

    Institute of Scientific and Technical Information of China (English)

    CHEN Lei; QIAN Zi-liang

    2008-01-01

    In motif discovery, especially the discovery of transcription factor DNA binding sites, an input sequence that is too long returns non-informative motifs rather than biologically functional ones. This paper presents theoretical analyses and computational experiments that suggest length limits for the input sequence. When the sequence length exceeds a certain critical point, the probability of discovering the motif decreases sharply. The work not only explains the unsatisfactory results of existing motif discovery approaches, whose input sequences may be too long and exceed this point, but also provides an estimate of the input sequence length that should be accepted to obtain more meaningful and reliable results in motif discovery.
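
    Why longer inputs invite false positives can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' analysis; it assumes a simplified null model (independent, equally likely bases, one fixed exact motif), and the function name `expected_chance_hits` is hypothetical. Under that assumption, the expected number of chance occurrences of a motif of length k in a sequence of length L is (L - k + 1) / 4^k, which grows linearly with L.

```python
def expected_chance_hits(seq_len: int, motif_len: int, alphabet_size: int = 4) -> float:
    """Expected exact occurrences of one fixed motif in a uniformly random sequence.

    Assumes independent, equally likely letters (a deliberately crude null model).
    """
    # Number of start positions at which the motif could fit.
    positions = max(seq_len - motif_len + 1, 0)
    # Each position matches the fixed motif with probability alphabet_size ** -motif_len.
    return positions / alphabet_size ** motif_len


if __name__ == "__main__":
    # Chance hits for an 8-mer rise in proportion to the input length.
    for n in (100, 1_000, 10_000, 100_000):
        print(n, expected_chance_hits(n, 8))
```

    Real motif finders score degenerate patterns across many sequences, so the critical length discussed in the abstract depends on more than this single term; the sketch only shows the direction of the effect.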

  15. Methods for sequencing GC-rich and CCT repeat DNA templates

    Science.gov (United States)

    Robinson, Donna L.

    2007-02-20

    The present invention is directed to a PCR-based method of cycle sequencing DNA and other polynucleotide sequences having high CG content and regions of high GC content, and includes for example DNA strands with a high Cytosine and/or Guanosine content and repeated motifs such as CCT repeats.

  16. The distribution of RNA motifs in natural sequences.

    Science.gov (United States)

    Bourdeau, V; Ferbeyre, G; Pageau, M; Paquin, B; Cedergren, R

    1999-11-15

    Functional analysis of genome sequences has largely ignored RNA genes and their structures. We introduce here the notion of 'ribonomics' to describe the search for the distribution of and eventually the determination of the physiological roles of these RNA structures found in the sequence databases. The utility of this approach is illustrated here by the identification in the GenBank database of RNA motifs having known binding or chemical activity. The frequency of these motifs indicates that most have originated from evolutionary drift and are selectively neutral. On the other hand, their distribution among species and their location within genes suggest that the destiny of these motifs may be more elaborate. For example, the hammerhead motif has a skewed organismal presence, is phylogenetically stable and recent work on a schistosome version confirms its in vivo biological activity. The under-representation of the valine-binding motif and the Rev-binding element in GenBank hints at a detrimental effect on cell growth or viability. Data on the presence and the location of these motifs may provide critical guidance in the design of experiments directed towards the understanding and the manipulation of RNA complexes and activities in vivo.

  17. Exact Tandem Repeats Analyzer (E-TRA): A new program for DNA sequence mining

    Indian Academy of Sciences (India)

    Mehmet Karaca; Mehmet Bilgen; A. Naci Onus; Ayse Gul Ince; Safinaz Y. Elmasulu

    2005-04-01

    Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA has several advanced repeat search parameters/options compared to other repeat finder programs as it not only accepts GenBank, FASTA and expressed sequence tags (EST) sequence files, but also does analysis of multiple files with multiple sequences. The minimum and maximum tandem repeat motif lengths that E-TRA finds vary from one to one thousand. Advanced user defined parameters/options let the researchers use different minimum motif repeats search criteria for varying motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some of the tandem repeats play important roles in the regulation of gene expression whereas others do not have any known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in number of repeats as well as repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs are not random, thus differing from the un-transcribed repeats found in genomes.
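
    The idea of applying a different minimum repeat count to each motif length can be illustrated with a short regular-expression search. This is a minimal Python sketch, not E-TRA's implementation: the function name `find_exact_tandem_repeats` and the threshold table `MIN_REPEATS` are hypothetical, the thresholds are arbitrary illustrative values, and only motif lengths of 1 to 6 bases are covered.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies required per motif length (bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_exact_tandem_repeats(seq, min_repeats=MIN_REPEATS):
    """Return a list of (start, motif, copies) for exact tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_copies - 1 exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter unit, e.g. "AGAG" = "AG" * 2.
            if any(motif == motif[:d] * (motif_len // d)
                   for d in range(1, motif_len) if motif_len % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


if __name__ == "__main__":
    print(find_exact_tandem_repeats("CCC" + "AG" * 7 + "TT"))
```

    Start positions are 0-based. Because the search is leftmost-first, a tract may be reported under a rotated motif (for example GA rather than AG) when the preceding base happens to extend the repeat.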

  18. Chaotic motif sampler: detecting motifs from biological sequences by using chaotic neurodynamics

    Science.gov (United States)

    Matsuura, Takafumi; Ikeguchi, Tohru

    Identifying a region in biological sequences, known as the motif extraction problem (MEP), is a task addressed in bioinformatics. However, the MEP is an NP-hard problem. Therefore, it is almost impossible to obtain an optimal solution within a reasonable time frame. To find near-optimal solutions for NP-hard combinatorial optimization problems such as traveling salesman problems, quadratic assignment problems, and vehicle routing problems, chaotic search, which is one of the deterministic approaches, has been proposed and exhibits better performance than stochastic approaches. In this paper, we propose a new alignment method that employs chaotic dynamics to solve MEPs. It is called the Chaotic Motif Sampler. We show that the performance of the Chaotic Motif Sampler is considerably better than that of conventional methods such as the Gibbs Site Sampler and the Neighborhood Optimization for Multiple Alignment Discovery.

  19. Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences.

    Science.gov (United States)

    Stepančič, Ziva

    2014-10-01

    Finding short patterns with residue variation in a set of sequences is still an open problem in genetics, since motif-finding techniques on DNA and protein sequences are inconclusive on real data sets and their performance varies on different species. Hence, finding new algorithms and evolving established methods are vital to further understanding of genome properties and the mechanisms of protein development. In this work, we present an approach to finding functional motifs in DNA sequences in connection to Gibbs sampling method. Starting points in the search space are partly determined via graphical representation of input sequences opposed to completely random initial points with the standard Gibbs sampling. Our algorithm is evaluated on synthetic as well as on real data sets by using several statistics, such as sensitivity, positive predictive value, specificity, performance, and correlation coefficient. Additionally, a comparison between our algorithm and the basic standard Gibbs sampling algorithm is made to show improvement in accuracy, repeatability, and performance.

  20. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Full Text Available Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV-specific targeting using small-molecule drugs.

  1. Characterization of simple sequence repeats (SSRs) from Phlebotomus papatasi (Diptera: Psychodidae) expressed sequence tags (ESTs)

    Directory of Open Access Journals (Sweden)

    Hamarsheh Omar

    2011-09-01

    Full Text Available Abstract Background Phlebotomus papatasi is a natural vector of Leishmania major, which causes cutaneous leishmaniasis in many countries. Simple sequence repeats (SSRs), or microsatellites, are common in eukaryotic genomes and are short, repeated nucleotide sequence elements arrayed in tandem and flanked by non-repetitive regions. The enrichment methods used previously for finding new microsatellite loci in sand flies remain laborious and time consuming; in silico mining, which includes retrieval and screening of microsatellites from large amounts of sequence data from sequence databases using microsatellite search tools, can yield many new candidate markers. Results Simple sequence repeats (SSRs) were characterized in P. papatasi expressed sequence tags (ESTs) derived from a public database, National Center for Biotechnology Information (NCBI). A total of 42,784 sequences were mined, and 1,499 SSRs were identified with a frequency of 3.5% and an average density of 15.55 kb per SSR. Dinucleotide motifs were the most common SSRs, accounting for 67%, followed by tri-, tetra-, and penta-nucleotide repeats, accounting for 31.1%, 1.5%, and 0.1%, respectively. The length of microsatellites varied from 5 to 16 repeats. The dinucleotide types AG and CT have the highest frequency. Dinucleotide SSR-ESTs are relatively biased toward an excess of (AX)n repeats and a low GC base content. Forty primer pairs were designed based on motif lengths for further experimental validation. Conclusion The first large-scale survey of SSRs derived from P. papatasi is presented; dinucleotide SSRs identified are more frequent than other types. EST data mining is an effective strategy to identify functional microsatellites in P. papatasi.

  2. Finding Common Sequence and Structure Motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.

    1997-01-01

    We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences...

  3. Development and characterization of simple sequence repeats for Bipolaris sorokiniana and cross transferability to related species

    Science.gov (United States)

    Simple sequence repeats (SSR) markers were developed from a small insert genomic library for Bipolaris sorokiniana, a mitosporic fungal pathogen that causes spot blotch and root rot in switchgrass. About 59% of sequenced clones (n=384) harbored various SSR motifs. After eliminating the redundant seq...

  4. Multineuronal Spike Sequences Repeat with Millisecond Precision

    Directory of Open Access Journals (Sweden)

    Koki Matsumoto

    2013-06-01

    Full Text Available Cortical microcircuits are nonrandomly wired by neurons. As a natural consequence, spikes emitted by microcircuits are also nonrandomly patterned in time and space. One of the prominent spike organizations is a repetition of fixed patterns of spike series across multiple neurons. However, several questions remain unsolved, including how precisely spike sequences repeat, how the sequences are spatially organized, how many neurons participate in sequences, and how different sequences are functionally linked. To address these questions, we monitored spontaneous spikes of hippocampal CA3 neurons ex vivo using a high-speed functional multineuron calcium imaging technique that allowed us to monitor spikes with millisecond resolution and to record the location of spiking and nonspiking neurons. Multineuronal spike sequences were overrepresented in spontaneous activity compared to the statistical chance level. Approximately 75% of neurons participated in at least one sequence during our observation period. The participants were sparsely dispersed and did not show specific spatial organization. The number of sequences relative to the chance level decreased when larger time frames were used to detect sequences. Thus, sequences were precise at the millisecond level. Sequences often shared common spikes with other sequences; parts of sequences were subsequently relayed by following sequences, generating complex chains of multiple sequences.

  5. Role of direct repeat and stem-loop motifs in mtDNA deletions: cause or coincidence?

    Directory of Open Access Journals (Sweden)

    Lakshmi Narayanan Lakshmanan

    Full Text Available Deletion mutations within mitochondrial DNA (mtDNA) have been implicated in degenerative and aging related conditions, such as sarcopenia and neuro-degeneration. While the precise molecular mechanism of deletion formation in mtDNA is still not completely understood, genome motifs such as direct repeat (DR) and stem-loop (SL) have been observed in the neighborhood of deletion breakpoints and thus have been postulated to take part in mutagenesis. In this study, we have analyzed the mitochondrial genomes from four different mammals: human, rhesus monkey, mouse and rat, and compared them to randomly generated sequences to further elucidate the role of direct repeat and stem-loop motifs in aging associated mtDNA deletions. Our analysis revealed that in the four species, DR and SL structures are abundant and that their distributions in mtDNA are not statistically different from randomized sequences. However, the average distance between the reported age associated mtDNA breakpoints and their respective nearest DR motifs is significantly shorter than what is expected of random chance in human (p10 bp tend to decrease with increasing lifespan among the four mammals studied here, further suggesting an evolutionary selection against stable mtDNA misalignments associated with long DRs in long-living animals. In contrast to the results on DR, the probability of finding SL motifs near a deletion breakpoint does not differ from random in any of the four mtDNA sequences considered. Taken together, the findings in this study give support for the importance of stable mtDNA misalignments, aided by long DRs, as a major mechanism of deletion formation in long-living, but not in short-living mammals.
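
    The two quantities this abstract relies on, direct repeats of a given length and the distance from a breakpoint to the nearest one, can be sketched in a few lines of Python. This is an illustration only and not the study's pipeline: the function names `direct_repeats` and `nearest_repeat_distance` are hypothetical, repeats are taken to be exact k-mers occurring more than once on the same strand, and coordinates are 0-based.

```python
from collections import defaultdict


def direct_repeats(seq, k):
    """Map each k-mer that occurs more than once in seq to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only k-mers seen at two or more positions (same orientation, exact match).
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def nearest_repeat_distance(breakpoint, repeats):
    """Distance from a breakpoint coordinate to the closest repeat start, or None if no repeats."""
    starts = [p for pos in repeats.values() for p in pos]
    return min((abs(breakpoint - p) for p in starts), default=None)


if __name__ == "__main__":
    reps = direct_repeats("ACGTTTACGTAA", 4)
    print(reps, nearest_repeat_distance(8, reps))
```

    Comparing such distances with those obtained from shuffled sequences would give the kind of random baseline the abstract describes, although the statistical test used in the study is not specified here.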

  6. Sequence-dependent stability test of a left-handed β-helix motif.

    Science.gov (United States)

    Hayre, Natha R; Singh, Rajiv R P; Cox, Daniel L

    2012-03-21

    The left-handed β-helix (LHBH) is an intriguing, rare structural pattern in polypeptides that has been implicated in the formation of amyloid aggregates. We used accurate all-atom replica-exchange molecular dynamics (REMD) simulations to study the relative stability of diverse sequences in the LHBH conformation. Ensemble-average coordinates from REMD served as a scoring criterion to identify sequences and threadings optimally suited to the LHBH, as in a fold recognition paradigm. We examined the repeatability of our REMD simulations, finding that single simulations can be reliable to a quantifiable extent. We find expected behavior for the positive and negative control cases of a native LHBH and intrinsically disordered sequences, respectively. Polyglutamine and a designed hexapeptide repeat show remarkable affinity for the LHBH motif. A structural model for misfolded murine prion protein was also considered, and showed intermediate stability under the given conditions. Our technique is found to be an effective probe of LHBH stability, and promises to be scalable to broader studies of this and potentially other novel or rare motifs. The superstable character of the designed hexapeptide repeat suggests theoretical and experimental follow-ups.

  7. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
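    The three-level partitioning described in this record can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the split fractions, and the presence/absence feature encoding are assumptions chosen only to show how discovery, training, and validation data stay disjoint.

```python
import random


def three_way_split(records, fractions=(0.3, 0.4, 0.3), seed=0):
    """Shuffle records and split them into discovery, training and
    validation lists. The fractions are illustrative assumptions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_disc = int(n * fractions[0])
    n_train = int(n * fractions[1])
    discovery = shuffled[:n_disc]
    training = shuffled[n_disc:n_disc + n_train]
    validation = shuffled[n_disc + n_train:]  # remainder goes to validation
    return discovery, training, validation


def motif_features(sequence, motifs):
    """Hypothetical encoding: 1 if the motif occurs in the sequence, else 0.
    Works for unaligned sequences of unequal length."""
    return [1 if motif in sequence else 0 for motif in motifs]
```

    Under these assumptions, motifs would be found on the discovery list only, turned into feature vectors for the training list, and scored on the validation list, so that no sequence influences both feature selection and assessment.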

  8. Identification of imine reductase-specific sequence motifs.

    Science.gov (United States)

    Fademrecht, Silvia; Scheller, Philipp N; Nestl, Bettina M; Hauer, Bernhard; Pleiss, Jürgen

    2016-05-01

    Chiral amines are valuable building blocks for the production of a variety of pharmaceuticals, agrochemicals and other specialty chemicals. Only recently, imine reductases (IREDs) were discovered which catalyze the stereoselective reduction of imines to chiral amines. Although several IREDs were biochemically characterized in the last few years, knowledge of the reaction mechanism and the molecular basis of substrate specificity and stereoselectivity is limited. To gain further insights into the sequence-function relationships, the Imine Reductase Engineering Database (www.IRED.BioCatNet.de) was established and a systematic analysis of 530 putative IREDs was performed. A standard numbering scheme based on R-IRED-Sk was introduced to facilitate the identification and communication of structurally equivalent positions in different proteins. A conservation analysis revealed a highly conserved cofactor binding region and a predominantly hydrophobic substrate binding cleft. Two IRED-specific motifs were identified, the cofactor binding motif GLGxMGx(5)[ATS]x(4)Gx(4)[VIL]WNR[TS]x(2)[KR] and the active site motif Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]. Our results indicate a preference toward NADPH for all IREDs and explain why, despite their sequence similarity to β-hydroxyacid dehydrogenases (β-HADs), no conversion of β-hydroxyacids has been observed. Superfamily-specific conservations were investigated to explore the molecular basis of their stereopreference. Based on our analysis and previous experimental results on IRED mutants, an exclusive role of standard position 187 for stereoselectivity is excluded. Alternatively, two standard positions 139 and 194 were identified which are superfamily-specifically conserved and differ in R- and S-selective enzymes. © 2016 Wiley Periodicals, Inc.
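    The two motifs quoted in this record appear to follow PROSITE-style notation, where x is any residue, x(n) is a run of n residues, [ABC] is a set of allowed residues and {ABC} a set of excluded ones. Assuming that reading, a minimal sketch of a translator to regular expressions could look like the following; the function name is hypothetical and only the syntax seen in the abstract is handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style motif into a Python regular expression.

    Supports literal residues, x, x(n), [ABC] and {ABC}; whitespace is
    ignored. Other PROSITE features (ranges, anchors) are not handled.
    """
    compact = re.sub(r"\s+", "", pattern)
    out = []
    i = 0
    while i < len(compact):
        ch = compact[i]
        if ch == "x":                      # any residue
            out.append(".")
            i += 1
        elif ch == "[":                    # allowed set, same syntax in regex
            j = compact.index("]", i)
            out.append(compact[i:j + 1])
            i = j + 1
        elif ch == "{":                    # excluded set -> negated class
            j = compact.index("}", i)
            out.append("[^" + compact[i + 1:j] + "]")
            i = j + 1
        elif ch == "(":                    # repeat count -> {n}
            j = compact.index(")", i)
            out.append("{" + compact[i + 1:j] + "}")
            i = j + 1
        else:                              # literal residue
            out.append(ch)
            i += 1
    return "".join(out)


# Motifs as quoted in the abstract above.
COFACTOR_MOTIF = "GLGxMGx(5)[ATS]x(4)Gx(4)[VIL]WNR[TS]x(2)[KR]"
ACTIVE_SITE_MOTIF = "Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]"
```

    A protein sequence could then be scanned with re.search(prosite_to_regex(COFACTOR_MOTIF), sequence), which returns a match object when the motif occurs anywhere in the sequence.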

  9. Phosphotyrosine Substrate Sequence Motifs for Dual Specificity Phosphatases.

    Directory of Open Access Journals (Sweden)

    Bryan M Zhao

    Full Text Available Protein tyrosine phosphatases dephosphorylate tyrosine residues of proteins, whereas dual specificity phosphatases (DUSPs) are a subgroup of protein tyrosine phosphatases that dephosphorylate not only Tyr(P) residues, but also the Ser(P) and Thr(P) residues of proteins. The DUSPs are linked to the regulation of many cellular functions and signaling pathways. Though many cellular targets of DUSPs are known, the relationship between catalytic activity and substrate specificity is poorly defined. We investigated the interactions of peptide substrates with select DUSPs of four types: MAP kinases (DUSP1 and DUSP7), atypical (DUSP3, DUSP14, DUSP22 and DUSP27), viral (variola VH1), and Cdc25 (A-C). Phosphatase recognition sites were experimentally determined by measuring dephosphorylation of 6,218 microarrayed Tyr(P) peptides representing confirmed and theoretical phosphorylation motifs from the cellular proteome. A broad continuum of dephosphorylation was observed across the microarrayed peptide substrates for all phosphatases, suggesting a complex relationship between substrate sequence recognition and optimal activity. Further analysis of peptide dephosphorylation by hierarchical clustering indicated that DUSPs could be organized by substrate sequence motifs, and peptide-specificities by phylogenetic relationships among the catalytic domains. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential targets of DUSPs. These newly identified DUSP substrates will be important for examining structure-activity relationships with physiologically relevant targets.

  10. DNA consensus sequence motif for binding response regulator PhoP, a virulence regulator of Mycobacterium tuberculosis.

    Science.gov (United States)

    He, Xiaoyuan; Wang, Shuishu

    2014-12-30

    Tuberculosis has reemerged as a serious threat to human health because of the increasing prevalence of drug-resistant strains and synergetic infection with HIV, prompting an urgent need for new and more efficient treatments. The PhoP-PhoR two-component system of Mycobacterium tuberculosis plays an important role in the virulence of the pathogen and thus represents a potential drug target. To study the mechanism of gene transcription regulation by response regulator PhoP, we identified a high-affinity DNA sequence for PhoP binding using systematic evolution of ligands by exponential enrichment. The sequence contains a direct repeat of two 7 bp motifs separated by a 4 bp spacer, TCACAGC(N4)TCACAGC. The specificity of the direct-repeat sequence for PhoP binding was confirmed by isothermal titration calorimetry and electrophoretic mobility shift assays. PhoP binds to the direct repeat as a dimer in a highly cooperative manner. We found many genes previously identified to be regulated by PhoP that contain the direct-repeat motif in their promoter sequences. Synthetic DNA fragments at the putative promoter-binding sites bind PhoP with variable affinity, which is related to the number of mismatches in the 7 bp motifs, the positions of the mismatches, and the spacer and flanking sequences. Phosphorylation of PhoP increases the affinity but does not change the specificity of DNA binding. Overall, our results confirm the direct-repeat sequence as the consensus motif for PhoP binding and thus pave the way for identification of PhoP directly regulated genes in different mycobacterial genomes.
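    The consensus TCACAGC(N4)TCACAGC and the observation that affinity depends on the number of mismatches in the two 7 bp motifs suggest a simple promoter scan. The sketch below is only an illustration: the function names and the mismatch threshold are assumptions, and it counts mismatches without weighting their positions or the spacer and flanking sequences, which the abstract says also matter.

```python
HALF_SITE = "TCACAGC"  # 7 bp motif reported in the abstract
SPACER = 4             # 4 bp spacer (N4) between the two motifs


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_direct_repeats(seq, max_mismatches=2):
    """Return (position, mismatch_count) for every window that resembles
    TCACAGC(N4)TCACAGC with at most max_mismatches summed over both
    half-sites. The default threshold is an arbitrary assumption."""
    seq = seq.upper()
    k = len(HALF_SITE)
    width = 2 * k + SPACER
    hits = []
    for i in range(len(seq) - width + 1):
        first = seq[i:i + k]
        second = seq[i + k + SPACER:i + width]
        mm = mismatches(first, HALF_SITE) + mismatches(second, HALF_SITE)
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits
```

    Only the forward strand is scanned here; a fuller search would also check the reverse complement of each promoter sequence.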

  11. Sequence motifs associated with hepatotoxicity of locked nucleic acid--modified antisense oligonucleotides.

    Science.gov (United States)

    Burdick, Andrew D; Sciabola, Simone; Mantena, Srinivasa R; Hollingshead, Brett D; Stanton, Robert; Warneke, James A; Zeng, Ming; Martsen, Elena; Medvedev, Alexander; Makarov, Sergei S; Reed, Lori A; Davis, John W; Whiteley, Laurence O

    2014-04-01

    Fully phosphorothioate antisense oligonucleotides (ASOs) with locked nucleic acids (LNAs) improve target affinity, RNase H activation and stability. LNA-modified ASOs can cause hepatotoxicity, and this risk is currently not fully understood. In vitro cytotoxicity screens have not been reliable predictors of hepatic toxicity in non-clinical testing; however, mice are considered to be a sensitive test species. To better understand the relationship between nucleotide sequence and hepatotoxicity, a structure-toxicity analysis was performed using results from 2-week repeated-dose-tolerability studies in mice administered LNA-modified ASOs. ASOs targeting human Apolipoprotein C3 (Apoc3), CREB (cAMP Response Element Binding Protein) Regulated Transcription Coactivator 2 (Crtc2) or Glucocorticoid Receptor (GR, NR3C1) were classified based upon the presence or absence of hepatotoxicity in mice. From these data, a random-decision forest-classification model generated from nucleotide sequence descriptors identified two trinucleotide motifs (TCC and TGC) that were present only in hepatotoxic sequences. We found that motif-containing sequences were more likely to bind to hepatocellular proteins in vitro and increased P53 and NRF2 stress pathway activity in vivo. These results suggest in silico approaches can be utilized to establish structure-toxicity relationships of LNA-modified ASOs and decrease the likelihood of hepatotoxicity in preclinical testing.

  12. Sequence motifs associated with hepatotoxicity of locked nucleic acid—modified antisense oligonucleotides

    Science.gov (United States)

    Burdick, Andrew D.; Sciabola, Simone; Mantena, Srinivasa R.; Hollingshead, Brett D.; Stanton, Robert; Warneke, James A.; Zeng, Ming; Martsen, Elena; Medvedev, Alexander; Makarov, Sergei S.; Reed, Lori A.; Davis, John W.; Whiteley, Laurence O.

    2014-01-01

    Fully phosphorothioate antisense oligonucleotides (ASOs) with locked nucleic acids (LNAs) improve target affinity, RNase H activation and stability. LNA-modified ASOs can cause hepatotoxicity, and this risk is currently not fully understood. In vitro cytotoxicity screens have not been reliable predictors of hepatic toxicity in non-clinical testing; however, mice are considered to be a sensitive test species. To better understand the relationship between nucleotide sequence and hepatotoxicity, a structure–toxicity analysis was performed using results from 2-week repeated-dose-tolerability studies in mice administered LNA-modified ASOs. ASOs targeting human Apolipoprotein C3 (Apoc3), CREB (cAMP Response Element Binding Protein) Regulated Transcription Coactivator 2 (Crtc2) or Glucocorticoid Receptor (GR, NR3C1) were classified based upon the presence or absence of hepatotoxicity in mice. From these data, a random-decision forest-classification model generated from nucleotide sequence descriptors identified two trinucleotide motifs (TCC and TGC) that were present only in hepatotoxic sequences. We found that motif-containing sequences were more likely to bind to hepatocellular proteins in vitro and increased P53 and NRF2 stress pathway activity in vivo. These results suggest in silico approaches can be utilized to establish structure–toxicity relationships of LNA-modified ASOs and decrease the likelihood of hepatotoxicity in preclinical testing. PMID:24550163

  13. Motif Discovery in Tissue-Specific Regulatory Sequences Using Directed Information

    Directory of Open Access Journals (Sweden)

    States David

    2007-01-01

    Full Text Available Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often leads to discovery of novel motifs (including transcription factor sites) with previously uncharacterized function in gene expression. Coupled with the complexity underlying tissue-specific gene expression, there are several motifs that are putatively responsible for expression in a certain cell type. This has important implications in understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the identification of motifs (not necessarily transcription factor sites) and examine its application to some questions in current bioinformatics research. These motifs are seen to discriminate tissue-specific gene promoter or regulatory regions from those that are not tissue-specific. There are two main contributions of this work. Firstly, we propose the use of directed information for such classification-constrained motif discovery, and then use the selected features with a support vector machine (SVM) classifier to find the tissue specificity of any sequence of interest. Such analysis yields several novel interesting motifs that merit further experimental characterization. Furthermore, this approach leads to a principled framework for the prospective examination of any chosen motif as a discriminatory motif for a group of coexpressed/coregulated genes, thereby integrating sequence and expression perspectives. We hypothesize that the discovery of these motifs would enable the large-scale investigation of the tissue-specific regulatory role of any conserved sequence element identified from genome-wide studies.

  14. Exhaustive Search for Over-represented DNA Sequence Motifs with CisFinder

    Science.gov (United States)

    Sharov, Alexei A.; Ko, Minoru S.H.

    2009-01-01

    We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; then PFMs are extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem cells based on published chromatin immunoprecipitation sequencing data. Furthermore, CisFinder successfully identified alternative binding motifs of TFs (e.g. POU5F1, ESRRB, and CTCF) and motifs for known and unknown co-factors of genes associated with the pluripotent state of ES cells. CisFinder also showed robust performance in the identification of motifs that were only slightly enriched in a set of DNA sequences. PMID:19740934
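    A position frequency matrix (PFM), the representation this record uses to describe motifs, can be shown with a short sketch. The code below simply counts bases per column in a set of already aligned sites of equal length; it is not CisFinder's algorithm, which the abstract says estimates PFMs from counts of n-mer words, and the function names are hypothetical.

```python
def position_frequency_matrix(sites, alphabet="ACGT"):
    """Count how often each base occurs at each position of aligned sites.

    Returns a dict mapping base -> list of counts, one per position.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in alphabet}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            if base not in pfm:
                raise ValueError("unexpected symbol: " + base)
            pfm[base][pos] += 1
    return pfm


def consensus(pfm):
    """Most frequent base at each position (ties resolved by alphabet order)."""
    length = len(next(iter(pfm.values())))
    return "".join(
        max(pfm, key=lambda base: pfm[base][pos]) for pos in range(length)
    )
```

    Dividing each column by the number of sites would turn the counts into per-position base frequencies, the usual starting point for scoring candidate binding sites.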

  15. HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

    Science.gov (United States)

    Le, Thanh; Altman, Tom; Gardiner, Katheleen

    2010-02-01

    Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.

  16. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing

    Directory of Open Access Journals (Sweden)

    David H. Warshauer

    2015-08-01

    Full Text Available Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles.

  17. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing

    Institute of Scientific and Technical Information of China (English)

    David H Warshauer; Jennifer D Churchill; Nicole Novroski; Jonathan L King; Bruce Budowle

    2015-01-01

    Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles.

  18. Exhaustive Search for Over-represented DNA Sequence Motifs with CisFinder

    OpenAIRE

    Sharov, Alexei A; Minoru S.H. Ko

    2009-01-01

    We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; then PFMs are extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem cel...

  19. Mitoxantrone and Analogues Bind and Stabilize i-Motif Forming DNA Sequences

    Science.gov (United States)

    Wright, Elisé P.; Day, Henry A.; Ibrahim, Ali M.; Kumar, Jeethendra; Boswell, Leo J. E.; Huguin, Camille; Stevenson, Clare E. M.; Pors, Klaus; Waller, Zoë A. E.

    2016-12-01

    There are hundreds of ligands which can interact with G-quadruplex DNA, yet very few which target i-motif. To understand the dynamics between these structures and how they can be affected by intervention with small molecule ligands, more i-motif binding compounds are required. Herein we describe how the drug mitoxantrone can bind, induce folding of and stabilise i-motif forming DNA sequences, even at physiological pH. Additionally, mitoxantrone was found to bind i-motif forming sequences preferentially over double helical DNA. We also describe the stabilisation properties of analogues of mitoxantrone. This offers a new family of ligands with potential for use in experiments into the structure and function of i-motif forming DNA sequences.

  20. Distinct repeat motifs at the C-terminal region of CagA of Helicobacter pylori strains isolated from diseased patients and asymptomatic individuals in West Bengal, India

    Directory of Open Access Journals (Sweden)

    Chattopadhyay Santanu

    2012-05-01

    Full Text Available Abstract Background: Infection with Helicobacter pylori strains that express CagA is associated with gastritis, peptic ulcer disease, and gastric adenocarcinoma. The biological function of CagA depends on tyrosine phosphorylation by a cellular kinase. The phosphate acceptor tyrosine moiety is present within the EPIYA motif at the C-terminal region of the protein. This region is highly polymorphic due to variations in the number of EPIYA motifs and the polymorphism found in spacer regions among EPIYA motifs. The aim of this study was to analyze the polymorphism at the C-terminal end of CagA and to evaluate its association with the clinical status of the host in West Bengal, India. Results: Seventy-seven H. pylori strains isolated from patients with various clinical statuses were used to characterize the C-terminal polymorphic region of CagA. Our analysis showed that there is no correlation between the previously described CagA types and various disease outcomes in the Indian context. Further analyses of different CagA structures revealed that the repeat units in the spacer sequences within the EPIYA motifs are actually more discrete than the previously proposed models of CagA variants. Conclusion: Our analyses suggest that EPIYA motifs as well as the spacer sequence units are present as distinct insertions and deletions, which possibly have arisen from extensive recombination events. Moreover, we have identified several new CagA types, which could not be typed by the existing systems, and therefore we have proposed a new typing system. We hypothesize that a cagA gene encoding a higher number of EPIYA motifs may have arisen from cagA genes that encode fewer EPIYA motifs by acquisition of DNA segments through recombination events.

  1. Screening of repetitive motifs inside the genome of the flat oyster (Ostrea edulis): Transposable elements and short tandem repeats.

    Science.gov (United States)

    Vera, Manuel; Bello, Xabier; Álvarez-Dios, Jose-Antonio; Pardo, Belen G; Sánchez, Laura; Carlsson, Jens; Carlsson, Jeanette E L; Bartolomé, Carolina; Maside, Xulio; Martinez, Paulino

    2015-12-01

    The flat oyster (Ostrea edulis) is one of the most appreciated molluscs in Europe, but its production has been greatly reduced by the parasite Bonamia ostreae. Here, new generation genomic resources were used to analyse the repetitive fraction of the oyster genome, with the aim of developing molecular markers to face this main oyster production challenge. The resulting oyster database consists of two sets of 10,318 and 7159 unique contigs (4.8 Mbp and 6.8 Mbp in total length) representing the oyster's genome (WG) and haemocyte transcriptome (HT), respectively. A total of 1083 sequences were identified as TE-derived, which corresponded to 4.0% of WG and 1.1% of HT. They were clustered into 142 homology groups, most of which were assigned to the Penelope order of retrotransposons, and to the Helitron and TIR DNA-transposons. Simple repeats and rRNA pseudogenes also made a significant contribution to the oyster's genome (0.5% and 0.3% of WG and HT, respectively). The most frequent short tandem repeats identified in WG were tetranucleotide motifs, while trinucleotide motifs were the most frequent in HT. Forty identified microsatellite loci, 20 from each database, were selected for technical validation. Success was much lower among WG than HT microsatellites (15% vs 55%), which could reflect higher variation in anonymous regions interfering with primer annealing. All microsatellites developed adjusted to Hardy-Weinberg proportions and represent a useful tool to support future breeding programmes and to manage genetic resources of natural flat oyster beds.

  2. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX]; Mathura, Venkatarajan S [Sarasota, FL]; Schein, Catherine H [Friendswood, TX]

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  3. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    Science.gov (United States)

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  4. Mutational analysis of the SDD sequence motif of a PRRSV RNA-dependent RNA polymerase.

    Science.gov (United States)

    Zhou, Yan; Zheng, Haihong; Gao, Fei; Tian, Debin; Yuan, Shishan

    2011-09-01

    The subgenomic mRNA transcription and genomic replication of the porcine reproductive and respiratory syndrome virus (PRRSV) are directed by the viral replicase. The replicase is expressed in the form of two polyproteins and is subsequently processed into smaller nonstructural proteins (nsps). nsp9, containing the viral replicase, has characteristic sequence motifs conserved among the RNA-dependent RNA polymerases (RdRp) of positive-strand (PS) RNA viruses. To test whether the conserved SDD motif can tolerate other conserved motifs of RNA viruses and the influence of every residue on RdRp catalytic activity, many amino acid substitutions were introduced into it. Only one nsp9 substitution, of serine by glycine (S3050G), could rescue mutant viruses. The rescued virus was genetically stable. Alteration of either aspartate residue was not tolerated, destroyed the polymerase activity, and abolished virus transcription, but did not eliminate virus replication. We also found that the SDD motif was essentially invariant for the signature sequence of PRRSV RdRp. It could not accommodate other conserved motifs found in other RNA viral polymerases, except the GDD motif, which is conserved in all the other PS RNA viruses. These findings indicated that nidoviruses are evolutionarily related to other PS RNA viruses. Our studies support the idea that the two aspartate residues of the SDD motif are critical and essential for PRRSV transcription and represent a sequence variant of the GDD motif in PS RNA viruses.

  5. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    Science.gov (United States)

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http
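    The overrepresentation score described in this record rests on the hypergeometric distribution. As a minimal sketch, and not MotifHound's implementation, the upper-tail probability can be computed directly with the standard library; the function and parameter names are assumptions.

```python
from math import comb


def hypergeom_enrichment_pvalue(k, n, K, N):
    """Probability of observing at least k motif-containing proteins when
    n proteins of interest are drawn without replacement from a background
    of N proteins, K of which contain the motif.

    k: motif-containing proteins in the set of interest
    n: size of the set of interest
    K: motif-containing proteins in the background
    N: size of the background (e.g. the proteome)
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

    A small p-value indicates that the motif occurs in the proteins of interest more often than expected from the background. Because many candidate motifs are enumerated, some correction for multiple testing would normally be applied on top of this raw value.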

  6. Characterization of the tandem CWCH2 sequence motif: a hallmark of inter-zinc finger interactions

    Directory of Open Access Journals (Sweden)

    Aruga Jun

    2010-02-01

    Full Text Available Abstract Background: The C2H2 zinc finger (ZF) domain is widely conserved among eukaryotic proteins. In Zic/Gli/Zap1 C2H2 ZF proteins, the two N-terminal ZFs form a single structural unit by sharing a hydrophobic core. This structural unit defines a new motif comprised of two tryptophan side chains at the center of the hydrophobic core. Because each tryptophan residue is located between the two cysteine residues of the C2H2 motif, we have named this structure the tandem CWCH2 (tCWCH2) motif. Results: Here, we characterized 587 tCWCH2-containing genes using data derived from public databases. We categorized genes into 11 classes including Zic/Gli/Glis, Arid2/Rsc9, PacC, Mizf, Aebp2, Zap1/ZafA, Fungl, Zfp106, Twincl, Clr1, and Fungl-4ZF, based on sequence similarity, domain organization, and functional similarities. tCWCH2 motifs are mostly found in organisms belonging to the Opisthokonta (metazoa, fungi, and choanoflagellates) and Amoebozoa (amoeba, Dictyostelium discoideum). By comparison, the C2H2 ZF motif is distributed widely among the eukaryotes. The structure and organization of the tCWCH2 motif, its phylogenetic distribution, and molecular phylogenetic analysis suggest that prototypical tCWCH2 genes existed in the Opisthokonta ancestor. Within-group or between-group comparisons of the tCWCH2 amino acid sequence identified three additional sequence features (site-specific amino acid frequencies, longer linker sequence between two C2H2 ZFs, and frequent extra-sequences within C2H2 ZF motifs). Conclusion: These features suggest that the tCWCH2 motif is a specialized motif involved in inter-zinc finger interactions.

  7. Characterisation data of simple sequence repeats of phages closely related to T7M

    Directory of Open Access Journals (Sweden)

    Tiao-Yin Lin

    2016-09-01

    Full Text Available Coliphages T7M and T3, Yersinia phage ϕYeO3-12, and Salmonella phage ϕSG-JL2 share high homology in genomic sequences. Simple sequence repeats (SSRs) are found in their genomes and variations of SSRs among these phages are observed. Analyses on regions of sequences in T7M and T3 genomes that are likely derived from phage recombination, as well as the counterparts in ϕYeO3-12 and ϕSG-JL2, have been discussed by Lin in “Simple sequence repeat variations expedite phage divergence: mechanisms of indels and gene mutations” [1]. These regions are referred to as recombinant regions. The focus here is on SSRs in the whole genome and regions of sequences outside the recombinant regions, referred to as non-recombinant regions. This article provides SSR counts, relative abundance, relative density, and GC contents in the complete genome and non-recombinant regions of these phages. SSR period sizes and motifs in the non-recombinant regions of phage genomes are plotted. Genomic sequence changes between T7M and T3 due to insertions, deletions, and substitutions are also illustrated. SSRs and nearby sequences of T7M in the non-recombinant regions are compared to the sequences of ϕYeO3-12 and ϕSG-JL2 in the corresponding positions. The sequence variations of SSRs due to vertical evolution are classified into four categories and tabulated: (1) insertion/deletion of SSR units, (2) expansion/contraction of SSRs without alteration of genome length, (3) changes of repeat motifs, and (4) generation/loss of repeats.

  8. Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.)

    Directory of Open Access Journals (Sweden)

    Simon Philipp W

    2010-10-01

    Full Text Available Abstract Background Cucumber, Cucumis sativus L. is an important vegetable crop worldwide. Until very recently, cucumber genetic and genomic resources, especially molecular markers, have been very limited, impeding progress of cucumber breeding efforts. Microsatellites are short tandemly repeated DNA sequences, which are frequently favored as genetic markers due to their high level of polymorphism and codominant inheritance. Data from previously characterized genomes has shown that these repeats vary in frequency, motif sequence, and genomic location across taxa. During the last year, the genomes of two cucumber genotypes were sequenced including the Chinese fresh market type inbred line '9930' and the North American pickling type inbred line 'Gy14'. These sequences provide a powerful tool for developing markers in a large scale. In this study, we surveyed and characterized the distribution and frequency of perfect microsatellites in 203 Mbp assembled Gy14 DNA sequences, representing 55% of its nuclear genome, and in cucumber EST sequences. Similar analyses were performed in genomic and EST data from seven other plant species, and the results were compared with those of cucumber. Results A total of 112,073 perfect repeats were detected in the Gy14 cucumber genome sequence, accounting for 0.9% of the assembled Gy14 genome, with an overall density of 551.9 SSRs/Mbp. While tetranucleotides were the most frequent microsatellites in genomic DNA sequence, dinucleotide repeats, which had more repeat units than any other SSR type, had the highest cumulative sequence length. Coding regions (ESTs) of the cucumber genome had fewer microsatellites compared to its genomic sequence, with trinucleotides predominating in EST sequences. AAG was the most frequent repeat in cucumber ESTs. Overall, AT-rich motifs prevailed in both genomic and EST data. Compared to the other species examined, cucumber genomic sequence had the highest density of SSRs (although
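
    A survey of perfect microsatellites like the one summarized above can be sketched with regular-expression backreferences. This is an illustrative example only: the minimum repeat counts in MIN_REPEATS are hypothetical (the abstract does not state the thresholds used), and the function names are assumptions rather than any published tool's interface.

```python
import re

# Hypothetical minimum number of repeat units per motif length (1-6 bp).
# The thresholds used in the study are not given in the abstract.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit of unit_len bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip units that are themselves repeats of a shorter unit,
            # e.g. "ATAT" is reported as the dinucleotide "AT" instead.
            if any(
                motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


def ssr_density_per_mbp(n_ssrs, assembled_bp):
    """SSR count divided by assembled sequence length in megabase pairs."""
    return n_ssrs / (assembled_bp / 1e6)


if __name__ == "__main__":
    toy = "GGC" + "AT" * 8 + "CCT" + "AAG" * 5 + "T"
    for start, motif, count in find_perfect_ssrs(toy):
        print(start, motif, count)
```

    Plugging the abstract's rounded figures into ssr_density_per_mbp (112,073 SSRs over 203 Mbp) gives roughly 552 SSRs/Mbp, in line with the reported 551.9.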

  9. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Full Text Available Abstract Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
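
    The degenerate recognition sequence GRCGYC uses IUPAC nucleotide codes (R = A or G, Y = C or T). As a small illustration, the sketch below expands such a site into a regular expression and scans a DNA string for it. The function names and the toy sequence are assumptions made for this example and are not taken from the study.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(site):
    """Translate a degenerate site such as 'GRCGYC' into a regex string."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(seq, site="GRCGYC"):
    """Return (position, matched_sequence) pairs, including overlapping hits."""
    # A lookahead keeps the scan position advancing one base at a time.
    pattern = re.compile("(?=(%s))" % iupac_to_regex(site))
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("GRCGYC"))
    print(find_sites("TTGACGTCAAGGCGCCTT"))
```

    Because GRCGYC equals its own reverse complement, scanning one strand is enough to locate every site on a double-stranded molecule.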

  10. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available Abstract Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations (1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; (2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and (3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
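
    The screening step described above, checking 3' UTRs for a predefined list of 8-mers, can be sketched as follows. Everything in this example is hypothetical: the gene names, UTR sequences and the candidate 8-mer are invented for illustration, and the study's actual pipeline is not described in the abstract.

```python
def kmer_presence(utrs_by_gene, kmers):
    """Map each k-mer to the set of genes whose 3' UTR contains it."""
    return {
        kmer: {gene for gene, utr in utrs_by_gene.items() if kmer in utr.upper()}
        for kmer in kmers
    }


def fraction_with_kmer(genes_with_kmer, gene_group):
    """Fraction of a gene group (e.g., ribosomal genes) carrying the k-mer."""
    group = set(gene_group)
    return len(genes_with_kmer & group) / len(group) if group else 0.0


if __name__ == "__main__":
    # Hypothetical 3' UTRs; "rpA" and "rpB" stand in for ribosomal genes.
    utrs = {
        "rpA": "TTTGTAAATAGCC",
        "rpB": "AATGTAAATAGTT",
        "other": "CCCCCCCCCCCC",
    }
    presence = kmer_presence(utrs, ["TGTAAATA"])
    print(fraction_with_kmer(presence["TGTAAATA"], ["rpA", "rpB"]))
    print(fraction_with_kmer(presence["TGTAAATA"], ["other"]))
```

    Comparing the fraction in one gene class against the rest of the genome is the kind of contrast that would flag a motif as distinctly associated with that class.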

  11. Discovering sequence motifs in quantitative and qualitative peptide data

    DEFF Research Database (Denmark)

    Andreatta, Massimo

    -dimensional, as binding sites normally consist of a pocket or a groove on the protein surface. However, in many cases such interactions contain a linear component and can be more conveniently represented, or approximated, by a protein-peptide interaction. Whereas time-consuming structural studies are necessary in systems...... of interactions in a single experiment, with virtually unlimited choice of potential targets and variants of these targets. However, the amount and complexity of data produced by high-throughput techniques poses serious challenges to researchers of limited bioinformatics expertise who need to analyze...... with the presence of multiple motifs, due to the experimental setup or the actual poly-specificity of the receptor, in peptide data. A new algorithm, based on Gibbs sampling, identifies multiple specificities by performing two tasks simultaneously: alignment and clustering of peptide data. The method, available...

  12. Comparative sequence analysis of leucine-rich repeats (LRRs) within vertebrate toll-like receptors

    Directory of Open Access Journals (Sweden)

    Taga Masae

    2007-05-01

    Full Text Available Abstract Background Toll-like receptors (TLRs) play a central role in innate immunity. TLRs are membrane glycoproteins and contain leucine-rich repeat (LRR) motifs in the ectodomain. TLRs recognize and respond to molecules such as lipopolysaccharide, peptidoglycan, flagellin, and RNA from bacteria or viruses. The LRR domains in TLRs have been inferred to be responsible for molecular recognition. All LRRs include the highly conserved segment, LxxLxLxxNxL, in which "L" is Leu, Ile, Val, or Phe and "N" is Asn, Thr, Ser, or Cys and "x" is any amino acid. There are seven classes of LRRs including "typical" ("T") and "bacterial" ("S"). All known domain structures adopt an arc or horseshoe shape. Vertebrate TLRs form six major families. The repeat numbers of LRRs and their "phasing" in TLRs differ with isoforms and species; they are aligned differently in various databases. We identified and aligned LRRs in TLRs by a new method described here. Results The new method utilizes known LRR structures to recognize and align new LRR motifs in TLRs and incorporates multiple sequence alignments and secondary structure predictions. TLRs from thirty-four vertebrates were analyzed. The repeat numbers of the LRRs range from 16 to 28. The LRRs found in TLRs frequently consist of LxxLxLxxNxLxxLxxxxF/LxxLxx ("T") and sometimes short motifs including LxxLxLxxNxLxxLPx(x)LPxx ("S"). The TLR7 family (TLR7, TLR8, and TLR9) contains 27 LRRs. The LRRs at the N-terminal part have a super-motif of STT with about 80 residues. The super-repeat is represented by STTSTTSTT or _TTSTTSTT. The LRRs in TLRs form one or two horseshoe domains and are mostly flanked by two cysteine clusters including two or four cysteine residues. Conclusion Each of the six major TLR families is characterized by their constituent LRR motifs, their repeat numbers, and their patterns of cysteine clusters. The central parts of the TLR1 and TLR7 families and of TLR4 have more irregular or longer LRR motifs. These
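
    The highly conserved segment LxxLxLxxNxL, with the residue classes defined in the abstract, translates directly into a regular expression. The sketch below is a naive sequence scan written for illustration. It is not the structure-guided method the authors describe, and the toy peptide is invented.

```python
import re

# LxxLxLxxNxL with "L" = Leu/Ile/Val/Phe, "N" = Asn/Thr/Ser/Cys, "x" = any residue.
# The lookahead allows overlapping candidate segments to be reported.
LRR_SEGMENT = re.compile(r"(?=([LIVF].{2}[LIVF].[LIVF].{2}[NTSC].[LIVF]))")


def find_lrr_segments(protein):
    """Return (start, segment) pairs for every 11-residue consensus match."""
    return [(m.start(), m.group(1)) for m in LRR_SEGMENT.finditer(protein.upper())]


if __name__ == "__main__":
    # Invented toy peptide containing one consensus segment.
    print(find_lrr_segments("MGSLKELDLSNNRLPPG"))
```

    A pattern this permissive will also match many non-LRR sequences, which is one reason a method that incorporates known structures, multiple sequence alignments and secondary structure predictions, as in the abstract, is needed to assign repeat boundaries reliably.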

  13. Amplification of microsatellite repeat motifs is associated with the evolutionary differentiation and heterochromatinization of sex chromosomes in Sauropsida.

    Science.gov (United States)

    Matsubara, Kazumi; O'Meally, Denis; Azad, Bhumika; Georges, Arthur; Sarre, Stephen D; Graves, Jennifer A Marshall; Matsuda, Yoichi; Ezaz, Tariq

    2016-03-01

    The sex chromosomes in Sauropsida (reptiles and birds) have evolved independently many times. They show astonishing diversity in morphology ranging from cryptic to highly differentiated sex chromosomes with male (XX/XY) and female heterogamety (ZZ/ZW). Comparing such diverse sex chromosome systems thus provides unparalleled opportunities to capture evolution of morphologically differentiated sex chromosomes in action. Here, we describe chromosomal mapping of 18 microsatellite repeat motifs in eight species of Sauropsida. More than two microsatellite repeat motifs were amplified on the sex-specific chromosome, W or Y, in five species (Bassiana duperreyi, Aprasia parapulchella, Notechis scutatus, Chelodina longicollis, and Gallus gallus) of which the sex-specific chromosomes were heteromorphic and heterochromatic. Motifs (AAGG)n and (ATCC)n were amplified on the W chromosome of Pogona vitticeps and the Y chromosome of Emydura macquarii, respectively. By contrast, no motifs were amplified on the W chromosome of Christinus marmoratus, which is not much differentiated from the Z chromosome. Taken together with previously published studies, our results suggest that the amplification of microsatellite repeats is tightly associated with the differentiation and heterochromatinization of sex-specific chromosomes in sauropsids as well as in other taxa. Although some motifs were common between the sex-specific chromosomes of multiple species, no correlation was observed between this commonality and the species phylogeny. Furthermore, comparative analysis of sex chromosome homology and chromosomal distribution of microsatellite repeats between two closely related chelid turtles, C. longicollis and E. macquarii, identified different ancestry and differentiation history. These suggest multiple evolutions of sex chromosomes in the Sauropsida.

  14. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

    Science.gov (United States)

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-11-16

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats.

  15. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    Directory of Open Access Journals (Sweden)

    Glass John I

    2010-07-01

    Full Text Available Abstract Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the

  16. Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

    Science.gov (United States)

    Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...

  17. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    Aug 28, 2012 ... In the present study, 78 mapped simple sequence repeat (SSR) markers representing 11 ... mean (UPGMA) with each cluster representing a particular Vigna species. ..... were reported to be more frequent than the compound.

  18. Study of simple sequence repeat (SSR) polymorphism for biotic ...

    African Journals Online (AJOL)

    home

    2013-10-02

    Oct 2, 2013 ... back cross breeding; SSRs, simple sequence repeats; PIC, polymorphism ..... PIC values were reported in barley wheat and rice (Gu et ... doubled-haploid rice population. Theor. ... Grover A, Aishwarya V, Sharma PC (2007).

  19. Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

    Science.gov (United States)

    Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

    2017-02-01

    An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Prediction of HIV-1 virus-host protein interactions using virus and host sequence motifs

    Directory of Open Access Journals (Sweden)

    Tozeren Aydin

    2009-05-01

    Full Text Available Abstract Background Host protein-protein interaction networks are altered by invading virus proteins, which create new interactions, and modify or destroy others. The resulting network topology favors excessive amounts of virus production in a stressed host cell network. Short linear peptide motifs common to both virus and host provide the basis for host network modification. Methods We focused our host-pathogen study on the binding and competing interactions of HIV-1 and human proteins. We showed that peptide motifs conserved across 70% of HIV-1 subtype B and C samples occurred in similar positions on HIV-1 proteins, and we documented protein domains that interact with these conserved motifs. We predicted which human proteins may be targeted by HIV-1 by taking pairs of human proteins that may interact via a motif conserved in HIV-1 and the corresponding interacting protein domain. Results Our predictions were enriched with host proteins known to interact with HIV-1 proteins ENV, NEF, and TAT (p-value Conclusion A list of host proteins highly enriched with those targeted by HIV-1 proteins can be obtained by searching for host protein motifs along virus protein sequences. The resulting set of host proteins predicted to be targeted by virus proteins will become more accurate with better annotations of motifs and domains. Nevertheless, our study validates the role of linear binding motifs shared by virus and host proteins as an important part of the crosstalk between virus and host.

  1. Streptococcus salivarius Fimbriae Are Composed of a Glycoprotein Containing a Repeated Motif Assembled into a Filamentous Nondissociable Structure

    Science.gov (United States)

    Lévesque, Céline; Vadeboncoeur, Christian; Chandad, Fatiha; Frenette, Michel

    2001-01-01

    Streptococcus salivarius, a gram-positive bacterium found in the human oral cavity, expresses flexible peritrichous fimbriae. In this paper, we report purification and partial characterization of S. salivarius fimbriae. Fimbriae were extracted by shearing the cell surface of hyperfimbriated mutant A37 (a spontaneous mutant of S. salivarius ATCC 25975) with glass beads. Preliminary experiments showed that S. salivarius fimbriae did not dissociate when they were incubated at 100°C in the presence of sodium dodecyl sulfate. This characteristic was used to separate them from other cell surface components by successive gel filtration chromatography procedures. Fimbriae with molecular masses ranging from 20 × 10^6 to 40 × 10^6 Da were purified. Examination of purified fimbriae by electron microscopy revealed the presence of filamentous structures up to 1 μm long and 3 to 4 nm in diameter. Biochemical studies of purified fimbriae and an amino acid sequence analysis of a fimbrial internal peptide revealed that S. salivarius fimbriae were composed of a glycoprotein assembled into a filamentous structure resistant to dissociation. The internal amino acid sequence was composed of a repeated motif of two amino acids alternating with two modified residues: A/X/T-E-Q-M/φ, where X represents a modified amino acid residue and φ represents a blank cycle. Immunolocalization experiments also revealed that the fimbriae were associated with a wheat germ agglutinin-reactive carbohydrate. Immunolabeling experiments with antifimbria polyclonal antibodies showed that antigenically related fimbria-like structures were expressed in two other human oral streptococcal species, Streptococcus mitis and Streptococcus constellatus. PMID:11292790

  2. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    Full Text Available In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences, with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties, and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.

  3. Cytogenetic diversity of simple sequences repeats in morphotypes of Brassica rapa ssp. chinensis

    Directory of Open Access Journals (Sweden)

    Jinshuang Zheng

    2016-07-01

    Full Text Available A significant fraction of the nuclear DNA of all eukaryotes is occupied by simple sequence repeats (SSRs). Although these sequences have sparked great interest as a means of studying genetic variation, linkage mapping and evolution, little attention had been paid to the chromosomal distribution and cytogenetic diversity of these sequences. This paper reports the long-range organization of all possible classes of mono-, di- and tri-nucleotide SSRs in Brassica rapa. Fluorescence in situ hybridization (FISH) was used to characterize the cytogenetic diversity of SSRs among morphotypes of B. rapa ssp. chinensis. The proportion of different SSR motifs varied among morphotypes of B. rapa, with trinucleotide SSRs more prevalent in the genome of B. rapa ssp. chinensis. The chromosomal characterizations of mono-, di- and tri-nucleotide repeats have been acquired. The data revealed the non-random and motif-dependent chromosome distribution of SSRs in different morphotypes, and allowed the relative variability characterized by SSR amount and similar chromosomal distribution in centromeric/peri-centromeric heterochromatin. The differences in SSR abundance and distribution point to SSRs as a driving force in the evolution of B. rapa. The results provided a comprehensive view of SSR sequence distribution and evolution for comparison among morphotypes of B. rapa ssp. chinensis.

  4. Structural conservation of a short, functional, peptide-sequence motif

    OpenAIRE

    Fox-Erlich, Susan; Schiller, Martin R; Gryk, Michael R.

    2009-01-01

    Full length, eukaryotic proteins generally consist of several autonomously folding and functioning domains. Many of these domains are known to function by binding and/or modifying other partner proteins based on the recognition of a short, linear amino sequence contained within the target protein. This article reviews the many bioinformatic tools and resources which discover, define and catalogue the various, known protein domains as well as assist users by identifying domain signatures withi...

  5. High-resolution NMR characterization of a spider-silk mimetic composed of 15 tandem repeats and a CRGD motif

    OpenAIRE

    McLachlan, Glendon D; Slocik, Joseph; Mantz, Robert; Kaplan, David; Cahill, Sean; Girvin, Mark; Greenbaum, Steve

    2008-01-01

    Multidimensional solution NMR spectroscopic techniques have been used to obtain atomic level information about a recombinant spider silk construct in hexafluoro-isopropanol (HFIP). The synthetic 49 kDa silk-like protein mimics authentic silk from Nephila clavipes, with the inclusion of an extracellular matrix recognition motif. 2D 1H-15N HSQC NMR spectroscopy reveals 33 cross peaks, which were assigned to amino acid residues in the semicrystalline repeat units. Signals from the amorphous segm...

  6. Mining and validation of pyrosequenced simple sequence repeats (SSRs) from American cranberry (Vaccinium macrocarpon Ait.).

    Science.gov (United States)

    Zhu, H; Senalik, D; McCown, B H; Zeldin, E L; Speers, J; Hyman, J; Bassil, N; Hummer, K; Simon, P W; Zalapa, J E

    2012-01-01

    The American cranberry (Vaccinium macrocarpon Ait.) is a major commercial fruit crop in North America, but limited genetic resources have been developed for the species. Furthermore, the paucity of codominant DNA markers has hampered the advance of genetic research in cranberry and the Ericaceae family in general. Therefore, we used Roche 454 sequencing technology to perform low-coverage whole genome shotgun sequencing of the cranberry cultivar 'HyRed'. After de novo assembly, the obtained sequence covered 266.3 Mb of the estimated 540-590 Mb of the cranberry genome. A total of 107,244 SSR loci were detected with an overall density across the genome of 403 SSR/Mb. The AG repeat was the most frequent motif in cranberry accounting for 35% of all SSRs and together with AAG and AAAT accounted for 46% of all loci discovered. To validate the SSR loci, we designed 96 primer-pairs using contig sequence data containing perfect SSR repeats, and studied the genetic diversity of 25 cranberry genotypes. We identified 48 polymorphic SSR loci with 2-15 alleles per locus for a total of 323 alleles in the 25 cranberry genotypes. Genetic clustering by principal coordinates and genetic structure analyses confirmed the heterogeneous nature of cranberries. The parentage composition of several hybrid cultivars was evident from the structure analyses. Whole genome shotgun 454 sequencing was a cost-effective and efficient way to identify numerous SSR repeats in the cranberry sequence for marker development.

  7. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Full Text Available Abstract Background Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict) known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on a scoring function and the prediction is done by selecting the best score. Most of the available tools for known binding site prediction are designed on the assumption of no dependency between binding site base positions. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions. Results We propose a new scoring function based on the dependency between all base positions in the binding site. This scoring function uses joint information content and mutual information as a measure of dependency between positions in transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on the real data sets extracted from JASPAR and TRANSFAC databases, and compare the obtained results with two other well-known scoring functions. Conclusion The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple
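
    The two quantities named above, information content at a position and mutual information between a pair of positions, can be computed from a set of aligned binding sites as in the sketch below. This shows the textbook definitions only and is not the authors' scoring function. It assumes a uniform background over the four bases, applies no small-sample correction, and uses invented toy sites.

```python
from collections import Counter
from math import log2


def column(sites, i):
    """Bases observed at position i across all aligned sites."""
    return [site[i] for site in sites]


def entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def information_content(sites, i):
    """2 bits (the DNA maximum under a uniform background) minus the entropy at i."""
    return 2.0 - entropy(column(sites, i))


def mutual_information(sites, i, j):
    """I(i; j) = H(i) + H(j) - H(i, j), estimated from the aligned sites."""
    joint = list(zip(column(sites, i), column(sites, j)))
    return entropy(column(sites, i)) + entropy(column(sites, j)) - entropy(joint)


if __name__ == "__main__":
    # Invented toy alignment: positions 0 and 1 always co-vary, position 2 is fixed.
    sites = ["AAT", "CCT", "GGT", "TTT"]
    print(information_content(sites, 2))
    print(mutual_information(sites, 0, 1))
    print(mutual_information(sites, 0, 2))
```

    A position-independent model would see positions 0 and 1 of the toy alignment as uninformative, since each is uniform on its own, whereas the mutual information exposes their complete dependence.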

  8. Genome-Wide Analysis of Simple Sequence Repeats in Bitter Gourd (Momordica charantia)

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2017-06-01

    Full Text Available Bitter gourd (Momordica charantia) is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus), watermelon (Citrullus lanatus), and melon (Cucumis melo) genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs) in bitter gourd, including a comparison with the three aforementioned cucurbit species, has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1,’ respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs were designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utilization of unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.

  9. Isolation, characterization and amplification of simple sequence repeat loci in coffee

    Directory of Open Access Journals (Sweden)

    Marco-Aurelio Cristancho

    2008-01-01

    Full Text Available Simple sequence repeat (microsatellite) loci in coffee were identified in clones isolated from enriched and random genomic libraries. It was shown that coffee is a plant species with low microsatellite frequency. However, the average distance between two loci, estimated at 127 kb for poly (AG), is one of the shortest of all plant genomes. In contrast, the distance between two poly (AC) loci, estimated at 769 kb, is one of the largest in plant genomes. Coffee (AC)n microsatellites are frequently associated with other microsatellites, mainly (AT)n motifs, while (AG)n microsatellites are not normally associated with other microsatellites and have a higher number of perfect motifs. Dinucleotide repeats (AG) and (AC) were found in AT-rich regions in coffee. Sequence analysis of (AC)n microsatellites identified in coffee revealed the possible association of these repeated elements with miniature inverted-repeat transposable elements (MITEs). In addition, some of the evaluated SSR markers produced transposon-like amplification patterns in tetraploid genotypes. Of 12 SSR markers developed, nine were polymorphic in diploid genotypes while five were polymorphic in tetraploid genotypes, confirming a greater genetic diversity in diploid species.

  10. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

    Science.gov (United States)

    Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

    2011-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variations and antigenic variations seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other species of prokaryotes have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As a part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variations, and created a relational database called PSSRdb. This database gives useful information such as the location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
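
    The definition of an SSR given above (a tandem repeat of a 1-6 bp motif) can be illustrated with a minimal Python sketch. This is not the PSSRdb extraction pipeline; the function name `find_ssrs` and the `min_repeats` threshold are assumptions made for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re

def find_ssrs(seq, min_repeats=3, max_motif_len=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    min_repeats is an illustrative threshold, not a value taken from the abstract.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        # A k-bp motif followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

print(find_ssrs("GGATATATATCC"))
```

    Comparing the repeat counts reported for the same locus in several strains would be one simple way to flag length-polymorphic SSRs, under the assumption that the loci have already been aligned.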

  11. Repeat Sequences and Base Correlations in Human Y Chromosome Palindromes

    Institute of Scientific and Technical Information of China (English)

    Neng-zhi Jin; Zi-xian Liu; Yan-jiao Qi; Wen-yuan Qiu

    2009-01-01

    On the basis of information theory and statistical methods, we use mutual information, n-tuple entropy and conditional entropy, combined with biological characteristics, to analyze the long range correlation and short range correlation in human Y chromosome palindromes. The magnitude distribution of the long range correlation which can be reflected by the mutual information is P5>P5a>P5b (P5a and P5b are the sequences that replace solely Alu repeats and all interspersed repeats with random uncorrelated sequences in human Y chromosome palindrome 5, respectively); and the magnitude distribution of the short range correlation which can be reflected by the n-tuple entropy and the conditional entropy is P5>P5a>P5b>random uncorrelated sequence. In other words, when the Alu repeats and all interspersed repeats are replaced with random uncorrelated sequences, the long range and short range correlations decrease gradually. The random uncorrelated sequence, by contrast, has no correlation. This research indicates that more repeat sequences result in stronger correlation between bases in the human Y chromosome. The analyses may help in understanding the special structures of human Y chromosome palindromes.
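
    The two quantities named above can be sketched in a few lines of Python. This is a minimal illustration using the standard Shannon definitions, not the authors' code; the function names and the use of plain frequency estimates (without any finite-sample correction) are assumptions.

```python
from collections import Counter
from math import log2

def ntuple_entropy(seq, n):
    """Shannon entropy (bits) of the overlapping n-tuples in seq."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def mutual_information(seq, k):
    """Mutual information (bits) between bases separated by distance k."""
    pairs = Counter((seq[i], seq[i + k]) for i in range(len(seq) - k))
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c   # marginal count of the first base
        right[b] += c  # marginal count of the second base
    # sum over pairs of p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
    return sum(
        c / total * log2(c * total / (left[a] * right[b]))
        for (a, b), c in pairs.items()
    )

print(ntuple_entropy("ACGT", 1))
print(mutual_information("ACACACAC", 1))
```

    Plotting `mutual_information(seq, k)` against increasing `k` is one way to look at how correlation decays with distance; a strictly periodic sequence such as the toy input above keeps a high value, whereas a random sequence is expected to stay near zero.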

  12. Survey of simple sequence repeats in woodland strawberry (Fragaria vesca).

    Science.gov (United States)

    Guan, L; Huang, J F; Feng, G Q; Wang, X W; Wang, Y; Chen, B Y; Qiao, Y S

    2013-07-30

    The use of simple sequence repeats (SSRs), or microsatellites, as genetic markers has become popular due to their abundance and variation in length among individuals. In this study, we investigated linkage groups (LGs) in the woodland strawberry (Fragaria vesca) and demonstrated variation in the abundances, densities, and relative densities of mononucleotide, dinucleotide, and trinucleotide repeats. Mononucleotide, dinucleotide, and trinucleotide repeats were more common than longer repeats in all LGs examined. Perfect SSRs were the predominant SSR type found and their abundance was extremely stable among LGs and chloroplasts. Abundances of mononucleotide, dinucleotide, and trinucleotide repeats were positively correlated with LG size, whereas those of tetranucleotide and hexanucleotide SSRs were not. Generally, in each LG, the abundance, relative abundance, relative density, and the proportion of each unique SSR all declined rapidly as the repeated unit increased. Furthermore, the lengths and frequencies of SSRs varied among different LGs.

  13. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal Matoq Saeed

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for 12 poly(A) motif variants. The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  14. iTriplet, a rule-based nucleic acid sequence motif finder

    Directory of Open Access Journals (Sweden)

    Gunderson Samuel I

    2009-10-01

    Full Text Available Abstract Background With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable making their complete identification a computationally challenging problem. Many attempts in using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs still remains an unmet challenge as high degeneracy will diminish statistical significance of biological signals and increasing motif size will cause combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing. Results We have conducted a comprehensive assessment on the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore we have confirmed the utility of iTriplet by showing it accurately predicts polyA-site-related motifs using a dual Luciferase reporter assay. Conclusion iTriplet is a novel rule-based combinatorial or enumerative motif finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.

  15. Analysis of Simple Sequence Repeats in Genomes of Rhizobia

    Institute of Scientific and Technical Information of China (English)

    GAO Ya-mei; HAN Yi-qiang; TANG Hui; SUN Dong-mei; WANG Yan-jie; WANG Wei-dong

    2008-01-01

    Simple sequence repeats (SSRs) or microsatellites, as genetic markers, are ubiquitous in genomes of various organisms. The analysis of SSRs in rhizobia genomes provides useful information for a variety of applications in population genetics of rhizobia. We analyzed the occurrences, relative abundance, and relative density of SSRs, the most common in Bradyrhizobium japonicum, Mesorhizobium loti, and Sinorhizobium meliloti genomes sequenced in the microorganisms tandem repeats database, and SSRs in the three species genomes were compared with each other. The result showed that there were 1,410, 859, and 638 SSRs in B. japonicum, M. loti, and S. meliloti genomes, respectively. In the genomes of B. japonicum, M. loti, and S. meliloti, tetranucleotide, pentanucleotide, and hexanucleotide repeats were more abundant and indicated higher mutation rates in these species. The least abundant was the mononucleotide repeat. The SSR types and distributions were similar among these species.

  16. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ~30% fewer error predictions relative to the other

  17. Coevolution between simple sequence repeats (SSRs) and virus genome size

    Directory of Open Access Journals (Sweden)

    Zhao Xiangyan

    2012-08-01

    Full Text Available Abstract Background The relationship between the level of repetitiveness in genomic sequence and genome size has been investigated by making use of complete prokaryotic and eukaryotic genomes, but such studies have rarely been made in virus genomes. Results In this study, a total of 257 viruses, covering 90% of genera, were examined. The results showed that the occurrence of simple sequence repeats (SSRs) is strongly, positively and significantly correlated with genome size. Each repeat class is distributed within a certain range of genome sequence length. Mono-, di- and tri- repeats are widely distributed in all virus genomes; tetra- SSRs are a common component of genomes more than 100 kb in size; in the range of genome  Conclusions We conducted this research at the level of viruses as a whole. We concluded that genome size is an important factor affecting the occurrence of SSRs; hosts are also responsible, to a certain degree, for the variance in SSR content.

  18. Cloning, characterization, and properties of seven triplet repeat DNA sequences.

    Science.gov (United States)

    Ohshima, K; Kang, S; Larson, J E; Wells, R D

    1996-07-12

    Several neuromuscular and neurodegenerative diseases are caused by genetically unstable triplet repeat sequences (CTG.CAG, CGG.CCG, or AAG.CTT) in or near the responsible genes. We implemented novel cloning strategies with chemically synthesized oligonucleotides to clone seven of the triplet repeat sequences (GTA.TAC, GAT.ATC, GTT.AAC, CAC.GTG, AGG.CCT, TCG.CGA, and AAG.CTT), and the adjoining paper (Ohshima, K., Kang, S., Larson, J. E., and Wells, R. D.(1996) J. Biol. Chem. 271, 16784-16791) describes studies on TTA.TAA. This approach in conjunction with in vivo expansion studies in Escherichia coli enabled the preparation of at least 81 plasmids containing the repeat sequences with lengths of approximately 16 up to 158 triplets in both orientations with varying extents of polymorphisms. The inserts were characterized by DNA sequencing as well as DNA polymerase pausings, two-dimensional agarose gel electrophoresis, and chemical probe analyses to evaluate the capacity to adopt negative supercoil induced non-B DNA conformations. AAG.CTT and AGG.CCT form intramolecular triplexes, and the other five repeat sequences do not form any previously characterized non-B structures. However, long tracts of TCG.CGA showed strong inhibition of DNA synthesis at specific loci in the repeats as seen in the cases of CTG.CAG and CGG.CCG (Kang, S., Ohshima, K., Shimizu, M., Amirhaeri, S., and Wells, R. D.(1995) J. Biol. Chem. 270, 27014-27021). This work along with other studies (Wells, R. D.(1996) J. Biol. Chem. 271, 2875-2878) on CTG.CAG, CGG.CCG, and TTA.TAA makes available long inserts of all 10 triplet repeat sequences for a variety of physical, molecular biological, genetic, and medical investigations. A model to explain the reduction in mRNA abundance in Friedreich's ataxia based on intermolecular triplex formation is proposed.

  19. Exploiting BAC-end sequences for the mining, characterization and utility of new short sequences repeat (SSR) markers in Citrus.

    Science.gov (United States)

    Biswas, Manosh Kumar; Chai, Lijun; Mayer, Christoph; Xu, Qiang; Guo, Wenwu; Deng, Xiuxin

    2012-05-01

    The aim of this study was to develop a large set of microsatellite markers based on publicly available BAC-end sequences (BESs), and to evaluate their transferability, discriminating capacity of genotypes and mapping ability in Citrus. A set of 1,281 simple sequence repeat (SSR) markers were developed from the 46,339 Citrus clementina BAC-end sequences (BES), of them 20.67% contained SSR longer than 20 bp, corresponding to roughly one perfect SSR per 2.04 kb. The most abundant motifs were di-nucleotide (16.82%) repeats. Among all repeat motifs (TA/AT)n is the most abundant (8.38%), followed by (AG/CT)n (4.51%). Most of the BES-SSR are located in the non-coding region, but 1.3% of BES-SSRs were found to be associated with transposable element (TE). A total of 400 novel SSR primer pairs were synthesized and their transferability and polymorphism tested on a set of 16 Citrus and Citrus relative's species. Among these 333 (83.25%) were successfully amplified and 260 (65.00%) showed cross-species transferability with Poncirus trifoliata and Fortunella sp. These cross-species transferable markers could be useful for cultivar identification, for genomic study of Citrus, Poncirus and Fortunella sp. Utility of the developed SSR marker was demonstrated by identifying a set of 118 markers each for construction of linkage map of Citrus reticulata and Poncirus trifoliata. Genetic diversity and phylogenetic relationship among 40 Citrus and its related species were conducted with the aid of 25 randomly selected SSR primer pairs and results revealed that citrus genomic SSRs are superior to genic SSR for genetic diversity and germplasm characterization of Citrus spp.

  20. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

    Science.gov (United States)

    Zhao, Xiaoyan; Sze, Sing-Hoi

    2011-05-01

    One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms performs very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
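
    The idea of a string representation with ignored positions can be illustrated with a short Python sketch. It is not the PosMotif algorithm and does not model the background Markov chains; the function name `find_occurrences` and the use of "." to mark an ignored position are assumptions chosen for illustration.

```python
def find_occurrences(pattern, seq, skip="."):
    """Yield start indices where seq agrees with pattern at every non-ignored position.

    Positions marked with `skip` (a hypothetical notation) are not compared.
    """
    # Keep only the fixed (conserved) positions of the motif.
    fixed = [(i, c) for i, c in enumerate(pattern) if c != skip]
    for start in range(len(seq) - len(pattern) + 1):
        if all(seq[start + i] == c for i, c in fixed):
            yield start

print(list(find_occurrences("AC..GT", "TTACAAGTACGGGTAA")))
```

    Here only the four fixed nucleotides define an occurrence, which mirrors the abstract's description of starting from positions with fixed nucleotides to define core occurrences.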

  1. Development of simple sequence repeat (SSR) markers of sesame (Sesamum indicum) from a genome survey.

    Science.gov (United States)

    Wei, Xin; Wang, Linhai; Zhang, Yanxin; Qi, Xiaoqiong; Wang, Xiaoling; Ding, Xia; Zhang, Jing; Zhang, Xiurong

    2014-04-22

    Sesame (Sesamum indicum), an important oil crop, is widely grown in tropical and subtropical regions. It provides part of the daily edible oil allowance for almost half of the world's population. A limited number of co-dominant markers has been developed and applied in sesame genetic diversity and germplasm identity studies. Here we report for the first time a whole genome survey used to develop simple sequence repeat (SSR) markers and to detect the genetic diversity of sesame germplasm. From the initial assembled sesame genome, 23,438 SSRs (≥5 repeats) were identified. The most common repeat motif was dinucleotide with a frequency of 84.24%, followed by 13.53% trinucleotide, 1.65% tetranucleotide, 0.3% pentanucleotide and 0.28% hexanucleotide motifs. From 1500 designed and synthesised primer pairs, 218 polymorphic SSRs were developed and used to screen 31 sesame accessions from 12 countries. STRUCTURE and phylogenetic analyses indicated that all sesame accessions could be divided into two groups: one mainly from China and another from other countries. Cluster analysis classified Chinese major sesame varieties into three groups. These novel SSR markers are a useful tool for genetic linkage map construction, genetic diversity detection, and marker-assisted selective sesame breeding.

  2. i-motif structures in long cytosine-rich sequences found upstream of the promoter region of the SMARCA4 gene.

    Science.gov (United States)

    Benabou, Sanae; Aviñó, Anna; Lyonnais, S; González, C; Eritja, Ramon; De Juan, Anna; Gargallo, Raimundo

    2017-09-01

    Cytosine-rich oligonucleotides are capable of forming complex structures known as i-motif with increasingly studied biological properties. The study of sequences prone to form i-motifs located near the promoter region of genes may be difficult because these sequences not only contain repeats of cytosine tracts of disparate length but also these may be separated by loops of varied nature and length. In this work, the formation of intramolecular i-motif structures by a long sequence located upstream of the promoter region of the SMARCA4 gene has been demonstrated. Nuclear Magnetic Resonance, Circular Dichroism, Gel Electrophoresis, Size-Exclusion Chromatography, and multivariate analysis have been used. Not only the wild sequence (5'-TC3T2GCTATC3TGTC2TGC2TCGC3T2G2TCATGA2C4-3') has been studied but also several other truncated and mutated sequences. Despite the apparent complex sequence, the results showed that the wild sequence may form a relatively stable and homogeneous unimolecular i-motif structure, both in terms of pH or temperature. The model ligand TMPyP4 destabilizes the structure, whereas the presence of 20% (w/v) PEG200 stabilized it slightly. This finding opens the door to the study of the interaction of these kind of i-motif structures with stabilizing ligands or proteins. Copyright © 2017 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.

  3. Development and characterization of simple sequence repeats for Bipolaris sorokiniana and cross transferability to related species.

    Science.gov (United States)

    Fajolu, Oluseyi L; Wadl, Phillip A; Vu, Andrea L; Gwinn, Kimberly D; Scheffler, Brian E; Trigiano, Robert N; Ownley, Bonnie H

    2013-01-01

    Simple sequence repeat (SSR) markers were developed from a small insert genomic library for Bipolaris sorokiniana, a mitosporic fungal pathogen that causes spot blotch and root rot in switchgrass. About 59% of sequenced clones (n = 384) harbored SSR motifs. After eliminating redundant sequences, 196 SSR loci were identified, of which 84.7% were dinucleotide repeats and 9.7% and 5.6% were tri- and tetra-nucleotide repeats, respectively. Primer pairs were designed for 105 loci and 85 successfully amplified loci. Sixteen polymorphic loci were characterized with 15 B. sorokiniana isolates obtained from infected switchgrass plant materials collected from five states in USA. These loci successfully cross-amplified isolates from at least one related species, including Bipolaris oryzae, Bipolaris spicifera and Bipolaris victoriae, that cause leaf spot on switchgrass. Haploid gene diversity per locus across all isolates studied varied from 0.633 to 0.861. Principal component analysis of SSR data clustered isolates according to their respective species. These SSR markers will be a valuable tool for genetic variability and population studies of B. sorokiniana and related species that are pathogenic on switchgrass and other host plants. In addition, these markers are potential diagnostic tools for species in the genus Bipolaris.

  4. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction.

    Directory of Open Access Journals (Sweden)

    Aalt D J van Dijk

    Full Text Available Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various performed bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and

  5. MicroRNA sequence motifs reveal asymmetry between the stem arms

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Havgaard, Jakob Hull; Ensterö, M.

    2006-01-01

    RNAs in their genomic contexts. We have compared profiles of mature miRNAs within their genomic context of the 5' and 3' stemloop precursor arms and we find asymmetry between mature sequences of the 5' and 3' stemloop precursor arms. The main observation is that vertebrate organisms have a characteristic motif on the 5......' arm which is in contrast to the 3' arm motif which mainly show the conserved U at the position of the mature start. Also the vertebrate 5' arm motif show a semi-conserved G 13 nucleotides upstream from the first position. We compared the 5' and 3' arm profiles using the average log likelihood ratio...... (ALLR) score, as defined by Wang and Stormo (2003) [Wang T., Stormo, G.D., 2003. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 2369-2380.] and computing a p-value we find that the two profiles differs significantly in their 3' end where the 5' arm...

  6. Development of expressed sequence tag and expressed sequence tag–simple sequence repeat marker resources for Musa acuminata

    Science.gov (United States)

    Passos, Marco A. N.; de Oliveira Cruz, Viviane; Emediato, Flavia L.; de Camargo Teixeira, Cristiane; Souza, Manoel T.; Matsumoto, Takashi; Rennó Azevedo, Vânia C.; Ferreira, Claudia F.; Amorim, Edson P.; de Alencar Figueiredo, Lucio Flavio; Martins, Natalia F.; de Jesus Barbosa Cavalcante, Maria; Baurens, Franc-Christophe; da Silva, Orzenil Bonfim; Pappas, Georgios J.; Pignolet, Luc; Abadie, Catherine; Ciampi, Ana Y.; Piffanelli, Pietro; Miller, Robert N. G.

    2012-01-01

    Background and aims Banana (Musa acuminata) is a crop contributing to global food security. Many varieties lack resistance to biotic stresses, due to sterility and narrow genetic background. The objective of this study was to develop an expressed sequence tag (EST) database of transcripts expressed during compatible and incompatible banana–Mycosphaerella fijiensis (Mf) interactions. Black leaf streak disease (BLSD), caused by Mf, is a destructive disease of banana. Microsatellite markers were developed as a resource for crop improvement. Methodology cDNA libraries were constructed from in vitro-infected leaves from BLSD-resistant M. acuminata ssp. burmaniccoides Calcutta 4 (MAC4) and susceptible M. acuminata cv. Cavendish Grande Naine (MACV). Clones were 5′-end Sanger sequenced, ESTs assembled with TGICL and unigenes annotated using BLAST, Blast2GO and InterProScan. Mreps was used to screen for simple sequence repeats (SSRs), with markers evaluated for polymorphism using 20 diploid (AA) M. acuminata accessions contrasting in resistance to Mycosphaerella leaf spot diseases. Principal results A total of 9333 high-quality ESTs were obtained for MAC4 and 3964 for MACV, which assembled into 3995 unigenes. Of these, 2592 displayed homology to genes encoding proteins with known or putative function, and 266 to genes encoding proteins with unknown function. Gene ontology (GO) classification identified 543 GO terms, 2300 unigenes were assigned to EuKaryotic orthologous group categories and 312 mapped to Kyoto Encyclopedia of Genes and Genomes pathways. A total of 624 SSR loci were identified, with trinucleotide repeat motifs the most abundant in MAC4 (54.1 %) and MACV (57.6 %). Polymorphism across M. acuminata accessions was observed with 75 markers. Alleles per polymorphic locus ranged from 2 to 8, totalling 289. The polymorphism information content ranged from 0.08 to 0.81. Conclusions This EST collection offers a resource for studying functional genes, including

  7. Simple sequence repeats in watermelon (Citrullus lanatus (Thunb.) Matsum. & Nakai).

    Science.gov (United States)

    Jarret, R L; Merrick, L C; Holms, T; Evans, J; Aradhya, M K

    1997-08-01

    Simple sequence repeat length polymorphisms were utilized to examine genetic relatedness among accessions of watermelon (Citrullus lanatus (Thunb.) Matsum. & Nakai). A size-fractionated TaqI genomic library was screened for the occurrence of dimer and trimer simple sequence repeats (SSRs). A total of 96 (0.53%) SSR-bearing clones were identified and the inserts from 50 of these were sequenced. The dinucleotide repeats (CT)n and (GA)n accounted for 82% of the SSRs sequenced. PCR primer pairs flanking seven SSR loci were used to amplify SSRs from 32 morphologically variable watermelon genotypes from Africa, Europe, Asia, and Mexico and a single accession of Citrullus colocynthis from Chad. Cluster analysis of SSR length polymorphisms delineated 4 groups at the 25% level of genetic similarity. The largest group contained C. lanatus var. lanatus accessions. The second largest group contained only wild and cultivated "citron"-type or C. lanatus var. citroides accessions. The third group contained an accession tentatively identified as C. lanatus var. lanatus but which perhaps is a hybrid between C. lanatus var. lanatus and C. lanatus var. citroides. The fourth group consisted of a single accession identified as C. colocynthis. "Egusi"-type watermelons from Nigeria grouped with C. lanatus var. lanatus. The use of SSRs for watermelon germplasm characterization and genetic diversity studies is discussed.

  8. Sequence determination and modeling of structural motifs for the smallest monomeric aminoacyl-tRNA synthetase.

    OpenAIRE

    Hou, Y M; Shiba, K; Mottes, C; Schimmel, P.

    1991-01-01

    Polypeptide chains of 19 previously studied Escherichia coli aminoacyl-tRNA synthetases are as large as 951 amino acids and, depending on the enzyme, have quaternary structures of alpha, alpha 2, alpha 2 beta 2, and alpha 4. These enzymes have been organized into two classes which are defined by sequence motifs that are associated with specific three-dimensional structures. We isolated, cloned, and sequenced the previously uncharacterized gene for E. coli cysteine-tRNA synthetase (EC 6.1.1.16...

  9. Analysis of Tandem Repeat Patterns in Nlrc4 using a Motif Model

    Directory of Open Access Journals (Sweden)

    Sim-Hui Tee

    2012-12-01

    Exponential accumulation of biological data requires computer scientists and bioinformaticians to improve the efficiency of computer algorithms and databases. Recent advances in computational tools have boosted the capacity to process enormous volumes of genetic data. This research applied a computational approach to analyze the tandem repeat patterns in the Nlrc4 gene. Because the protein product of the Nlrc4 gene is important in detecting pathogens and triggering subsequent immune responses, the results of this genetic analysis are essential for understanding the genetic characteristics of Nlrc4. The study of the distribution of tandem repeats may provide insights for designing drugs against Nlrc4-implicated diseases.

  10. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions.

    Science.gov (United States)

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyases sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyases families, describes their function, their presence/absence in all genomes of the database (phyletic profiles) and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers.

  11. An Algorithm for Finding Conserved Secondary Structure Motifs in Unaligned RNA Sequences

    Institute of Scientific and Technical Information of China (English)

    Giulio Pavesi; Giancarlo Mauri; Graziano Pesole

    2004-01-01

    Several experiments and observations have revealed the fact that small local distinct structural features in RNA molecules are correlated with their biological function, for example, in post-transcriptional regulation of gene expression. Thus, finding similar structural features in a set of RNA sequences known to play the same biological function could provide substantial information concerning which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even if limited to secondary structure. The main difficulty lies in the fact that in nearly all the cases the structure of the molecules is unknown, has to be somehow predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on the preliminary alignment of the sequences and attempt to predict common structures (either local or global, or for some selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but function less well when similarity is limited to small and local elements, like single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint, or a posteriori, by post-processing the output. The search for the regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns having a symmetric layout, such as those forming stem-loop structures. 
    Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in

  12. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    Science.gov (United States)

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.
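
    A motif class such as AAG/CTT pools every rotation and the reverse complement of a repeat unit, so that AAG, AGA, GAA, CTT, TTC, and TCT are all counted together. The following is a minimal Python sketch of that grouping, assuming DNA units over A, C, G, and T; the function names are illustrative and do not come from the study.

```python
# Hypothetical helper names; only the idea of pooling rotations and
# reverse complements into one motif class is taken from the text.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _rotations(unit):
    """Return every cyclic rotation of a repeat unit."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def motif_class(unit):
    """Return a label such as 'AAG/CTT' for any member of the class."""
    unit = unit.upper()
    revcomp = unit.translate(_COMPLEMENT)[::-1]
    forward = min(_rotations(unit))      # smallest rotation on this strand
    reverse = min(_rotations(revcomp))   # smallest rotation on the other strand
    # A set collapses self-complementary units such as AT to a single name.
    return "/".join(sorted({forward, reverse}))


if __name__ == "__main__":
    for unit in ("TTC", "GAA", "GA", "AT"):
        print(unit, "->", motif_class(unit))
```

    Counting SSRs by `motif_class` rather than by raw unit avoids splitting one repeat family across several table rows.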

  13. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    Directory of Open Access Journals (Sweden)

    Chunsheng Gao

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  14. An inverted repeat motif stabilizes binding of E2F and enhances transcription of the dihydrofolate reductase gene

    DEFF Research Database (Denmark)

    Wade, M; Blake, M C; Jambou, R C

    1995-01-01

    An overlapping inverted repeat sequence that binds the eukaryotic transcription factor E2F is 100% conserved near the major transcription start sites in the promoters of three mammalian genes encoding dihydrofolate reductase, and is also found in the promoters of several other important cellular ...

  15. Repeat-based Sequence Typing of Carnobacterium maltaromaticum.

    Science.gov (United States)

    Rahman, Abdur; El Kheir, Sara M; Back, Alexandre; Mangavel, Cécile; Revol-Junelles, Anne-Marie; Borges, Frédéric

    2016-06-01

    Carnobacterium maltaromaticum is a Lactic Acid Bacterium (LAB) of technological interest for the food industry, especially the dairy sector, as a bioprotective and ripening flora. The industrial use of this LAB requires accurate and high-resolution typing tools. A new typing method for C. maltaromaticum, inspired by MLVA analysis and called Repeat-based Sequence Typing (RST), is described. Rather than relying on electrophoresis, the RST method is based on sequence analysis of multiple loci containing Variable-Number Tandem-Repeats (VNTRs). The method described here for C. maltaromaticum relies on the analysis of three VNTR loci, and was applied to a collection of 24 strains. For each strain, a PCR product corresponding to the amplification of each VNTR locus was sequenced. Sequence analysis delineated 11, 11, and 12 alleles for loci VNTR-A, VNTR-B, and VNTR-C, respectively. The allele combinations exhibited by the strains defined 15 genotypes, giving a discriminatory index of 0.94. Comparison with MLST revealed that the two methods are complementary for strain typing in C. maltaromaticum.
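
    The abstract does not say how the discriminatory index was computed. A common choice for typing schemes is the Hunter-Gaston index (Simpson's index of diversity), and the Python sketch below assumes that definition. The genotype counts in the usage example are invented for illustration and are not the study's data.

```python
def discriminatory_index(genotype_counts):
    """Hunter-Gaston index: probability that two strains drawn without
    replacement belong to different genotypes.

    genotype_counts: number of strains observed for each genotype.
    """
    n = sum(genotype_counts)
    if n < 2:
        raise ValueError("at least two strains are needed")
    # Ordered pairs of distinct strains that share a genotype.
    same_type_pairs = sum(c * (c - 1) for c in genotype_counts)
    return 1.0 - same_type_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical split of 24 strains into 15 genotypes.
    counts = [4, 3, 2, 2, 2, 2] + [1] * 9
    print(len(counts), "genotypes,", sum(counts), "strains")
    print("D =", round(discriminatory_index(counts), 3))
```

    The index reaches 1 when every strain has its own genotype and falls to 0 when all strains share one.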

  16. Development of simple sequence repeat markers and diversity analysis in alfalfa (Medicago sativa L.).

    Science.gov (United States)

    Wang, Zan; Yan, Hongwei; Fu, Xinnian; Li, Xuehui; Gao, Hongwen

    2013-04-01

    Efficient and robust molecular markers are essential for molecular breeding in plants. Compared to dominant and bi-allelic markers, multi-allelic simple sequence repeat (SSR) markers are particularly informative and well suited to genetic linkage and QTL mapping in autotetraploid species like alfalfa. The objective of this study was to develop SSR markers directly from alfalfa expressed sequence tags (ESTs). A total of 12,371 alfalfa ESTs were retrieved from the National Center for Biotechnology Information. A total of 774 SSRs were identified in 716 ESTs. On average, one SSR was found per 7.7 kb of EST sequence. Trinucleotide repeats (48.8%) were the most abundant motif type, followed by di- (26.1%), tetra- (11.5%), penta- (9.7%), and hexanucleotide (3.9%) repeats. One hundred EST-SSR primer pairs were successfully designed and 29 exhibited polymorphism among 28 alfalfa accessions. The allele number per marker ranged from two to 21 with an average of 6.8. The PIC values ranged from 0.195 to 0.896 with an average of 0.608, indicating a high level of polymorphism of the EST-SSR markers. A genetic diversity assessment based on the 29 EST-SSR markers found that Medicago sativa ssp. sativa was clearly different from the other subspecies. The EST-SSR markers also showed high transferability to related species.
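
    The polymorphism information content (PIC) of a marker is usually computed from its allele frequencies with the formula of Botstein et al. (1980). The abstract does not state which formula was used, so the Python sketch below assumes that standard definition; the allele frequencies in the example are invented.

```python
def pic(allele_freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = 0.0
    for i, p_i in enumerate(allele_freqs):
        for p_j in allele_freqs[i + 1:]:
            cross_term += 2.0 * p_i ** 2 * p_j ** 2
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical marker with four alleles.
    print(round(pic([0.4, 0.3, 0.2, 0.1]), 3))
```

    A marker with two equally frequent alleles gives the two-allele maximum of 0.375, so the higher values reported here require markers with more than two alleles.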

  17. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

    Science.gov (United States)

    A blackberry (Rubus L.) expressed sequence tag (EST) library was produced for developing simple sequence repeat (SSR) markers from the tetraploid blackberry cultivar, Merton Thornless, the source of the thornless trait in commercial cultivars. RNA was extracted from young expanding leaves and used f...

  18. Steganalytic method based on short and repeated sequence distance statistics

    Institute of Scientific and Technical Information of China (English)

    WANG GuoXin; PING XiJian; XU ManKun; ZHANG Tao; BAO XiRui

    2008-01-01

    Based on the distribution characteristics of short and repeated sequences (SRS), a steganalytic method based on the correlation of image bit planes is proposed. First, we introduce the concept of SRS distance statistics and derive its statistical distribution. Because the SRS distance statistics can effectively reflect the correlation of the sequence, SRS has statistical features when the image bit plane sequence equals the image width. Using this characteristic, the steganalytic method is realized through a significance test of the Poisson distribution. Experimental results show good performance in detecting the LSB matching steganographic method in still images. In addition, the proposed method is not designed for specific steganographic algorithms and generalizes well.

  19. Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

    Science.gov (United States)

    Shan, Gao; Zheng, Wei-Mou

    2009-02-01

    By means of the imbedded Markov chain technique, an efficient algorithm is proposed to calculate exactly the first and second moments of word counts, and the probability that a word occurs at least once, in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs in a set of promoter sequences extracted from the A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
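
    The two quantities named in the abstract can be illustrated with a small dynamic program. The Python sketch below is not the authors' algorithm or their oligo.c code. It assumes a first-order Markov chain given by an initial distribution `init` and a transition table `trans`, and it tracks a prefix-matching automaton together with the last emitted letter, which is one simple way of embedding the word search in a Markov chain. All names and the example parameters are illustrative.

```python
def prefix_automaton(word, alphabet):
    """delta[s][c]: length of the longest prefix of `word` that is a
    suffix of word[:s] + c (the usual KMP-style matching automaton)."""
    k = len(word)
    delta = []
    for s in range(k + 1):
        row = {}
        for c in alphabet:
            text = word[:s] + c
            m = min(k, len(text))
            while m > 0 and word[:m] != text[-m:]:
                m -= 1
            row[c] = m
        delta.append(row)
    return delta


def prob_at_least_once(word, n, init, trans):
    """P(word occurs at least once in a length-n text), n >= 1."""
    alphabet = sorted(init)
    k = len(word)
    delta = prefix_automaton(word, alphabet)
    hit = 0.0   # probability mass that has already matched (absorbing)
    dist = {}   # (automaton state, last letter) -> probability, no match yet
    for c in alphabet:
        s = delta[0][c]
        if s == k:
            hit += init[c]
        else:
            dist[(s, c)] = dist.get((s, c), 0.0) + init[c]
    for _ in range(n - 1):
        nxt = {}
        for (s, last), p in dist.items():
            for c in alphabet:
                q = p * trans[last][c]
                t = delta[s][c]
                if t == k:
                    hit += q
                else:
                    nxt[(t, c)] = nxt.get((t, c), 0.0) + q
        dist = nxt
    return hit


def expected_count(word, n, init, trans):
    """Expected number of (possibly overlapping) occurrences of word."""
    marginal = dict(init)  # letter distribution at the current start position
    total = 0.0
    for _ in range(n - len(word) + 1):
        p = marginal[word[0]]
        for a, b in zip(word, word[1:]):
            p *= trans[a][b]
        total += p
        marginal = {c: sum(marginal[a] * trans[a][c] for a in marginal)
                    for c in marginal}
    return total


if __name__ == "__main__":
    init = {"A": 0.4, "C": 0.6}
    trans = {"A": {"A": 0.3, "C": 0.7}, "C": {"A": 0.5, "C": 0.5}}
    print(prob_at_least_once("ACA", 20, init, trans))
    print(expected_count("ACA", 20, init, trans))
```

    For short texts both functions can be checked against brute-force enumeration of all sequences, which is a useful sanity test before trusting them on promoter-length inputs.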

  20. Identification of sequence motifs involved in Dengue virus-host interactions.

    Science.gov (United States)

    Asnet Mary, J; Paramasivan, R; Shenbagarathai, R

    2016-01-01

    Dengue fever is a rapidly spreading mosquito-borne viral infection, which remains a serious global public health problem. As there is no specific treatment or commercial vaccine available for effective control of the disease, attempts to develop novel control strategies are underway. Viruses utilize the surface receptor proteins of the host to enter cells. Though various proteins have been reported as receptors of Dengue virus (DENV) using the Virus Overlay Protein Binding Assay, the precise interaction between DENV and host has not been explored. Understanding the structural features of the domain III envelope glycoprotein would help in developing efficient antiviral inhibitors. Therefore, an attempt was made to identify the sequence motifs present in the domain III envelope glycoprotein of Dengue virus. Computational analysis revealed that the NGR motif is present in the domain III envelope glycoprotein of DENV-1 and DENV-3. Similarly, DENV-1, DENV-2 and DENV-4 were found to contain the Yxxphi motif, a tyrosine-based sorting signal responsible for the interaction with a mu subunit of the adaptor protein complex. High-throughput virtual screening yielded five compounds as lead molecules based on glide score, which ranged from -4.664 to -6.52 kcal/mol. This computational prediction provides an additional tool for understanding virus-host interactions and helps to identify potential targets in the host. Further experimental evidence is warranted to confirm the virus-host interactions and the inhibitory activity of the reported lead compounds.

  1. De novo transcriptome sequencing reveals a considerable bias in the incidence of simple sequence repeats towards the downstream of 'Pre-miRNAs' of black pepper.

    Directory of Open Access Journals (Sweden)

    Nisha Joy

    Next-generation sequencing is advantageous for species with limited available sequence data, as it helps to decode the genome and transcriptome. We carried out de novo sequencing using the Illumina HiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to the lack of sufficient sequence information and the cytological complexity of its genome. The 55 million raw reads obtained, when assembled using the Trinity program, generated 223,386 contigs and 128,157 unigenes. Reports suggest that repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hairpin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated for the in silico prediction and detection of '43 pre-miRNA candidates bearing different types of SSR motifs'. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of the transcriptome. About 0.033% of the transcriptome constituted 'pre-miRNA candidates bearing SSRs'. The abundance, type and distribution of SSR motifs studied across the hairpin miRNA precursors showed a significant bias in the position of SSRs towards the downstream of predicted 'pre-miRNA candidates'. The catalogue of transcripts identified, together with the demonstration of the reliable existence of SSRs in the miRNA precursors, opens future opportunities for understanding the genetic mechanisms of black pepper and the likely functions of 'tandem repeats' in miRNAs.

  2. Genome-wide identification and validation of simple sequence repeats (SSRs) from Asparagus officinalis.

    Science.gov (United States)

    Li, Shufen; Zhang, Guojun; Li, Xu; Wang, Lianjun; Yuan, Jinhong; Deng, Chuanliang; Gao, Wujun

    2016-06-01

    Garden asparagus (Asparagus officinalis), an important vegetable cultivated worldwide, can also serve as a model dioecious plant species in the study of sex determination and sex chromosome evolution. However, limited DNA marker resources have been developed and used for this species. To expand these resources, we examined the DNA sequences for simple sequence repeats (SSRs) in 163,406 scaffolds representing approximately 400 Mbp of the A. officinalis genome. A total of 87,576 SSRs were identified in 59,565 scaffolds. The most abundant SSR repeats were trinucleotide and tetranucleotide, accounting for 29.2 and 29.1% of the total SSRs, respectively, followed by di-, penta-, hexa-, hepta-, and octanucleotides. The AG motif was most common among dinucleotides and was also the most frequent motif in the entire A. officinalis genome, representing 14.7% of all SSRs. A total of 41,917 SSR primer pairs were designed to amplify SSRs. Twenty-two genomic SSR markers were tested in 39 asparagus accessions belonging to ten cultivars and one accession of Asparagus setaceus for determination of genetic diversity. The intra-species polymorphism information content (PIC) values of the 22 genomic SSR markers were intermediate, with an average of 0.41. The genetic diversity between the ten A. officinalis cultivars was low, and the UPGMA dendrogram was largely unrelated to cultivars. It is here suggested that the sex of individuals is an important factor influencing the clustering results. The results reported here provide new information about the organization of microsatellites in the A. officinalis genome and lay a foundation for further genetic studies and breeding applications of A. officinalis and related species.
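
    Genome-wide SSR surveys of this kind scan each scaffold for short units repeated in tandem. The abstract does not name the tool or thresholds used, so the following Python sketch only illustrates the general idea with a regular expression. The minimum repeat numbers in `MIN_REPEATS` are assumed values chosen for illustration, and the function name is hypothetical.

```python
import re

# Assumed thresholds: unit length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A unit of the given length followed by at least min_rep - 1 copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(2)
            # Skip units that are themselves built from a shorter unit,
            # e.g. "CTCT", which is really a dinucleotide repeat.
            if any(unit == unit[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            copies = len(match.group(1)) // unit_len
            hits.append((match.start(), unit, copies))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGG" + "CT" * 8 + "AAC" + "AAG" * 5 + "T"
    for start, unit, copies in find_ssrs(demo):
        print(start, unit, copies)
```

    Dividing the number of hits by the scanned length in megabases gives an SSR density comparable in form to the per-Mb figures quoted in surveys like this one, although real tools also handle compound and imperfect repeats.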

  3. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS......, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other...

  4. Genome-wide characterization and linkage mapping of simple sequence repeats in mei (Prunus mume Sieb. et Zucc.).

    Directory of Open Access Journals (Sweden)

    Lidan Sun

    Because of its popularity as an ornamental plant in East Asia, mei (Prunus mume Sieb. et Zucc.) has received increasing attention in genetic and genomic research with the recent shotgun sequencing of its genome. Here, we performed the genome-wide characterization of simple sequence repeats (SSRs) in the mei genome and detected a total of 188,149 SSRs occurring at a frequency of 794 SSR/Mb. Mononucleotide repeats were the most common type of SSR in genomic regions, followed by di- and tetranucleotide repeats. Most of the SSRs in coding sequences (CDS) were composed of tri- or hexanucleotide repeat motifs, but mononucleotide repeats were always the most common in intergenic regions. Genome-wide comparison of SSR patterns among the mei, strawberry (Fragaria vesca), and apple (Malus×domestica) genomes showed mei to have the highest density of SSRs, slightly higher than that of strawberry (608 SSR/Mb) and almost twice as high as that of apple (398 SSR/Mb). Mononucleotide repeats were the dominant SSR motifs in the three Rosaceae species. Using 144 SSR markers, we constructed a 670 cM-long linkage map of mei delimited into eight linkage groups (LGs), with an average marker distance of 5 cM. Seventy-one scaffolds covering about 27.9% of the assembled mei genome were anchored to the genetic map, depending on which the macro-colinearity between the mei genome and Prunus T×E reference map was identified. The framework map of mei constructed provides a first step into subsequent high-resolution genetic mapping and marker-assisted selection for this ornamental species.

  5. Structural analysis of a repetitive protein sequence motif in strepsirrhine primate amelogenin.

    Directory of Open Access Journals (Sweden)

    Rodrigo S Lacruz

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates.

  6. Structural Analysis of a Repetitive Protein Sequence Motif in Strepsirrhine Primate Amelogenin

    Science.gov (United States)

    Bromley, Keith M.; Hacia, Joseph G.; Bromage, Timothy G.; Snead, Malcolm L.; Moradian-Oldak, Janet; Paine, Michael L.

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primates species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  7. Construction of libraries enriched for sequence repeats and jumping clones, and hybridization selection for region-specific markers

    Energy Technology Data Exchange (ETDEWEB)

    Kandpal, R.P.; Kandpal, G.; Weissman, S.M. (Yale Univ. School of Medicine, New Haven, CT (United States))

    1994-01-04

    The authors describe a simple and rapid method for constructing small-insert genomic libraries highly enriched for dimeric, trimeric, and tetrameric nucleotide repeat motifs. The approach involves use of DNA inserts recovered by PCR amplification of a small-insert sonicated genomic phage library or by a single-primer PCR amplification of Mbo I-digested and adaptor-ligated genomic DNA. The genomic DNA inserts are heat denatured and hybridized to a biotinylated oligonucleotide. The biotinylated hybrids are retained on a Vectrex-avidin matrix and eluted specifically. The eluate is PCR amplified and cloned. More than 90% of the clones in a library enriched for (CA)n microsatellites with this approach had inserts containing CA repeats. They have also used this protocol for enrichment of (CAG)n and (AGAT)n sequence repeats and for Not I jumping clones. They have used the enriched libraries with an adaptation of the cDNA selection method to enrich for repeat motifs encoded in yeast artificial chromosomes.

  8. Simple sequence repeat map of the sunflower genome.

    Science.gov (United States)

    Tang, S.; Yu, J.-K.; Slabaugh, B.; Shintani, K.; Knapp, J.

    2002-12-01

    Several independent molecular genetic linkage maps of varying density and completeness have been constructed for cultivated sunflower (Helianthus annuus L.). Because of the dearth of sequence and probe-specific DNA markers in the public domain, the various genetic maps of sunflower have not been integrated and a single reference map has not emerged. Moreover, comparisons between maps have been confounded by multiple linkage group nomenclatures and the lack of common DNA markers. The goal of the present research was to construct a dense molecular genetic linkage map for sunflower using simple sequence repeat (SSR) markers. First, 879 SSR markers were developed by identifying 1,093 unique SSR sequences in the DNA sequences of 2,033 clones isolated from genomic DNA libraries enriched for (AC)n or (AG)n and screening 1,000 SSR primer pairs; 579 of the newly developed SSR markers (65.9% of the total) were polymorphic among four elite inbred lines (RHA280, RHA801, PHA and PHB). The genetic map was constructed using 94 RHA280 x RHA801 F7 recombinant inbred lines (RILs) and 408 polymorphic SSR markers (462 SSR marker loci segregated in the mapping population). Of the latter, 459 coalesced into 17 linkage groups presumably corresponding to the 17 chromosomes in the haploid sunflower genome (x = 17). The map was 1,368.3 cM long and had a mean density of 3.1 cM per locus. The SSR markers described herein supply a critical mass of DNA markers for constructing genetic maps of sunflower and create the basis for unifying and cross-referencing the multitude of genetic maps developed for wild and cultivated sunflowers.

  9. Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

    Directory of Open Access Journals (Sweden)

    D. Satyanarayana Rao

    2007-02-01

    We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55 amino-acid residues that occurs more than once in the protein sequence and is sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55 amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure.

  10. Identification and Mapping of Simple Sequence Repeat Markers from Common Bean (Phaseolus vulgaris L.) Bacterial Artificial Chromosome End Sequences for Genome Characterization and Genetic–Physical Map Integration

    Directory of Open Access Journals (Sweden)

    Juana M. Córdoba

    2010-11-01

    Microsatellite markers or simple sequence repeat (SSR) loci are useful for diversity characterization and genetic–physical mapping. Different in silico microsatellite search methods have been developed for mining bacterial artificial chromosome (BAC) end sequences for SSRs. The overall goal of this study was genome characterization based on SSRs in 89,017 BAC end sequences (BESs) from the G19833 common bean (Phaseolus vulgaris L.) library. Another objective was to identify new SSRs taking into account three tandem motif identification programs (Automated Microsatellite Marker Development [AMMD], Tandem Repeats Finder [TRF], and SSRLocator [SSRL]). Among the microsatellite search engines, SSRL identified the highest number of SSRs; however, when primer design was attempted, the number dropped due to poor primer design regions. Automated Microsatellite Marker Development software identified many SSRs with valuable AT/TA or AG/TC motifs, while TRF found fewer SSRs and produced no primers. A subgroup of 323 AT-rich, di-, and trinucleotide SSRs were selected from the AMMD results and used in a parental survey with DOR364 and G19833, of which 75 could be mapped in the corresponding population; these represented 4052 BAC clones. Together with 92 previously mapped BES- and 114 non-BES-derived markers, a total of 280 SSRs were included in the polymerase chain reaction (PCR)-based map, integrating a total of 8232 BAC clones in 162 contigs from the physical map.

  11. Conserved sequence motifs in the small subunit of human general transcription factor TFIIE.

    Science.gov (United States)

    Sumimoto, H; Ohkuma, Y; Sinn, E; Kato, H; Shimasaki, S; Horikoshi, M; Roeder, R G

    1991-12-05

    A general initiation factor, TFIIE, is essential for transcription initiation by RNA polymerase II in conjunction with other general factors. TFIIE is a heterotetramer containing two subunits of relative molecular mass 57,000 (TFIIE-alpha) and two of 34,000 (TFIIE-beta). TFIIE-beta is required in conjunction with TFIIE-alpha for transcription initiation. Here we report the cloning and expression of a complementary DNA encoding a functional human TFIIE-beta. Recombinant TFIIE-beta could replace the natural TFIIE-beta for transcription in conjunction with TFIIE-alpha. Amino-acid sequence comparisons reveal regions with sequence similarities to: subregion 3 of bacterial sigma factors; a region of RAP30 (the small subunit of TFIIF) with sequence similarity to a sigma-factor subregion implicated in binding to RNA polymerase; and a portion of the basic region-helix-loop-helix motif found in several enhancer-binding proteins. These potential homologies have implications for the role of TFIIE in preinitiation complex assembly and function.

  12. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

    Science.gov (United States)

    Gautheret, D; Lambert, A

    2001-11-09

    We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented into a new software, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs. Copyright 2001 Academic Press.
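
    The "helical profile" idea (16 lod-scores per helix position, one per possible base-pair) can be illustrated with a small sketch. This is not ERPIN's implementation: the function names, the uniform 1/16 background, the pseudocount, and the gap-free alignment are all assumptions made for illustration.

```python
import math
from itertools import product

# The 16 possible base-pairs (5' base followed by its 3' partner).
PAIRS = ["".join(p) for p in product("ACGU", repeat=2)]

def helical_profile(aligned, paired_columns, pseudo=1.0):
    """Build one 16-entry lod-score column per paired position.

    aligned        -- gap-free, equal-length RNA sequences (assumption)
    paired_columns -- list of (i, j) alignment columns that base-pair
    pseudo         -- pseudocount added to every base-pair (assumption)
    """
    profile = []
    for i, j in paired_columns:
        counts = {bp: pseudo for bp in PAIRS}
        for seq in aligned:
            counts[seq[i] + seq[j]] += 1
        total = sum(counts.values())
        # lod-score against a uniform 1/16 background (assumption)
        profile.append({bp: math.log((counts[bp] / total) * 16) for bp in PAIRS})
    return profile

def score_helix(profile, five_prime, three_prime):
    """Score a candidate helix; both strands are written 5'->3'."""
    return sum(col[a + b]
               for col, a, b in zip(profile, five_prime, reversed(three_prime)))
```

    A candidate helix whose pairs were frequent in the alignment scores above zero, and one built from unseen pairs scores below zero; single-stranded regions would need a separate, ordinary per-column profile.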

  13. Analysis of simple sequence repeats markers derived from Phytophthora sojae expressed sequence tags

    Institute of Scientific and Technical Information of China (English)

    ZHU Zhendong; HUO Yunlong; WANG Xiaoming; HUANG Junbin; WU Xiaofei

    2004-01-01

    Five thousand and eight hundred publicly available expressed sequence tags (ESTs) of Phytophthora sojae were electronically searched and 415 simple sequence repeats (SSRs) were identified in 369 ESTs. The average density of SSRs was one SSR per 8.9 kb of EST sequence screened. The most frequent repeats were trinucleotide repeats (50.1%) and the least frequent were tetranucleotide repeats (8.2%). Forty primer pairs were designed and tested on 5 strains of P. sojae. Thirty-three primer pairs had successful PCR amplifications. Of the 33 functional primer pairs, 28 primer pairs produced characteristic SSR bands of the expected size, and 15 primer pairs (45.5%) detected polymorphism among 5 tested strains of P. sojae. Based on the polymorphisms detected with 20 EST-SSR markers, the 5 tested strains of P. sojae were clustered into 3 groups. In this study, the SSR markers of P. sojae were developed for the first time. These markers could be useful for identification, genetic variation study, and molecular mapping of P. sojae and its related species.
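
    An electronic SSR search of this kind can be sketched with a regular expression. The code below is only an illustration, not the tool used in the study: the function names and the minimum repeat counts per motif length are hypothetical, and only perfect (uninterrupted) di-, tri-, and tetranucleotide repeats are reported.

```python
import re

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    if min_repeats is None:
        # Hypothetical thresholds: motif length -> minimum number of repeats.
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skips AAAA, ATAT, and similar
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

def ssr_density_kb(total_bp, n_ssrs):
    """Average number of kilobases screened per SSR found."""
    return total_bp / n_ssrs / 1000.0
```

    The density figure quoted in the abstract is the second quantity: total screened sequence length divided by the number of SSRs found.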

  14. Specific binding of the replication protein of plasmid pPS10 to direct and inverted repeats is mediated by an HTH motif.

    Science.gov (United States)

    García de Viedma, D; Serrano-López, A; Díaz-Orejas, R

    1995-01-01

    The initiator protein of the plasmid pPS10, RepA, has a putative helix-turn-helix (HTH) motif at its C-terminal end. RepA dimers bind to an inverted repeat at the repA promoter (repAP) to autoregulate RepA synthesis [D. García de Viedma, et al. (1996) EMBO J. in press]. RepA monomers bind to four direct repeats at the origin of replication (oriV) to initiate pPS10 replication. This report shows that randomly generated mutations in RepA, associated with deficiencies in autoregulation, map either at the putative HTH motif or in its vicinity. These mutant proteins do not promote pPS10 replication and are severely affected in binding to both the repAP and oriV regions in vitro. Revertants of a mutant that maps in the vicinity of the HTH motif have been obtained and correspond to a second amino acid substitution far upstream of the motif. However, reversion of mutants that map in the helices of the motif occurs less frequently, at least by an order of magnitude. All these data indicate that the helices of the HTH motif play an essential role in specific RepA-DNA interactions, although additional regions also seem to be involved in DNA binding activity. Some mutations have slightly different effects in replication and autoregulation, suggesting that the role of the HTH motif in the interaction of RepA dimers or monomers with their respective DNA targets (IR or DR) is not the same. PMID:8559664

  15. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

    Science.gov (United States)

    Gilbert, N; Labuda, D

    1999-03-16

    A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.

  16. Always look on both sides: phylogenetic information conveyed by simple sequence repeat allele sequences.

    Directory of Open Access Journals (Sweden)

    Stéphanie Barthe

    Full Text Available Simple sequence repeat (SSR) markers are widely used tools for inferences about genetic diversity, phylogeography and spatial genetic structure. Their applications assume that variation among alleles is essentially caused by an expansion or contraction of the number of repeats and that, accessorily, mutations in the target sequences follow the stepwise mutation model (SMM). Generally speaking, PCR amplicon sizes are used as direct indicators of the number of SSR repeats composing an allele with the data analysis either ignoring the extent of allele size differences or assuming that there is a direct correlation between differences in amplicon size and evolutionary distance. However, without precisely knowing the kind and distribution of polymorphism within an allele (SSR) and the associated flanking region (FR) sequences, it is hard to say what kind of evolutionary message is conveyed by such a synthetic descriptor of polymorphism as DNA amplicon size. In this study, we sequenced several SSR alleles in multiple populations of three divergent tree genera and disentangled the types of polymorphisms contained in each portion of the DNA amplicon containing an SSR. The patterns of diversity provided by amplicon size variation, SSR variation itself, insertions/deletions (indels), and single nucleotide polymorphisms (SNPs) observed in the FRs were compared. Amplicon size variation largely reflected SSR repeat number. The amount of variation was as large in FRs as in the SSR itself. The former contributed significantly to the phylogenetic information and sometimes was the main source of differentiation among individuals and populations contained by FR and SSR regions of SSR markers. The presence of mutations occurring at different rates within a marker's sequence offers the opportunity to analyse evolutionary events occurring on various timescales, but at the same time calls for caution in the interpretation of SSR marker data when the distribution of within

  17. Fitting a mixture model by expectation maximization to discover motifs in biopolymers

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, T.L.; Elkan, C. [Univ. of California, La Jolla, CA (United States)]

    1994-12-31

    The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of unaligned sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset.
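
    The mixture-model idea can be illustrated with a much-reduced sketch. The code below is not the published algorithm: it assumes DNA only, keeps a fixed zero-order background, starts from a caller-supplied seed word, and omits the probabilistic erasing step and the classifier threshold. The names `em_motif`, `seed`, and `pseudo`, and the 0.7/0.1 starting weights, are hypothetical choices.

```python
import math

ALPHABET = "ACGT"

def em_motif(seqs, width, seed, n_iter=50, pseudo=0.1):
    """Fit a two-component (motif vs. background) mixture by EM.

    Returns (pwm, lam, responsibilities), where pwm is a list of
    per-position letter distributions, lam is the estimated fraction of
    windows that are motif occurrences, and responsibilities pairs each
    window with its posterior probability of being a motif occurrence.
    """
    # Every overlapping window of the given width is one data point.
    windows = [s[i:i + width] for s in seqs for i in range(len(s) - width + 1)]
    # Background component: letter frequencies over all input (kept fixed).
    text = "".join(seqs)
    bg = {a: (text.count(a) + pseudo) / (len(text) + 4 * pseudo) for a in ALPHABET}
    # Motif component: start close to the seed word.
    pwm = [{a: (0.7 if a == c else 0.1) for a in ALPHABET} for c in seed]
    # Initial mixing weight: roughly one occurrence per sequence.
    lam = min(len(seqs) / len(windows), 0.5)
    z = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window came from the motif.
        z = []
        for w in windows:
            p_motif = lam * math.prod(pwm[j][c] for j, c in enumerate(w))
            p_bg = (1 - lam) * math.prod(bg[c] for c in w)
            z.append(p_motif / (p_motif + p_bg))
        # M-step: re-estimate the mixing weight and the motif columns.
        lam = min(max(sum(z) / len(windows), 1e-6), 1 - 1e-6)
        for j in range(width):
            counts = {a: pseudo for a in ALPHABET}
            for w, zi in zip(windows, z):
                counts[w[j]] += zi
            norm = sum(counts.values())
            pwm[j] = {a: counts[a] / norm for a in ALPHABET}
    return pwm, lam, list(zip(windows, z))
```

    Because overlapping windows are treated as independent data points, this sketch is only an approximation of the model; finding several motifs would require repeating the fit after down-weighting the windows already explained, as the abstract describes.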

  18. A strain-variable bacteriocin in Bacillus anthracis and Bacillus cereus with repeated Cys-Xaa-Xaa motifs

    Directory of Open Access Journals (Sweden)

    Haft Daniel H

    2009-04-01

    Full Text Available Abstract: Bacteriocins are peptide antibiotics from ribosomally translated precursors, produced by bacteria often through extensive post-translational modification. Minimal sequence conservation, short gene lengths, and low complexity sequence can hinder bacteriocin identification, even during gene calling, so they are often discovered by proximity to accessory genes encoding maturation, immunity, and export functions. This work reports a new subfamily of putative thiazole-containing heterocyclic bacteriocins. It appears universal in all strains of Bacillus anthracis and B. cereus, but has gone unrecognized because it is always encoded far from its maturation protein operon. Patterns of insertions and deletions among twenty-four variants suggest a repeating functional unit of Cys-Xaa-Xaa. Reviewers: This article was reviewed by Andrei Osterman and Lakshminarayan Iyer.

  19. [Homologous simple sequence repeats (SSRs) analysis in tetraploid (AD1) and diploid (A₂, D₅) genomes of Gossypium].

    Science.gov (United States)

    Gaofei, Sun; Shoupu, He; Zhaoe, Pan; Xiongming, Du

    2015-02-01

    Simple sequence repeats (SSRs) are a class of repetitive DNA sequences, which are commonly used for genome analysis. Comparison of the homologous SSRs among different genomes is helpful to understand the evolutionary process in related species. In this study, SSR scanning was performed to investigate their distribution and length variation among the genomes of G. raimondii (D₅), G. arboreum (A₂) and G. hirsutum (AD₁). The results demonstrated that the distribution of SSRs in the A genome was very similar to that in the D genome, while the length variation of homologous SSRs between the A and AD genomes was more conserved than that between the D and AD genomes. Compared with SSRs in the AD genome, the number of SSRs with longer motif length in the A genome was about five times that of those with shorter motif length, while it was about three times in the D genome. This implied that the length variation rates of homologous SSRs between diploid cotton and tetraploid cotton were different during the parallel evolution due to the subgenome fusion, and the motif length of most SSRs in the tetraploid genome tended to become shorter than homologous SSRs in the diploid genome during the process of evolution. This study comprehensively compared the SSRs in three cotton genomes and revealed the significant difference among them, providing a foundation for further evolutionary study of the Gossypium genome.

  20. Isolation and Characterization of Simple Sequence Repeats (SSR) Markers from the Moss Genus Orthotrichum Using a Small Throughput Pyrosequencing Machine

    Science.gov (United States)

    Sawicki, Jakub; Kwaśniewski, Mirosław; Szczecińska, Monika; Chwiałkowska, Karolina; Milewicz, Monika; Plášek, Vítězslav

    2012-01-01

    Here, we report the results of next-generation sequencing on the GS Junior system to identify a large number of microsatellites from the epiphytic moss Orthotrichum speciosum. Using a combination of a total (non-enrichment) genomic library and small-scale 454 pyrosequencing, we determined 5382 contigs whose length ranged from 103 to 5445 bp. In this dataset we identified 92 SSR (simple sequence repeats) motifs in 89 contigs. Forty-six of these had flanking regions suitable for primer design. We tested PCR amplification, reproducibility, and the level of polymorphism of 46 primer pairs for Orthotrichum speciosum using 40 individuals from two populations. As a result, the designed primers revealed 35 polymorphic loci with more than two alleles detected. This method is cost- and time-effective in comparison with traditional approaches involving cloning and sequencing. PMID:22837714

  1. Correlating CpG islands, motifs, and sequence variants in human chromosome 21

    Directory of Open Access Journals (Sweden)

    Cercone Nick

    2011-07-01

    Full Text Available Abstract. Background: CpG islands are important regions in DNA. They usually appear at the 5’ end of genes containing GC-rich dinucleotides. When DNA methylation occurs, gene regulation is affected and it sometimes leads to carcinogenesis. We propose a new detection program using a hidden Markov model alongside the Viterbi algorithm. Methods: Our solution provides a graphical user interface not seen in many of the other CGI detection programs and we unify the detection and analysis under one program to allow researchers to scan a genetic sequence, detect the significant CGIs, and analyze the sequence once the scan is complete for any noteworthy findings. Results: Using human chromosome 21, we show that our algorithm finds a significant number of CGIs. Running an analysis on a dataset of promoters discovered that the characteristics of methylated and unmethylated CGIs are significantly different. Finally, we detected significantly different motifs between methylated and unmethylated CGI promoters using MEME and MAST. Conclusions: Developing this new tool for the community using powerful algorithms has shown that combining analysis with CGI detection will improve the continued research within the field of epigenetics.
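
    How a hidden Markov model and the Viterbi algorithm label a sequence can be shown with a toy two-state example. This is a sketch under strong assumptions and not the program described above: the two states, every probability in the tables, and the function name are invented for illustration, and a composition-based model like this one ignores the CpG dinucleotide dependence that a realistic model would capture.

```python
import math

STATES = ("B", "I")  # B = background, I = CpG island

# Illustrative parameters only (not taken from the paper).
START = {"B": 0.5, "I": 0.5}
TRANS = {"B": {"B": 0.9, "I": 0.1}, "I": {"B": 0.1, "I": 0.9}}
EMIT = {
    "B": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "I": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # Log-probabilities avoid numerical underflow on long sequences.
    score = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for ch in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for ending in state s at this position.
            prev = max(STATES, key=lambda p: score[-1][p] + math.log(TRANS[p][s]))
            row[s] = score[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][ch])
            ptr[s] = prev
        score.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

    Runs of "I" in the returned path are the candidate islands; with these toy parameters a GC-rich stretch must be several bases long before switching states pays for the two transition penalties.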

  2. Engineering Proteins with Enhanced Mechanical Stability by Force Specific Sequence Motifs

    Science.gov (United States)

    Lu, Wenzhe; Negi, Surendra; Oberhauser, Andres F.; Braun, Werner

    2012-01-01

    Use of atomic force microscopy (AFM) has recently led to a better understanding of the molecular mechanisms of the unfolding process by mechanical forces; however, the rational design of novel proteins with specific mechanical strength remains challenging. We have approached this problem from a new perspective that generates linear physical-chemical properties (PCP) motifs from a limited AFM data set. Guided by our linear sequence analysis we designed and analyzed four new mutants of the titin I1 domain with the goal of increasing the domain's mechanical strength. All four mutants could be cloned and expressed as soluble proteins. AFM data indicate that at least two of the mutants have increased molecular mechanical strength. This observation suggests that the PCP method is useful to graft sequences specific for high mechanical stability to weak proteins to increase their mechanical stability, and represents an additional tool in the design of novel proteins besides steered molecular dynamics calculations, coarse grained simulations and phi-value analysis of the transition state. PMID:22274941

  3. Analysis of the Campylobacter jejuni genome by SMRT DNA sequencing identifies restriction-modification motifs.

    Directory of Open Access Journals (Sweden)

    Jason L O'Loughlin

    Full Text Available Campylobacter jejuni is a leading bacterial cause of human gastroenteritis. The goal of this study was to analyze the C. jejuni F38011 strain, recovered from an individual with severe enteritis, at a genomic and proteomic level to gain insight into microbial processes. The C. jejuni F38011 genome is comprised of 1,691,939 bp, with a mol.% (G+C) content of 30.5%. PacBio sequencing coupled with REBASE analysis was used to predict C. jejuni F38011 genomic sites and enzymes that may be involved in DNA restriction-modification. A total of five putative methylation motifs were identified as well as the C. jejuni enzymes that could be responsible for the modifications. Peptides corresponding to the deduced amino acid sequence of the C. jejuni enzymes were identified using proteomics. This work sets the stage for studies to dissect the precise functions of the C. jejuni putative restriction-modification enzymes. Taken together, the data generated in this study contribute to our knowledge of the genomic content, methylation profile, and encoding capacity of C. jejuni.

  4. Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins.

    Directory of Open Access Journals (Sweden)

    David Karlin

    Full Text Available Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

  5. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Directory of Open Access Journals (Sweden)

    Guido W. Grimm

    2006-01-01

    Full Text Available The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

  6. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs

    DEFF Research Database (Denmark)

    van Beest, M; Dooijes, D; van De Wetering, M

    2000-01-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment...

  7. The Cipher Code of Simple Sequence Repeats in "Vampire Pathogens".

    Science.gov (United States)

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-07-28

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, red blood cells tropism microbes, like "vampire pathogens" (VP), succeed in matching scarce nutrients and surviving strong immunity reactions. Here, we found VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non Vampire Pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser) and significantly higher counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 in VP than N_VP. Furthermore, significant differences (P < 0.001) on the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation.

  8. Sequence, structure, and cooperativity in folding of elementary protein structural motifs.

    Science.gov (United States)

    Lai, Jason K; Kubelka, Ginka S; Kubelka, Jan

    2015-08-11

    Residue-level unfolding of two helix-turn-helix proteins--one naturally occurring and one de novo designed--is reconstructed from multiple sets of site-specific (13)C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa-Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako-Saitô-Muñoz-Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and (13)C-amide I' bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for "experimental" reaction coordinates--namely, the degree of local folding as sensed by site-specific (13)C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture.

  9. A structural study for the optimisation of functional motifs encoded in protein sequences

    Directory of Open Access Journals (Sweden)

    Helmer-Citterich Manuela

    2004-04-01

    Full Text Available Abstract. Background: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: (1) residues structurally conserved in different proteins, that are true positives for a pattern, are identified by means of a computational technique and by visual inspection; (2) the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; (3) the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of

  10. Sequence characterization of hypervariable regions in the soybean genome: leucine-rich repeats and simple sequence repeats

    Directory of Open Access Journals (Sweden)

    Everaldo G. de Barros

    2000-06-01

    Full Text Available The genetic basis of cultivated soybean is rather narrow. This observation has been confirmed by analysis of agronomic traits among different genotypes, and more recently by the use of molecular markers. During the construction of an RFLP soybean map (Glycine soja x Glycine max), the two progenitors were analyzed with over 2,000 probes, of which 25% were polymorphic. Among the probes that revealed polymorphisms, a small proportion, about 0.5%, hybridized to regions that were highly polymorphic. Here we report the sequencing and analysis of five of these probes. Three of the five contain segments that encode leucine-rich repeat (LRR) sequences homologous to known disease resistance genes in plants. Two other probes are relatively AT-rich and contain segments of (A)n/(T)n. DNA segments corresponding to one of the probes (A45-10) were amplified from nine soybean genotypes. Partial sequencing of these amplicons suggests that deletions and/or insertions are responsible for the extensive polymorphism observed. We propose that genes encoding LRR proteins and simple sequence repeat regions prone to slippage are some of the most hypervariable regions of the soybean genome.

  11. A nested leucine rich repeat (LRR) domain: The precursor of LRRs is a ten or eleven residue motif

    Directory of Open Access Journals (Sweden)

    Matsushima Norio

    2010-09-01

    Full Text Available Abstract. Background: Leucine rich repeats (LRRs) are present in over 60,000 proteins that have been identified in viruses, bacteria, archaea, and eukaryotes. All known structures of repeated LRRs adopt an arc shape. Most LRRs are 20-30 residues long. All LRRs contain LxxLxLxxNxL, in which "L" is Leu, Ile, Val, or Phe and "N" is Asn, Thr, Ser, or Cys and "x" is any amino acid. Seven classes of LRRs have been identified. However, other LRR classes remain to be characterized. The evolution of LRRs is not well understood. Results: Here we describe a novel LRR domain, or nested repeat, observed in 134 proteins from 54 bacterial species. This novel LRR domain has 21 residues with the consensus sequence of LxxLxLxxNxLxxLDLxx(N/L/Q/x)xx or LxxLxCxxNxLxxLDLxx(N/L/x)xx. This LRR domain is characterized by a nested periodicity; it consists of alternating 10- and 11-residue units of LxxLxLxxNx(x/-). We call it "IRREKO" LRR, since the Japanese word for "nested" is "IRREKO". The first unit of the "IRREKO" LRR domain is frequently occupied by an "SDS22-like" LRR with the consensus of LxxLxLxxNxLxxLxxLxxLxx or a "Bacterial" LRR with the consensus of LxxLxLxxNxLxxLPxLPxx. In some proteins an "SDS22-like" LRR intervenes between "IRREKO" LRRs. Conclusion: Proteins having the "IRREKO" LRR domain are almost exclusively found in bacteria. It is suggested that IRREKO@LRR evolved from a common ancestor with "SDS22-like" and "Bacterial" classes and that the ancestor of IRREKO@LRR is 10 or 11 residues of LxxLxLxxNx(x/-). The "IRREKO" LRR is predicted to adopt an arc shape with smaller curvature in which β-strands are formed on both concave and convex surfaces.
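
    The shared LxxLxLxxNxL segment translates directly into a regular expression, using the residue classes given in the abstract ("L" = Leu, Ile, Val, or Phe; "N" = Asn, Thr, Ser, or Cys; "x" = any residue). The sketch below is only a pattern scan, not the detection method of the study; the function name is hypothetical, and a match by itself does not establish that a protein contains a genuine LRR.

```python
import re

L = "[LIVF]"  # "L" positions: Leu, Ile, Val, or Phe
N = "[NTSC]"  # "N" position: Asn, Thr, Ser, or Cys

# LxxLxLxxNxL, wrapped in a lookahead so that overlapping
# occurrences are all reported rather than only the first.
LRR_CORE = re.compile(f"(?=({L}..{L}.{L}..{N}.{L}))")

def find_lrr_cores(protein):
    """Return (start, matched_segment) for every LxxLxLxxNxL occurrence."""
    return [(m.start(), m.group(1)) for m in LRR_CORE.finditer(protein.upper())]
```

    The same approach extends to the longer class-specific consensus strings by writing out the extra fixed and variable positions.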

  12. A Naturally Occurring Repeat Protein with High Internal Sequence Identity Defines a New Class of TPR-like Proteins.

    Science.gov (United States)

    Marold, Jacob D; Kavran, Jennifer M; Bowman, Gregory D; Barrick, Doug

    2015-11-01

    Linear repeat proteins often have high structural similarity and low (∼25%) pairwise sequence identities (PSI) among modules. We identified a unique P. anserina (Pa) sequence with tetratricopeptide repeat (TPR) homology, which contains longer (42 residue) repeats (42PRs) with an average PSI >91%. We determined the crystal structure of five tandem Pa 42PRs to 1.6 Å, and examined the stability and solution properties of constructs containing three to six Pa 42PRs. Compared with 34-residue TPRs (34PRs), Pa 42PRs have a one-turn extension of each helix, and bury more surface area. Unfolding transitions shift to higher denaturant concentration and become sharper as repeats are added. Fitted Ising models show Pa 42PRs to be more cooperative than consensus 34PRs, with increased magnitudes of intrinsic and interfacial free energies. These results demonstrate the tolerance of the TPR motif to length variation, and provide a basis to understand the effects of helix length on intrinsic/interfacial stability.
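
    The pairwise sequence identity (PSI) figures quoted in this abstract are averages over all pairs of repeat modules. As a rough illustration only, assuming the repeats are already aligned to equal length with no gaps, the calculation could be sketched as follows; the function names are hypothetical and the short strings in the usage note are not real 42PR sequences.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned repeats."""
    if len(a) != len(b) or not a:
        raise ValueError("repeats must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_psi(repeats: list[str]) -> float:
    """Average identity over all unordered pairs of repeats."""
    pairs = list(combinations(repeats, 2))
    if not pairs:
        raise ValueError("need at least two repeats")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

    With the toy input `["ACDE", "ACDF", "ACGF"]`, the three pairs have identities 0.75, 0.5, and 0.75, so `mean_psi` returns 2/3.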

  13. AptaTRACE Elucidates RNA Sequence-Structure Motifs from Selection Trends in HT-SELEX Experiments.

    Science.gov (United States)

    Dao, Phuong; Hoinka, Jan; Takahashi, Mayumi; Zhou, Jiehua; Ho, Michelle; Wang, Yijie; Costa, Fabrizio; Rossi, John J; Backofen, Rolf; Burnett, John; Przytycka, Teresa M

    2016-07-01

    Aptamers, short RNA or DNA molecules that bind distinct targets with high affinity and specificity, can be identified using high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX), but scalable analytic tools for understanding sequence-function relationships from diverse HT-SELEX data are not available. Here we present AptaTRACE, a computational approach that leverages the experimental design of the HT-SELEX protocol, RNA secondary structure, and the potential presence of many secondary motifs to identify sequence-structure motifs that show a signature of selection. We apply AptaTRACE to identify nine motifs in C-C chemokine receptor type 7 targeted by aptamers in an in vitro cell-SELEX experiment. We experimentally validate two aptamers whose binding required both sequence and structural features. AptaTRACE can identify low-abundance motifs, and we show through simulations that, because of this, it could lower HT-SELEX cost and time by reducing the number of selection cycles required.

  14. Viroids: from genotype to phenotype just relying on RNA sequence and structural motifs

    Directory of Open Access Journals (Sweden)

    Ricardo Flores

    2012-06-01

    Full Text Available As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves, adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson-Crick pairs are flanked by apparently unstructured loops. However, compelling data show that these loops are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing-loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs also adopt transient metastable conformations during replication. These contain elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures, either global or local, determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs.

  15. Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

    Directory of Open Access Journals (Sweden)

    William R. Gallaher

    2015-01-01

    Full Text Available Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site, sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis.

  16. Assembly of Repeat Content Using Next Generation Sequencing Data

    Energy Technology Data Exchange (ETDEWEB)

    LaButti, Kurt; Kuo, Alan; Grigoriev, Igor; Copeland, Alex

    2014-03-17

    Organisms with highly repetitive genomes pose a challenge for short read assembly, and typically only unique regions and repeat regions shorter than the read length can be accurately assembled. Recently, we have been investigating the use of Pacific Biosciences reads for de novo fungal assembly. We will present an assessment of the quality and degree of repeat reconstruction possible in a fungal genome using long read technology. We will also compare differences in assembly of repeat content using short read and long read technology.

  17. Development and characterization of 1,827 expressed sequence tag-derived simple sequence repeat markers for ramie (Boehmeria nivea L. Gaud).

    Directory of Open Access Journals (Sweden)

    Touming Liu

    Full Text Available Ramie (Boehmeria nivea L. Gaud) is one of the most important natural fiber crops, and improvement of fiber yield and quality is the main goal in efforts to breed superior cultivars. However, efforts aimed at enhancing the understanding of ramie genetics and developing more effective breeding strategies have been hampered by the shortage of simple sequence repeat (SSR) markers. In our previous study, we had assembled de novo 43,990 expressed sequence tags (ESTs). In the present study, we searched these previously assembled ESTs for SSRs and identified 1,685 ESTs (3.83%) containing 1,878 SSRs. Next, we designed 1,827 primer pairs complementary to regions flanking these SSRs, and these regions were designated as SSR markers. Among these markers, dinucleotide and trinucleotide repeat motifs were the most abundant types (36.4% and 36.3%, respectively), whereas tetranucleotide, pentanucleotide, and hexanucleotide motifs represented <10% of the markers. The motif AG/CT was the most abundant, accounting for 28.74% of the markers. One hundred EST-SSR markers (97 SSRs located in genes encoding transcription factors and 3 SSRs in genes encoding cellulose synthases) were tested by polymerase chain reaction amplification in 24 ramie varieties. Of these 100 markers, 98 markers were successfully amplified and 81 markers were polymorphic, with 2-6 alleles among the 24 varieties. Analysis of the genetic diversity of all 24 varieties revealed similarity coefficients that ranged from 0.51 to 0.80. The EST-SSRs developed in this study represent the first large-scale development of SSR markers for ramie. These SSR markers could be used for development of genetic and physical maps, quantitative trait loci mapping, genetic diversity studies, association mapping, and cultivar fingerprinting.
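
    Searching assembled ESTs for SSRs, as described in this abstract, amounts to finding short motifs repeated in tandem. The sketch below shows one common way to do this with backreferences in a regular expression. It is an illustration only: the abstract does not state which software or thresholds were used, so the `MIN_REPEATS` values and the function name `find_ssrs` are assumptions.

```python
import re

# Minimum number of tandem copies per motif length (2 to 6 nt).
# These thresholds are assumed for illustration, not taken from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # Capture one motif, then require at least min_rep - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. AGAG, AAA),
            # so each SSR is reported once under its shortest motif.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)
```

    On the made-up sequence `"TTGC" + "AG" * 7 + "CCTA" + "AAG" * 5 + "TT"`, this reports an (AG)7 repeat at position 4 and an (AAG)5 repeat at position 22. Tallying the motif lengths of such hits over all ESTs would give class shares of the kind reported above for dinucleotide and trinucleotide repeats.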

  18. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, Catherine [Noblis]

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  19. Analysis of simple sequence repeats in rice bean (Vigna umbellata) using an SSR-enriched library

    Institute of Scientific and Technical Information of China (English)

    Lixia Wang; Kyung Do Kim; Dongying Gao; Honglin Chen; Suhua Wang; SukHa Lee; Scott A. Jackson; Xuzhen Cheng

    2016-01-01

    Rice bean (Vigna umbellata Thunb.), a warm-season annual legume, is grown in Asia mainly for dried grain or fodder and plays an important role in human and animal nutrition because the grains are rich in protein and some essential fatty acids and minerals. With the aim of expediting the genetic improvement of rice bean, we initiated a project to develop genomic resources and tools for molecular breeding in this little-known but important crop. Here we report the construction of an SSR-enriched genomic library from DNA extracted from pooled young leaf tissues of 22 rice bean genotypes and the development of SSR markers. In 433,562 reads generated by a Roche 454 GS-FLX sequencer, we identified 261,458 SSRs, of which 48.8% were of compound form. Dinucleotide repeats were predominant with an absolute proportion of 81.6%, followed by trinucleotides (17.8%). Other types together accounted for 0.6%. The motif AC/GT accounted for 77.7% of the total, followed by AAG/CTT (14.3%), and all others accounted for 12.0%. Among the flanking sequences, 2928 matched putative genes or gene models in the protein database of Arabidopsis thaliana, corresponding with 608 non-redundant Gene Ontology terms. Of these sequences, 11.2% were involved in cellular components, 24.2% were involved in molecular functions, and 64.6% were associated with biological processes. Based on homolog analysis, 1595 flanking sequences were similar to mung bean and 500 to common bean genomic sequences. Comparative mapping was conducted using 350 sequences homologous to both mung bean and common bean sequences. Finally, a set of primer pairs was designed, and a validation test showed that 58 of 220 new primers can be used in rice bean and 53 can be transferred to mung bean. However, only 11 were polymorphic when tested on 32 rice bean varieties. We propose that this study lays the groundwork for developing novel SSR markers and will enhance the mapping of qualitative and quantitative traits and marker-assisted selection in

  20. Analysis of simple sequence repeats in rice bean (Vigna umbellata) using an SSR-enriched library

    Directory of Open Access Journals (Sweden)

    Lixia Wang

    2016-02-01

    Full Text Available Rice bean (Vigna umbellata Thunb.), a warm-season annual legume, is grown in Asia mainly for dried grain or fodder and plays an important role in human and animal nutrition because the grains are rich in protein and some essential fatty acids and minerals. With the aim of expediting the genetic improvement of rice bean, we initiated a project to develop genomic resources and tools for molecular breeding in this little-known but important crop. Here we report the construction of an SSR-enriched genomic library from DNA extracted from pooled young leaf tissues of 22 rice bean genotypes and the development of SSR markers. In 433,562 reads generated by a Roche 454 GS-FLX sequencer, we identified 261,458 SSRs, of which 48.8% were of compound form. Dinucleotide repeats were predominant with an absolute proportion of 81.6%, followed by trinucleotides (17.8%). Other types together accounted for 0.6%. The motif AC/GT accounted for 77.7% of the total, followed by AAG/CTT (14.3%), and all others accounted for 12.0%. Among the flanking sequences, 2928 matched putative genes or gene models in the protein database of Arabidopsis thaliana, corresponding with 608 non-redundant Gene Ontology terms. Of these sequences, 11.2% were involved in cellular components, 24.2% were involved in molecular functions, and 64.6% were associated with biological processes. Based on homolog analysis, 1595 flanking sequences were similar to mung bean and 500 to common bean genomic sequences. Comparative mapping was conducted using 350 sequences homologous to both mung bean and common bean sequences. Finally, a set of primer pairs was designed, and a validation test showed that 58 of 220 new primers can be used in rice bean and 53 can be transferred to mung bean. However, only 11 were polymorphic when tested on 32 rice bean varieties. We propose that this study lays the groundwork for developing novel SSR markers and will enhance the mapping of qualitative and quantitative traits and marker

  1. Triazine-Based Sequence-Defined Polymers with Side-Chain Diversity and Backbone-Backbone Interaction Motifs.

    Science.gov (United States)

    Grate, Jay W; Mo, Kai-For; Daily, Michael D

    2016-03-14

    Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions.

  2. Flow Cytometry-Assisted Cloning of Specific Sequence Motifs from Complex 16S rRNA Gene Libraries

    DEFF Research Database (Denmark)

    Nielsen, Jeppe Lund; Schramm, Andreas; Bernhard, Anne E.

    2004-01-01

    Jeppe L. Nielsen (1), Andreas Schramm (1,2), Anne E. Bernhard (1), Gerrit J. van den Engh (3), and David A. Stahl (1). Affiliations: (1) Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington; (2) Department of Ecological Microbiology, University of Bayreuth, Bayreuth, Germany; (3) Institute for Systems Biology, Seattle, Washington.

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes.

  3. Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

    Science.gov (United States)

    Velagapudi, Sai Pradeep; Disney, Matthew D

    2013-10-15

    RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the best-studied class of small molecules that target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems are largely unknown. If such information were known, it could allow aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which are higher than the affinity of aminoglycosides for a mimic of their therapeutic target, the bacterial rRNA A-site.

  4. APTANI: a computational tool to select aptamers through sequence-structure motif analysis of HT-SELEX data.

    Science.gov (United States)

    Caroli, J; Taccioli, C; De La Fuente, A; Serafini, P; Bicciato, S

    2016-01-15

    Aptamers are synthetic nucleic acid molecules that can bind biological targets by virtue of both their sequence and three-dimensional structure. Aptamers are selected using SELEX, Systematic Evolution of Ligands by EXponential enrichment, a technique that exploits aptamer-target binding affinity. The SELEX procedure, coupled with high-throughput sequencing (HT-SELEX), creates billions of random sequences capable of binding different epitopes on specific targets. Since this technique produces enormous amounts of data, computational analysis represents a critical step to screen and select the most biologically relevant sequences. Here, we present APTANI, a computational tool to identify target-specific aptamers from HT-SELEX data and secondary structure information. APTANI builds on the AptaMotif algorithm, originally implemented to analyze SELEX data; it extends the applicability of AptaMotif to HT-SELEX data and introduces new functionalities, such as the possibility to identify binding motifs, to cluster aptamer families, or to compare output results from different HT-SELEX cycles. Tabular and graphical representations facilitate the downstream biological interpretation of results. APTANI is available at http://aptani.unimore.it.

  5. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Main Dorrie S

    2008-06-01

    Full Text Available Abstract Background The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSRs), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. Results A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. Conclusion This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to complement existing morphological marker-assisted breeding in blackberry.

  6. Functional insight from the tetratricopeptide repeat-like motifs of the type III secretion chaperone SicA in Salmonella enterica serovar Typhimurium.

    Science.gov (United States)

    Kim, Jin Seok; Kim, Bae-Hoon; Jang, Jung Im; Eom, Jeong Seon; Kim, Hyeon Guk; Bang, Iel Soo; Park, Yong Keun

    2014-01-01

    SicA functions both as a class II chaperone for SipB and SipC of the type III secretion system (T3SS)-1 and as a transcriptional cofactor for the AraC-type transcription factor InvF in Salmonella enterica subsp. enterica serovar Typhimurium. Bioinformatic analysis has predicted that SicA possesses three tetratricopeptide repeat (TPR)-like motifs, which are important for protein-protein interactions and serve as multiprotein complex mediators. To investigate whether the TPR-like motifs in SicA are critical for its transcriptional cofactor function, the canonical residues in these motifs were mutated to glutamate (SicA(A44E), SicA(A78E), and SicA(G112E)). None of these mutants except SicA(A44E) were able to activate the expression of the sipB and sigD genes. SicA(A44E) still has the capacity to interact with InvF in vitro, and despite its instability in the cell, it could activate the sigDE operon. This suggests that TPR motifs are important for the transcriptional cofactor function of the SicA chaperone.

  7. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    Science.gov (United States)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  8. Sequence and Spatiotemporal Expression Analysis of CLE-Motif Containing Genes from the Reniform Nematode (Rotylenchulus reniformis Linford & Oliveira).

    Science.gov (United States)

    Wubben, Martin J; Gavilano, Lily; Baum, Thomas J; Davis, Eric L

    2015-06-01

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globodera, and Meloidogyne genera of sedentary endoparasites. Here, we describe the isolation, sequence analysis, and spatiotemporal expression of three R. reniformis genes encoding putative CLE motifs named Rr-cle-1, Rr-cle-2, and Rr-cle-3. The Rr-cle cDNAs showed >98% identity with each other and the predicted peptides were identical with the exception of a short stretch of residues at the carboxy(C)-terminus of the variable domain (VD). Each RrCLE peptide possessed an amino-terminal signal peptide for secretion and a single C-terminal CLE motif that was most similar to Heterodera CLE motifs. Aligning the Rr-cle cDNAs with their corresponding genomic sequences showed three exons with an intron separating the signal peptide from the VD and a second intron separating the VD from the CLE motif. An alignment of the RrCLE1 peptide with Heterodera glycines and Heterodera schachtii CLE proteins revealed a high level of homology within the VD region associated with regulating in planta trafficking of the processed CLE peptide. Quantitative RT-PCR (qRT-PCR) showed similar expression profiles for each Rr-cle transcript across the R. reniformis life-cycle with the greatest transcript abundance being in sedentary parasitic female nematodes. In situ hybridization showed specific Rr-cle expression within the dorsal esophageal gland cell of sedentary parasitic females.

  9. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Science.gov (United States)

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes, simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular, G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4-forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms, repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed, with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
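
    The GGGNATC repeat described in this abstract lends itself to a simple pattern search. The following sketch counts tandem units of the heptamer on one strand of a DNA string. It is an assumption-laden illustration rather than the authors' pipeline: the function name `find_g_repeats` is hypothetical, only perfect units are matched, and the reverse strand would need a separate pass over the reverse complement.

```python
import re

UNIT_LENGTH = 7
# One repeat unit: GGG, any base (the "N"), then ATC.
UNIT = "GGG[ACGT]ATC"
# Two or more consecutive units, in line with the reported range of 2 to 26.
TANDEM = re.compile(f"(?:{UNIT}){{2,}}")


def find_g_repeats(genome: str) -> list[tuple[int, int]]:
    """Return (start, unit_count) for each tandem GGGNATC array found."""
    return [
        (m.start(), len(m.group(0)) // UNIT_LENGTH)
        for m in TANDEM.finditer(genome.upper())
    ]
```

    For the made-up input `"TTT" + "GGGAATC" + "GGGTATC" + "GGGCATC" + "GGGGATC" + "AAA" + "GGGAATC" + "CC"`, the function returns `[(3, 4)]`: one array of four units, with the isolated single unit ignored. An array of four units provides the four G-tracts that the abstract links to G4 formation.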

  10. Large cryptic internal sequence repeats in protein structures from Homo sapiens

    Indian Academy of Sciences (India)

    R Sarani; N A Udayaprakash; R Subashini; P Mridula; T Yamane; K Sekar

    2009-03-01

    Amino acid sequences are known to constantly mutate and diverge unless there is a limiting condition that makes such a change deleterious. However, closer examination of the sequence and structure reveals that a few large, cryptic repeats are nevertheless sequentially conserved. This leads to the question of why only certain repeats are conserved at the sequence level. It would be interesting to find out if these sequences maintain their conservation at the three-dimensional structure level. They can play an active role in protein and nucleotide stability, thus not only ensuring proper functioning but also potentiating malfunction and disease. Therefore, insights into any aspect of the repeats – be it structure, function or evolution – would prove to be of some importance. This study aims to address the relationship between protein sequence and its three-dimensional structure, by examining if large cryptic sequence repeats have the same structure.

  11. Large-scale analysis of structural, sequence and thermodynamic characteristics of A-to-I RNA editing sites in human Alu repeats

    Directory of Open Access Journals (Sweden)

    Eisenberg Eli

    2010-07-01

    Full Text Available Abstract Background Alu repeats in the human transcriptome undergo massive adenosine to inosine RNA editing. This process is selective, as editing efficiency varies greatly among different adenosines. Several studies have identified weak sequence motifs characterizing the editing sites, but these alone do not account for the large diversity observed. Results Here we build a dataset of 29,971 editing sites and use it to characterize editing preferences. We focus on structural aspects, studying the double-stranded RNA structure of the Alu repeats, and show the editing frequency of a given site to depend strongly on the micro-structure it resides in. Surprisingly, we find that interior loops, and especially the nucleotides at their edges, are more likely to be edited than helices. In addition, the sequence motifs characterizing editing sites vary with the micro-structure. Finally, we show that thermodynamic stability of the site is important for its editing. Conclusions Analysis of a large dataset of editing events reveals more information on sequence and structural motifs characterizing the A-to-I editing process

  12. Complete nucleotide sequences of the domestic cat (Felis catus) mitochondrial genome and a transposed mtDNA tandem repeat (Numt) in the nuclear genome

    Energy Technology Data Exchange (ETDEWEB)

    Lopez, J.V.; Cevario, S.; O'Brien, S.J. [National Cancer Institute, Frederick, MD (United States)]

    1996-04-15

    The complete 17,009-bp mitochondrial genome of the domestic cat, Felis catus, has been sequenced and conforms largely to the typical organization of previously characterized mammalian mtDNAs. Codon usage and base composition also followed canonical vertebrate patterns, except for an unusual ATC (non-AUG) codon initiating the NADH dehydrogenase subunit 2 (ND2) gene. Two distinct repetitive motifs at opposite ends of the control region contribute to the relatively large size (1559 bp) of this carnivore mtDNA. Alignment of the feline mtDNA genome to a homologous 7946-bp nuclear mtDNA tandem repeat DNA sequence in the cat, Numt, indicates simple repeat motifs associated with insertion/deletion mutations. Overall DNA sequence divergence between Numt and cytoplasmic mtDNA sequence was only 5.1%. Substitutions predominate at the third codon position of homologous feline protein genes. Phylogenetic analysis of mitochondrial gene sequences confirms the recent transfer of the cytoplasmic mtDNA sequences to the domestic cat nucleus and recapitulates evolutionary relationships between mammal species. 86 refs., 4 figs., 3 tabs.

  13. A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction.

    Science.gov (United States)

    Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R; Gottschalk, Laura B; Lopez, Andrea P; Pellicore, Matthew J; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh; Cutting, Garry R

    2016-12-01

    The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. NHERF1 displayed minimal self-association in blot overlays (NHERF1, Kd = 1,382 ± 61.1 nM) at concentrations well above physiological levels, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM) in blot overlays and cross-linking assays and in coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino acid sequence (1417)EENKVR(1422) and the terminal (1478)TRL(1480) (PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescent microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface to maximize the effect of CFTR-targeted therapeutics.

  14. The neuronal nitric oxide synthase PDZ motif binds to -G(D,E)XV* carboxyterminal sequences

    NARCIS (Netherlands)

    Schepens, J.; Cuppen, E.; Wieringa, B.; Hendriks, W.

    1997-01-01

    PDZ motifs are small protein-protein interaction modules that are thought to play a role in the clustering of submembranous signalling molecules. The specificity and functional consequences of their associative actions is still largely unknown. Using two-hybrid methodology we here demonstrate that t

  15. The histone chaperone sNASP binds a conserved peptide motif within the globular core of histone H3 through its TPR repeats.

    Science.gov (United States)

    Bowman, Andrew; Lercher, Lukas; Singh, Hari R; Zinne, Daria; Timinszky, Gyula; Carlomagno, Teresa; Ladurner, Andreas G

    2016-04-20

    Eukaryotic chromatin is a complex yet dynamic structure, which is regulated in part by the assembly and disassembly of nucleosomes. Key to this process is a group of proteins termed histone chaperones that guide the thermodynamic assembly of nucleosomes by interacting with soluble histones. Here we investigate the interaction between the histone chaperone sNASP and its histone H3 substrate. We find that sNASP binds with nanomolar affinity to a conserved heptapeptide motif in the globular domain of H3, close to the C-terminus. Through functional analysis of sNASP homologues we identified point mutations in surface residues within the TPR domain of sNASP that disrupt H3 peptide interaction, but do not completely disrupt binding to full length H3 in cells, suggesting that sNASP interacts with H3 through additional contacts. Furthermore, chemical shift perturbations from ¹H-¹⁵N HSQC experiments show that H3 peptide binding maps to the helical groove formed by the stacked TPR motifs of sNASP. Our findings reveal a new mode of interaction between a TPR repeat domain and an evolutionarily conserved peptide motif found in canonical H3 and in all histone H3 variants, including CenpA, and have implications for the mechanism of histone chaperoning within the cell.

  16. Spectroscopic investigation on the telomeric DNA base sequence repeat

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Telomeres are protein-DNA complexes at the ends of linear chromosomes, which protect chromosomal integrity and maintain cellular replicative capacity. From single-cell organisms to higher animals and plants, the structures and functions of telomeres are highly conserved. In the cells of humans and other vertebrates, the telomeric DNA base sequence is (TTAGGG)n. In the present work, we obtained absorption and fluorescence spectra from seven synthesized oligonucleotides that simulate the telomeric DNA system and calculated their relative fluorescence quantum yields, from which not only are telomeric DNA characteristics predicted but also possibly the shortened telomeric sequences during cell division are im[...] relative fluorescence quantum yield and remarkable excitation energy internal conversion, which tallies with the telomeric sequence of (TTAGGG)n. This result shows that telomeric DNA has a strong non-radiative, or internal-conversion, capability.

  17. Plasmid P1 replication: negative control by repeated DNA sequences.

    OpenAIRE

    Chattoraj, D; Cordes, K.; Abeles, A

    1984-01-01

    The incompatibility locus, incA, of the unit-copy plasmid P1 is contained within a fragment that is essentially a set of nine 19-base-pair repeats. One or more copies of the fragment destabilizes the plasmid when present in trans. Here we show that extra copies of incA interfere with plasmid DNA replication and that a deletion of most of incA increases plasmid copy number. Thus, incA is not essential for replication but is required for its control. When cloned in a high-copy-number vector, pi...

  18. Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies.

    Science.gov (United States)

    May, Alex C W

    2002-12-01

    It is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function from aligned protein sequences. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) and that for 69 superfamilies (CAMPASS). (The mean alignment lengths for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for the closely related proteins and distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general, and could be applied to any area of comparative biological sequence and 3D structure analysis where the constraint of the inherent linear organization of the data imposes an ordering on the set of objects to be clustered.
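
    The abstract does not say how the globally optimal partition is computed. One standard way to obtain an optimal split of an ordered series into contiguous segments is dynamic programming, sketched below. This is an illustration only, not necessarily the method of the paper: the cost function (within-segment sum of squared deviations), the function name, and the per-column variability scores are all assumptions.

```python
def optimal_segmentation(values, k):
    """Split `values` into k contiguous segments with minimal total
    within-segment squared deviation (an assumed cost function).

    Returns (cost, starts), where `starts` holds the start index of each segment.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums let us evaluate the cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations from the mean for values[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers from the end to recover the segment starts.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return best[k][n], starts[::-1]


# Hypothetical per-column variability scores for a short alignment.
scores = [1, 1, 1, 9, 9, 9, 5, 5]
total_cost, segment_starts = optimal_segmentation(scores, 3)
```

    Because the table `best` is filled for every segment count up to `k`, the optimal cost for each coarser partition is available from the same run.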

  19. Recombination frequency in plasmid DNA containing direct repeats--predictive correlation with repeat and intervening sequence length.

    Science.gov (United States)

    Oliveira, Pedro H; Lemos, Francisco; Monteiro, Gabriel A; Prazeres, Duarte M F

    2008-09-01

    In this study, a simple non-linear mathematical function is proposed to accurately predict recombination frequencies in bacterial plasmid DNA harbouring directly repeated sequences. The mathematical function, which was developed on the basis of published data on deletion formation in multicopy plasmids containing direct repeats (14-856 bp) and intervening sequences (0-3872 bp), also accounts for the strain genotype in terms of its recA function. A bootstrap resampling technique was used to estimate confidence intervals for the correlation parameters. More than 92% of the predicted values were found to be within a pre-established ±5-fold interval of deviation from experimental data. The correlation not only provides a way to predict the recombination frequency with good accuracy, but also opens the way to improved insight into these processes.
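
    The ±5-fold criterion mentioned above can be made concrete with a short sketch. It assumes that "within 5-fold" means the ratio of predicted to observed frequency lies between 1/5 and 5. The function name and the toy numbers are hypothetical and are not taken from the study.

```python
def fraction_within_fold(predicted, observed, fold=5.0):
    """Return the fraction of predictions whose ratio to the observed
    value lies between 1/fold and fold (inclusive)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for p, o in zip(predicted, observed):
        ratio = p / o
        if 1.0 / fold <= ratio <= fold:
            hits += 1
    return hits / len(predicted)


# Hypothetical recombination frequencies, for illustration only.
predicted = [1e-4, 2e-5, 3e-6, 5e-3]
observed = [2e-4, 1e-5, 3e-5, 4e-3]
share = fraction_within_fold(predicted, observed)  # third pair is 10-fold off
```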

  20. Sequencing analysis of the spinal bulbar muscular atrophy CAG expansion reveals absence of repeat interruptions.

    Science.gov (United States)

    Fratta, Pietro; Collins, Toby; Pemble, Sally; Nethisinghe, Suran; Devoy, Anny; Giunti, Paola; Sweeney, Mary G; Hanna, Michael G; Fisher, Elizabeth M C

    2014-02-01

    Trinucleotide repeat disorders are a heterogeneous group of diseases caused by the expansion, beyond a pathogenic threshold, of unstable DNA tracts in different genes. Sequence interruptions in the repeats have been described in the majority of these disorders and may influence disease phenotype and heritability. Spinal bulbar muscular atrophy (SBMA) is a motor neuron disease caused by a CAG trinucleotide expansion in the androgen receptor (AR) gene. Diagnostic testing and previous research have relied on fragment analysis polymerase chain reaction to determine the AR CAG repeat size, and have therefore not been able to assess the presence of interruptions. We here report a sequencing study of the AR CAG repeat in a cohort of SBMA patients and control subjects in the United Kingdom. We found no repeat interruptions to be present, and we describe differences between sequencing and traditional sizing methods.
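
    The difference between sizing and sequencing can be illustrated with a small sketch: fragment analysis yields only a tract length, whereas a sequence read allows each codon of the tract to be inspected. The helper names and the example sequences below are hypothetical, and the code assumes the tract has already been extracted in frame.

```python
import re


def longest_cag_tract(seq):
    """Return (start, n_repeats) of the longest uninterrupted CAG run, or None."""
    best = None
    for m in re.finditer(r"(?:CAG)+", seq.upper()):
        n = len(m.group()) // 3
        if best is None or n > best[1]:
            best = (m.start(), n)
    return best


def repeat_interruptions(tract):
    """List (codon_index, codon) for every non-CAG codon in an in-frame tract."""
    tract = tract.upper()
    if len(tract) % 3:
        raise ValueError("tract length must be a multiple of 3")
    codons = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    return [(i, c) for i, c in enumerate(codons) if c != "CAG"]


# Hypothetical tracts: one pure, one with a single CAA interruption.
pure = "CAG" * 22
interrupted = "CAG" * 10 + "CAA" + "CAG" * 5
```

    Both example tracts would look like plain repeat expansions in a sizing assay; only the codon-level check reports the CAA in the second one.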

  1. Analysis of genetic relationship in mutant silkworm strains of Bombyx mori using inter simple sequence repeat (ISSR) markers

    Institute of Scientific and Technical Information of China (English)

    Dhanikachalam Velu; Kangayam M. Ponnuvel; Murugiah Muthulakshmi; Randhir K. Sinha; Syed M.H. Qadri

    2008-01-01

    Amplified inter simple sequence repeat (ISSR) markers were used to determine genetic relationships among mutant silkworm strains of Bombyx mori. Fifteen ISSR primers containing simple sequence repeat (SSR) motifs were used in this study. A total of 113 markers were produced among 20 mutant strains, of which 73.45% were found to be polymorphic. In selected mutant genetic stocks, the average number of observed alleles was 1.7080±0.4567, of effective alleles 1.5194±0.3950, and the genetic diversity (Ht) 0.2901±0.0415. The dendrogram produced using the unweighted pair group method with arithmetic means (UPGMA) and cluster analysis based on Nei's genetic distance resolved the 20 mutant silkworm strains into one major group containing 6 subgroups. Therefore, ISSR amplification is a valuable method for determining the genetic variability among mutant silkworm strains. This efficient molecular marker would be useful for characterizing a considerable number of silkworm strains maintained at the germplasm center.
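
    As a rough illustration of how a polymorphism percentage of this kind is obtained, the sketch below scores a band presence/absence matrix and counts the markers that differ between strains. The matrix is a hypothetical toy example, not data from the study, and the function name is an assumption.

```python
def percent_polymorphic(bands):
    """bands: one list of 0/1 band scores per strain, one column per marker.

    A marker counts as polymorphic when the strains do not all share
    the same score for it.
    """
    n_loci = len(bands[0])
    if any(len(row) != n_loci for row in bands):
        raise ValueError("all strains must be scored for the same markers")
    polymorphic = sum(
        1 for j in range(n_loci) if len({row[j] for row in bands}) > 1
    )
    return 100.0 * polymorphic / n_loci


# Hypothetical scores for 3 strains and 4 markers.
toy = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
toy_percent = percent_polymorphic(toy)  # markers 2 and 3 vary between strains
```

    For scale, the reported 73.45% of 113 markers corresponds to 83 polymorphic markers, since 83/113 ≈ 0.7345.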

  2. TdIF1 recognizes a specific DNA sequence through its Helix-Turn-Helix and AT-hook motifs to regulate gene transcription.

    Directory of Open Access Journals (Sweden)

    Takashi Kubota

    Full Text Available TdIF1 was originally identified as a protein that directly binds to DNA polymerase TdT. TdIF1 is also thought to function in transcription regulation, because it binds directly to the transcriptional factor TReP-132, and to histone deacetylases HDAC1 and HDAC2. Here we show that TdIF1 recognizes a specific DNA sequence and regulates gene transcription. By constructing TdIF1 mutants, we identify amino acid residues essential for its interaction with DNA. An in vitro DNA selection assay, SELEX, reveals that TdIF1 preferentially binds to the sequence 5'-GNTGCATG-3' following an AT-tract, through its Helix-Turn-Helix and AT-hook motifs. We show that four repeats of this recognition sequence allow TdIF1 to regulate gene transcription in a plasmid-based luciferase reporter assay. We demonstrate that TdIF1 associates with the RAB20 promoter, and RAB20 gene transcription is reduced in TdIF1-knocked-down cells, suggesting that TdIF1 stimulates RAB20 gene transcription.
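
    A consensus such as 5'-GNTGCATG-3' can be searched for by expanding the IUPAC ambiguity code N into a character class. The sketch below is a minimal illustration under two assumptions that the abstract does not specify: the AT-tract is taken to lie immediately upstream of the motif, and its minimum length (`min_at_tract`, a hypothetical parameter) is set to four bases.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif string into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_sites(seq, motif="GNTGCATG", min_at_tract=4):
    """Return 0-based start positions of `motif` that are directly
    preceded by at least `min_at_tract` A/T bases (assumed layout)."""
    pattern = re.compile(
        "(?<=[AT]{%d})%s" % (min_at_tract, iupac_to_regex(motif))
    )
    return [m.start() for m in pattern.finditer(seq.upper())]


# Hypothetical test sequences.
with_tract = "CCAATTGATGCATGCC"
without_tract = "CCGGCCGATGCATGCC"
```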

  3. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    Directory of Open Access Journals (Sweden)

    Andrea Vásquez

    2014-01-01

    Full Text Available We conducted an SSR density analysis in different cassava genomic regions. The information obtained was useful for comparing cassava’s genomic SSR distribution with those of poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and a high proportion of pentanucleotides (24.2 SSRs/Mbp). Coding sequences have 15.5 SSRs/Mbp, introns have 82.3 SSRs/Mbp, 5′ UTRs have 196.1 SSRs/Mbp, and 3′ UTRs have 50.5 SSRs/Mbp. Motif analysis of the SSRs in cassava’s genome showed that the most abundant motif was AT/AT, while in intron sequences and UTR regions it was AG/CT. In addition, in coding sequences the motif AAG/CTT was found to occur most frequently; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation in Gene Ontology categories. The SSRs identified here may be a valuable addition for genetic mapping and for future phylogenetic and genome-evolution studies.

  4. Applications of inter simple sequence repeat (ISSR) rDNA in ...

    African Journals Online (AJOL)

    Applications of inter simple sequence repeat (ISSR) rDNA in detecting ... and phylogenetic relationships between Lymnaea natalensis collected from Giza, ... in water samples of all tested governorates with different significant differences.

  5. Application of PCR amplicon sequencing using a single primer pair in PCR amplification to assess variations in Helicobacter pylori CagA EPIYA tyrosine phosphorylation motifs

    OpenAIRE

    Karlsson Anneli; Monstein Hans-Jürg; Ryberg Anna; Borch Kurt

    2010-01-01

    Background The presence of various EPIYA tyrosine phosphorylation motifs in the CagA protein of Helicobacter pylori has been suggested to contribute to pathogenesis in adults. In this study, a unique PCR assay and sequencing strategy was developed to establish the number and variation of cagA EPIYA motifs. Findings MDA-DNA derived from gastric biopsy specimens from eleven subjects with gastritis was used with M13- and T7- sequence-tagged primers for amplification of the cagA EPIYA motif regio...

  6. RePS: a sequence assembler that masks exact repeats identified from the shotgun data

    DEFF Research Database (Denmark)

    Wang, Jun; Wong, Gane Ka-Shu; Ni, Peixiang;

    2002-01-01

    We describe a sequence assembler, RePS (repeat-masked Phrap with scaffolding), that explicitly identifies exact 20mer repeats from the shotgun data and removes them prior to the assembly. The established software is used to compute meaningful error probabilities for each base. Clone-end-pairing i...
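
    The idea of identifying exact k-mer repeats from the reads themselves and masking them before assembly can be sketched as follows. This is a simplified illustration, not the RePS implementation: the count threshold `min_count` is a hypothetical parameter, reverse complements are ignored, and the function names are assumptions.

```python
from collections import Counter


def repeated_kmers(reads, k=20, min_count=2):
    """Return the set of k-mers seen at least `min_count` times across all reads."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, c in counts.items() if c >= min_count}


def mask_read(read, repeats, k=20, mask_char="N"):
    """Replace every base covered by a repeated k-mer with `mask_char`."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        # Look up the k-mer in the original read, not the partly masked copy.
        if read[i:i + k] in repeats:
            masked[i:i + k] = mask_char * k
    return "".join(masked)


# Toy reads with k=4 so the example stays readable (the abstract uses 20-mers).
reads = ["ACGTACGTTT", "GGACGTAA"]
repeats = repeated_kmers(reads, k=4)
masked = mask_read("GGACGTAA", repeats, k=4)
```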

  7. Survey and analysis of simple sequence repeats in the Laccaria bicolor genome, with development of microsatellite markers

    Energy Technology Data Exchange (ETDEWEB)

    Labbe, Jessy L [ORNL; Murat, Claude [INRA, Nancy, France; Morin, Emmanuelle [INRA, Nancy, France; Le Tacon, F [UMR, France; Martin, Francis [INRA, Nancy, France

    2011-01-01

    It is becoming clear that simple sequence repeats (SSRs) play a significant role in fungal genome organization, and they are a large source of genetic markers for population genetics and meiotic maps. We identified SSRs in the Laccaria bicolor genome by in silico survey and analyzed their distribution in the different genomic regions. We also compared the abundance and distribution of SSRs in L. bicolor with those of the following fungal genomes: Phanerochaete chrysosporium, Coprinopsis cinerea, Ustilago maydis, Cryptococcus neoformans, Aspergillus nidulans, Magnaporthe grisea, Neurospora crassa and Saccharomyces cerevisiae. Using the MISA computer program, we detected 277,062 SSRs in the L. bicolor genome representing 8% of the assembled genomic sequence. Among the analyzed basidiomycetes, L. bicolor exhibited the highest SSR density although no correlation between relative abundance and the genome sizes was observed. In most genomes the short motifs (mono- to trinucleotides) were more abundant than the longer repeated SSRs. Generally, in each organism, the occurrence, relative abundance, and relative density of SSRs decreased as the repeat unit increased. Furthermore, each organism had its own common and longest SSRs. In the L. bicolor genome, most of the SSRs were located in intergenic regions (73.3%) and the highest SSR density was observed in transposable elements (TEs; 6,706 SSRs/Mb). However, 81% of the protein-coding genes contained SSRs in their exons, suggesting that SSR polymorphism may alter gene phenotypes. Within a L. bicolor offspring, sequence polymorphism of 78 SSRs was mainly detected in non-TE intergenic regions. Unlike previously developed microsatellite markers, these new ones are spread throughout the genome; these markers could have immediate applications in population genetics.
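
    An in silico SSR survey of this kind amounts to scanning a sequence for short units repeated in tandem. The sketch below is a minimal regular-expression version of that idea and is not the MISA program. The minimum repeat counts per unit length are illustrative assumptions rather than the settings used in the study, and compound or imperfect repeats are not handled.

```python
import re

# Assumed thresholds: unit length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, copies) tuples for perfect SSRs in `seq`."""
    seq = seq.upper()
    hits = []
    for unit_len, n_min in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves tandem copies of a shorter
            # unit (e.g. "AA" or "ATAT"); those belong to the shorter class.
            if any(
                unit == unit[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            hits.append((m.start(), unit, len(m.group()) // unit_len))
    return sorted(hits)


# Hypothetical sequence containing a di-, a mono- and a trinucleotide SSR.
example = "GG" + "AT" * 7 + "CC" + "A" * 12 + "TT" + "CAG" * 5 + "TT"
ssrs = find_ssrs(example)
```

    Dividing the number of hits by the sequence length in Mb gives a density figure comparable in form to the SSRs/Mb values quoted above.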

  8. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences.

    Directory of Open Access Journals (Sweden)

    Michael J McDonald

    2011-06-01

    Full Text Available The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.

  9. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences.

    Science.gov (United States)

    McDonald, Michael J; Wang, Wei-Chi; Huang, Hsien-Da; Leu, Jun-Yi

    2011-06-01

    The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.

  10. Conserved sequence motifs upstream from the co-ordinately expressed vitellogenin and apoVLDLII genes of chicken.

    Science.gov (United States)

    van het Schip, F; Strijker, R; Samallo, J; Gruber, M; Geert, A B

    1986-11-11

    The vitellogenin and apoVLDLII yolk protein genes of chicken are transcribed in the liver upon estrogenization. To get information on putative regulatory elements, we compared more than 2 kb of their 5' flanking DNA sequences. Common sequence motifs were found in regions exhibiting estrogen-induced changes in chromatin structure. Stretches of alternating pyrimidines and purines about 30 nucleotides long are present at roughly similar positions. A distinct box of sequence homology in the chicken genes also appears to be present at a similar position in front of the vitellogenin genes of Xenopus laevis, but is absent from the estrogen-responsive egg-white protein genes expressed in the oviduct. In front of the vitellogenin (position -595) and the VLDLII gene (position -548), a DNA element of about 300 base-pairs was found, which possesses structural characteristics of a mobile genetic element and bears homology to the transposon-like Vi element of Xenopus laevis.

  11. Complete gene sequence of spider attachment silk protein (PySp1) reveals novel linker regions and extreme repeat homogenization.

    Science.gov (United States)

    Chaw, Ro Crystal; Saski, Christopher A; Hayashi, Cheryl Y

    2017-02-01

    Spiders use a myriad of silk types for daily survival, and each silk type has a unique suite of task-specific mechanical properties. Of all spider silk types, pyriform silk is distinct because it is a combination of a dry protein fiber and wet glue. Pyriform silk fibers are coated with wet cement and extruded into "attachment discs" that adhere silks to each other and to substrates. The mechanical properties of spider silk types are linked to the primary and higher-level structures of spider silk proteins (spidroins). Spidroins are often enormous molecules (>250 kDa) and have a lengthy repetitive region that is flanked by relatively short (∼100 amino acids), non-repetitive amino- and carboxyl-terminal regions. The amino acid sequence motifs in the repetitive region vary greatly between spidroin types, while motif length and number underlie the remarkable mechanical properties of spider silk fibers. Existing knowledge of pyriform spidroins is fragmented, making it difficult to define links between the structure and function of pyriform spidroins. Here, we present the full-length sequence of the gene encoding pyriform spidroin 1 (PySp1) from the silver garden spider Argiope argentata. The predicted protein is similar to previously reported PySp1 sequences but the A. argentata PySp1 has a uniquely long and repetitive "linker", which bridges the amino-terminal and repetitive regions. Predictions of the hydrophobicity and secondary structure of A. argentata PySp1 identify regions important to protein self-assembly. Analysis of the full complement of A. argentata PySp1 repeats reveals extreme intragenic homogenization, and comparison of A. argentata PySp1 repeats with other PySp1 sequences identifies variability in two sub-repetitive expansion regions. Overall, the full-length A. argentata PySp1 sequence provides new evidence for understanding how pyriform spidroins contribute to the properties of pyriform silk fibers.

  12. Development and Characterization of Simple Sequence Repeat Markers Providing Genome-Wide Coverage and High Resolution in Maize

    Science.gov (United States)

    Xu, Jie; Liu, Ling; Xu, Yunbi; Chen, Churun; Rong, Tingzhao; Ali, Farhan; Zhou, Shufeng; Wu, Fengkai; Liu, Yaxi; Wang, Jing; Cao, Moju; Lu, Yanli

    2013-01-01

    Simple sequence repeats (SSRs) have been widely used in maize genetics and breeding, because they are co-dominant, easy to score, and highly abundant. In this study, we used whole-genome sequences from 16 maize inbreds and 1 wild relative to determine SSR abundance and to develop a set of high-density polymorphic SSR markers. A total of 264 658 SSRs were identified across the 17 genomes, with an average of 135 693 SSRs per genome. Marker density was one SSR every 15.48 kb. (C/G)n, (AT)n, (CAG/CTG)n, and (AAAT/ATTT)n were the most frequent motifs for mono-, di-, tri-, and tetra-nucleotide SSRs, respectively. SSRs were most abundant in intergenic regions and least frequent in untranslated regions, as revealed by comparing SSR distributions of three representative resequenced genomes. Comparing SSR sequences and e-polymerase chain reaction analysis among the 17 tested genomes created a new database, including 111 887 SSRs, that could be developed as polymorphic markers in silico. Among these markers, 58.00, 26.09, 7.20, 3.00, 3.93, and 1.78% had mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs, respectively. Polymorphic information content for 35 573 polymorphic SSRs out of 111 887 loci varied from 0.05 to 0.83, with an average of 0.31 in the 17 tested genomes. Experimental validation of polymorphic SSR markers showed that over 70% of the primer pairs could generate the target bands with length polymorphism, and these markers would be very powerful when they are used for genetic populations derived from various types of maize germplasms that were sampled for this study. PMID:23804557
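
    The abstract does not state which formula was used for polymorphic information content (PIC). A commonly used definition, assumed here, is PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ², where pᵢ is the frequency of allele i at a locus. The sketch below implements that definition. The allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content for one locus, given allele frequencies.

    Uses PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    (assumed definition; the study's exact formula is not given).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical loci: two equally frequent alleles, and a monomorphic locus.
pic_two_alleles = pic([0.5, 0.5])
pic_monomorphic = pic([1.0])
```

    Under this definition a locus with two equally frequent alleles scores 0.375 and a monomorphic locus scores 0, which gives a sense of where the reported range of 0.05 to 0.83 sits.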

  13. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

    Science.gov (United States)

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-01-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M:A or C; N and N′ form Watson–Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277

  14. Characterization of a highly repeated DNA sequence family in five species of the genus Eulemur.

    Science.gov (United States)

    Ventura, M; Boniotto, M; Cardone, M F; Fulizio, L; Archidiacono, N; Rocchi, M; Crovella, S

    2001-09-19

    The karyotypes of Eulemur species exhibit a high degree of variation, as a consequence of the Robertsonian fusion and/or centromere fission. Centromeric and pericentromeric heterochromatin of eulemurs is constituted by highly repeated DNA sequences (including some telomeric TTAGGG repeats) which have so far been investigated and used for the study of the systematic relationships of the different species of the genus Eulemur. In our study, we have cloned a set of repetitive pericentromeric sequences of five Eulemur species: E. fulvus fulvus (EFU), E. mongoz (EMO), E. macaco (EMA), E. rubriventer (ERU), and E. coronatus (ECO). We have characterized these clones by sequence comparison and by comparative fluorescence in situ hybridization analysis in EMA and EFU. Our results showed a high degree of sequence similarity among Eulemur species, indicating a strong conservation, within the five species, of these pericentromeric highly repeated DNA sequences.

  15. Repeat Associated Non-AUG Translation (RAN Translation) Dependent on Sequence Downstream of the ATXN2 CAG Repeat.

    Directory of Open Access Journals (Sweden)

    Daniel R Scoles

    Full Text Available Spinocerebellar ataxia type 2 (SCA2) is a progressive autosomal dominant disorder caused by the expansion of a CAG tract in the ATXN2 gene. The SCA2 disease phenotype is characterized by cerebellar atrophy, gait ataxia, and slow saccades. ATXN2 mutation causes gains of toxic and normal functions of the ATXN2 gene product, ataxin-2, and abnormally slow Purkinje cell firing frequency. Previously we investigated features of ATXN2 controlling expression and noted expression differences for ATXN2 constructs with varying CAG lengths, suggestive of repeat associated non-AUG translation (RAN translation). To determine whether RAN translation occurs for ATXN2 we assembled various ATXN2 constructs with ATXN2 tagged by luciferase, HA or FLAG tags, driven by the CMV promoter or the ATXN2 promoter. Luciferase expression from ATXN2-luciferase constructs lacking the ATXN2 start codon was weak vs AUG translation, regardless of promoter type, and did not increase with longer CAG repeat lengths. RAN translation was detected on western blots by the anti-polyglutamine antibody 1C2 for constructs driven by the CMV promoter but not the ATXN2 promoter, and was weaker than AUG translation. Strong RAN translation was also observed when driving the ATXN2 sequence with the CMV promoter with ATXN2 sequence downstream of the CAG repeat truncated to 18 bp in the polyglutamine frame but not in the polyserine or polyalanine frames. Our data demonstrate that ATXN2 RAN translation is weak compared to AUG translation and is dependent on ATXN2 sequences flanking the CAG repeat.
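
    The three frames named in the abstract follow directly from the genetic code: reading a CAG repeat from its first, second, or third base gives CAG (glutamine), AGC (serine), or GCA (alanine) codons. The short sketch below shows this for an isolated repeat. The function name and repeat count are arbitrary, and flanking sequence is ignored.

```python
# Standard genetic code entries for the three codons a CAG repeat can yield.
CODON_TO_AA = {"CAG": "Q", "AGC": "S", "GCA": "A"}


def repeat_frames(n_repeats=6):
    """Translate a (CAG)n tract in each of the three forward reading frames."""
    tract = "CAG" * n_repeats
    peptides = {}
    for frame in range(3):
        codons = [
            tract[i:i + 3] for i in range(frame, len(tract) - 2, 3)
        ]
        peptides[frame] = "".join(CODON_TO_AA[c] for c in codons)
    return peptides


frames = repeat_frames(6)
# frame 0 -> polyglutamine, frame 1 -> polyserine, frame 2 -> polyalanine
```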

  16. Requirement for asparagine in the aquaporin NPA sequence signature motifs for cation exclusion

    DEFF Research Database (Denmark)

    Wree, Dorothea; Wu, Binghua; Zeuthen, Thomas

    2011-01-01

    Two highly conserved NPA motifs are a hallmark of the aquaporin (AQP) family. The NPA triplets form N-terminal helix capping structures with the Asn side chains located in the centre of the water or solute-conducting channel, and are considered to play an important role in AQP selectivity. Although...... another AQP selectivity filter site, the aromatic/Arg (ar/R) constriction, has been well characterized by mutational analysis, experimental data concerning the NPA region--in particular, the Asn position--is missing. Here, we report on the cloning and mutational analysis of a novel aquaglyceroporin...

  17. Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction

    NARCIS (Netherlands)

    Dijk, van A.D.J.; Morabito, G.; Fiers, M.A.; Ham, van R.C.H.J.; Angenent, G.C.; Immink, R.G.H.

    2010-01-01

    Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein famil

  18. angaGEDUCI: Anopheles gambiae gene expression database with integrated comparative algorithms for identifying conserved DNA motifs in promoter sequences

    Directory of Open Access Journals (Sweden)

    Ribeiro Jose Marcos C

    2006-05-01

    Full Text Available Background: The completed sequence of the Anopheles gambiae genome has enabled genome-wide analyses of gene expression and regulation in this principal vector of human malaria. These investigations have created a demand for efficient methods of cataloguing and analyzing the large quantities of data that have been produced. The organization of genome-wide data into one unified database makes possible the efficient identification of spatial and temporal patterns of gene expression, and by pairing these findings with comparative algorithms, may offer a tool to gain insight into the molecular mechanisms that regulate these expression patterns. Description: We provide a publicly-accessible database and integrated data-mining tool, angaGEDUCI, that unifies (1) stage- and tissue-specific microarray analyses of gene expression in An. gambiae at different developmental stages and temporal separations following a bloodmeal, (2) functional gene annotation, (3) genomic sequence data, and (4) promoter sequence comparison algorithms. The database can be used to study genes expressed in particular stages, tissues, and patterns of interest, and to identify conserved promoter sequence motifs that may play a role in the regulation of such expression. The database is accessible from the address http://www.angaged.bio.uci.edu. Conclusion: By combining gene expression, function, and sequence data with integrated sequence comparison algorithms, angaGEDUCI streamlines spatial and temporal pattern-finding and produces a straightforward means of developing predictions and designing experiments to assess how gene expression may be controlled at the molecular level.

  19. Examination of the transcription factor NtcA-binding motif by in vitro selection of DNA sequences from a random library.

    Science.gov (United States)

    Jiang, F; Wisén, S; Widersten, M; Bergman, B; Mannervik, B

    2000-08-25

    A recursive in vitro selection among random DNA sequences was used for analysis of the cyanobacterial transcription factor NtcA-binding motifs. An eight-base palindromic sequence, TGTA-(N(8))-TACA, was found to be the optimal NtcA-binding sequence. The more divergent the binding sequences, compared to this consensus sequence, the lower the NtcA affinity. The second and third bases in each four-nucleotide half of the consensus sequence were crucial for NtcA binding, and they were in general highly conserved. The most frequently occurring sequence in the middle weakly conserved region was similar to that of the NtcA-binding motif of the Anabaena sp. strain PCC 7120 glnA gene, previously known to have high affinity for NtcA. This indicates that the middle sequences were selected for high NtcA affinity. Analysis of natural NtcA-binding motifs showed that these could be classified into two groups based on differences in recognition consensus sequences. It is suggested that NtcA naturally recognizes different DNA-binding motifs, or has differential affinities to these sequences under different physiological conditions.
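
    The consensus reported above, TGTA-(N(8))-TACA, maps directly onto a pattern search. The following is a minimal sketch, assuming plain A/C/G/T input on a single strand; the function names are hypothetical and not taken from the paper.

```python
import re

# Consensus from the abstract: TGTA, eight unspecified bases, then TACA.
# The lookahead lets overlapping candidate sites be reported as well.
NTCA_CONSENSUS = re.compile(r"(?=(TGTA[ACGT]{8}TACA))")


def find_ntca_sites(seq):
    """Return (start, site) pairs for every exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in NTCA_CONSENSUS.finditer(seq)]


def mismatches_to_consensus(site):
    """Count mismatches in the two 4-nt half-sites of a 16-nt window."""
    half_sites = site[:4] + site[12:16]
    return sum(a != b for a, b in zip(half_sites.upper(), "TGTATACA"))
```

    Counting half-site mismatches gives a rough proxy for the abstract's observation that more divergent sites bind with lower affinity; it is not a measured affinity model.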

  20. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Directory of Open Access Journals (Sweden)

    Charlotte Rehm

    Full Text Available In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
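
    Tandem runs of the GGGNATC heptamer described above can be located with a short pattern search. This is a minimal sketch, assuming one strand of plain A/C/G/T sequence and a minimum of two units (the lower bound quoted in the abstract); the names are hypothetical.

```python
import re

UNIT_LENGTH = 7
UNIT = "GGG[ACGT]ATC"  # heptamer with one variable position (N)
TANDEM = re.compile(rf"(?:{UNIT}){{2,}}")  # two or more adjacent units


def tandem_runs(seq):
    """Return (start, unit_count) for each run of adjacent GGGNATC units."""
    seq = seq.upper()
    return [(m.start(), len(m.group(0)) // UNIT_LENGTH)
            for m in TANDEM.finditer(seq)]


def has_four_g_tracts(unit_count):
    """A run of n units carries n GGG tracts; the abstract links four to G4 folding."""
    return unit_count >= 4
```

    To cover the opposite strand, the same scan would be run on the reverse complement of the input.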

  1. Variable number of tandem repeats in clinical strains of Haemophilus influenzae

    NARCIS (Netherlands)

    A.F. van Belkum (Alex); S. Scherer; D. Willemse; L. van Alphen (Loek); H.A. Verbrugh (Henri); W.B. van Leeuwen (Willem)

    1997-01-01

    textabstractAn algorithm capable of identifying short repeat motifs was developed and used to screen the whole genome sequence available for Haemophilus influenzae, since some of these repeats have been shown to affect bacterial virulence. Various di- to hexanucleotide

  2. An ancient repeat sequence in the ATP synthase beta-subunit gene of forcipulate sea stars.

    Science.gov (United States)

    Foltz, David W

    2007-11-01

    A novel repeat sequence with a conserved secondary structure is described from two nonadjacent introns of the ATP synthase beta-subunit gene in sea stars of the order Forcipulatida (Echinodermata: Asteroidea). The repeat is present in both introns of all forcipulate sea stars examined, which suggests that it is an ancient feature of this gene (with an approximate age of 200 Mya). Both stem and loop regions show high levels of sequence constraint when compared to flanking nonrepetitive intronic regions. The repeat was also detected in (1) the family Pterasteridae, order Velatida and (2) the family Korethrasteridae, order Velatida. The repeat was not detected in (1) the family Echinasteridae, order Spinulosida, (2) the family Astropectinidae, order Paxillosida, (3) the family Solasteridae, order Velatida, or (4) the family Goniasteridae, order Valvatida. The repeat lacks similarity to published sequences in unrestricted GenBank searches, and there are no significant open reading frames in the repeat or in the flanking intron sequences. Comparison via parametric bootstrapping to a published phylogeny based on 4.2 kb of nuclear and mitochondrial sequence for a subset of these species allowed the null hypothesis of a congruent phylogeny to be rejected for each repeat, when compared separately to the published phylogeny. In contrast, the flanking nonrepetitive sequences in each intron yielded separate phylogenies that were each congruent with the published phylogeny. In four species, the repeat in one or both introns has apparently experienced gene conversion. The two introns also show a correlated pattern of nucleotide substitutions, even after excluding the putative cases of gene conversion.

  3. A novel tandem repeat sequence located on human chromosome 4p: isolation and characterization.

    Science.gov (United States)

    Kogi, M; Fukushige, S; Lefevre, C; Hadano, S; Ikeda, J E

    1997-06-01

    In an effort to analyze the genomic region of the distal half of human chromosome 4p, to where Huntington disease and other diseases have been mapped, we have isolated the cosmid clone (CRS447) that was likely to contain a region with specific repeat sequences. Clone CRS447 was subjected to detailed analysis, including chromosome mapping, restriction mapping, and DNA sequencing. Chromosome mapping by both a human-CHO hybrid cell panel and FISH revealed that CRS447 was predominantly located in the 4p15.1-15.3 region. CRS447 was shown to consist of tandem repeats of 4.7-kb units present on chromosome 4p. A single EcoRI unit was subcloned (pRS447), and the complete sequence was determined as 4752 nucleotides. When pRS447 was used as a probe, the number of copies of this repeat per haploid genome was estimated to be 50-70. Sequence analysis revealed that it contained two internal CA repeats and one putative ORF. Database search established that this sequence was unreported. However, two homologous STS markers were found in the database. We concluded that CRS447/pRS447 is a novel tandem repeat sequence that is mainly specific to human chromosome 4p.

  4. Novel multiplex format of an extended multilocus variable-number-tandem-repeat analysis of Clostridium difficile correlates with tandem repeat sequence typing.

    Science.gov (United States)

    Jensen, Mie Birgitte Frid; Engberg, Jørgen; Larsson, Jonas T; Olsen, Katharina E P; Torpdahl, Mia

    2015-03-01

    Subtyping of Clostridium difficile is crucial for outbreak investigations. An extended multilocus variable-number tandem-repeat analysis (eMLVA) of 14 variable number tandem repeat (VNTR) loci was validated in multiplex format compatible with a routine typing laboratory and showed excellent concordance with tandem repeat sequence typing (TRST) and high discriminatory power.

  5. A conserved sequence extending motif III of the motor domain in the Snf2-family DNA translocase Rad54 is critical for ATPase activity.

    Directory of Open Access Journals (Sweden)

    Xiao-Ping Zhang

    Full Text Available Rad54 is a dsDNA-dependent ATPase that translocates on duplex DNA. Its ATPase function is essential for homologous recombination, a pathway critical for meiotic chromosome segregation, repair of complex DNA damage, and recovery of stalled or broken replication forks. In recombination, Rad54 cooperates with Rad51 protein and is required to dissociate Rad51 from heteroduplex DNA to allow access by DNA polymerases for recombination-associated DNA synthesis. Sequence analysis revealed that Rad54 contains a perfect match to the consensus PIP box sequence, a widely spread PCNA interaction motif. Indeed, Rad54 interacts directly with PCNA, but this interaction is not mediated by the Rad54 PIP box-like sequence. This sequence is located as an extension of motif III of the Rad54 motor domain and is essential for full Rad54 ATPase activity. Mutations in this motif render Rad54 non-functional in vivo and severely compromise its activities in vitro. Further analysis demonstrated that such mutations affect dsDNA binding, consistent with the location of this sequence motif on the surface of the cleft formed by two RecA-like domains, which likely forms the dsDNA binding site of Rad54. Our study identified a novel sequence motif critical for Rad54 function and showed that even perfect matches to the PIP box consensus may not necessarily identify PCNA interaction sites.

  6. The human TTAGGG repeat factors 1 and 2 bind to a subset of interstitial telomeric sequences and satellite repeats

    Institute of Scientific and Technical Information of China (English)

    Thomas Simonet; Elena Giulotto; Frederique Magdinier; Béatrice Horard; Pascal Barbry; Rainer Waldmann; Eric Gison; Laure-Emmanuelle Zaragosi; Claude Philippe; Kevin Lebrigand; Clémentine Schouteden; Adeline Augereau; Serge Bauwens; Jing Ye; Marco Santagostino

    2011-01-01

    The study of the proteins that bind to telomeric DNA in mammals has provided a deep understanding of the mechanisms involved in chromosome-end protection. However, very little is known on the binding of these proteins to nontelomeric DNA sequences. The TTAGGG DNA repeat proteins 1 and 2 (TRF1 and TRF2) bind to mammalian telomeres as part of the shelterin complex and are essential for maintaining chromosome end stability. In this study, we combined chromatin immunoprecipitation with high-throughput sequencing to map at high sensitivity and resolution the human chromosomal sites to which TRF1 and TRF2 bind. While most of the identified sequences correspond to telomeric regions, we showed that these two proteins also bind to extratelomeric sites. The vast majority of these extratelomeric sites contains interstitial telomeric sequences (or ITSs). However, we also identified non-ITS sites, which correspond to centromeric and pericentromeric satellite DNA. Interestingly, the TRF-binding sites are often located in the proximity of genes or within introns. We propose that TRF1 and TRF2 couple the functional state of telomeres to the long-range organization of chromosomes and gene regulation networks by binding to extratelomeric sequences.

  7. Spatio-temporal Variations of Characteristic Repeating Earthquake Sequences along the Middle America Trench in Mexico

    Science.gov (United States)

    Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.

    2015-12-01

    Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus provide nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms exceed cc≥95% and coh≥95%. Spatial clustering along the trench shows large variations in repeating earthquake activity. In particular, the rupture zone of the M8.1 1985 earthquake shows an almost complete absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border show a significantly larger number of repeating earthquake sequences. Furthermore, temporal variations associated with stress changes due to major earthquakes show episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is a key factor to pinpoint areas where large megathrust earthquakes may nucleate and consequently to improve the seismic hazard assessment.
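
    The cc≥95% criterion can be illustrated with a plain zero-lag correlation coefficient between two equal-length waveform windows. This is a minimal sketch, assuming the traces are already aligned and filtered; the spectral coherency test is not reproduced, and the function names are hypothetical.

```python
import math


def correlation_coefficient(x, y):
    """Zero-lag Pearson correlation between two equal-length waveforms."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("waveforms must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant waveform has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def is_repeating_pair(x, y, threshold=0.95):
    """Apply the cc >= 0.95 selection threshold quoted in the abstract."""
    return correlation_coefficient(x, y) >= threshold
```

    Because the coefficient is insensitive to amplitude scaling and offset, two events of different size on the same patch can still score near 1.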

  8. Functional importance of GGXG sequence motifs in putative reentrant loops of 2HCT and ESS transport proteins.

    Science.gov (United States)

    Dobrowolski, Adam; Lolkema, Juke S

    2009-08-11

    The 2HCT and ESS families are two families of secondary transporters. Members of the two families are unrelated in amino acid sequence but share similar hydropathy profiles, which suggest a similar folding of the proteins in membranes. Structural models show two homologous domains containing five transmembrane segments (TMSs) each, with a reentrant or pore loop between the fourth and fifth TMSs in each domain. Here we show that GGXG sequence motifs present in the putative reentrant loops are important for the activity of the transporters. Mutation of the conserved Gly residues to Cys in the motifs of the Na(+)-citrate transporter CitS in the 2HCT family and the Na(+)-glutamate transporter GltS in the ESS family resulted in strongly reduced transport activity. Similarly, mutation of the variable residue "X" to Cys in the N-terminal half of GltS essentially inactivated the transporter. The corresponding mutations in the N- and C-terminal halves of CitS reduced transport activity to 60 and 25% of that of the wild type, respectively. Residual activity of any of the mutants could be further reduced by treatment with the membrane permeable thiol reagent N-ethylmaleimide (NEM). The X to Cys mutation (S405C) in the cytoplasmic loop in the C-terminal half of CitS rendered the protein sensitive to the bulky, membrane impermeable thiol reagent 4-acetamido-4'-maleimidylstilbene-2,2'-disulfonic acid (AmdiS) added at the periplasmic side of the membrane, providing further evidence that this part of the loop is positioned between the transmembrane segments. The putative reentrant loop in the C-terminal half of the ESS family does not contain the GGXG motif, but a conserved stretch rich in Gly residues. Cysteine-scanning mutagenesis of a stretch of 18 residues in the GltS protein revealed two residues important for function. Mutant N356C was completely inactivated by treatment with NEM, and mutant P351C appeared to be the counterpart of mutant S405C of CitS; the mutant was

  9. Use of sequence motifs as barcodes and secondary structures of Internal Transcribed spacer 2 (ITS2, rDNA) for identification of the Indian liver fluke, Fasciola (Trematoda: Fasciolidae)

    Science.gov (United States)

    Prasad, PK; Tandon, V; Biswal, DK; Goswami, LM; Chatterjee, A

    2009-01-01

    Most phylogenetic studies using current methods have focused on primary DNA sequence information. However, RNA secondary structures are particularly useful in systematics because they include characteristics that give “morphological” information which is not found in the primary sequence. Also DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat are useful for identification of trematodes. The species of liver flukes of the genus Fasciola (Platyhelminthes: Digenea: Fasciolidae) are obligate parasitic trematodes residing in the large biliary ducts of herbivorous mammals. While Fasciola hepatica has a cosmopolitan distribution, the other major species, i.e., F. gigantica is reportedly prevalent in the tropical and subtropical regions of Africa and Asia. To determine the Fasciola sp. of Assam (India) origin based on rDNA molecular data, ribosomal ITS2 region was sequenced (EF027103) and analysed. NCBI databases were used for sequence homology analysis and the phylogenetic trees were constructed based upon the ITS2 using MEGA and a Bayesian analysis of the combined data. The latter approach allowed us to include both primary sequence and RNA molecular morphometrics and revealed a close relationship with isolates of F. gigantica from China, Indonesia and Japan, the isolate from China with significant bootstrap values being the closest. ITS2 sequence motifs allowed an accurate in silico distinction of liver flukes. The data indicate that ITS2 motifs (≤ 50 bp in size) can be considered a promising tool for trematode species identification. Using the novel approach of molecular morphometrics that is based on ITS2 secondary structure homologies, phylogenetic relationships of the various isolates of fasciolid species have been discussed. PMID:19294000

  10. NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole

    2011-01-01

    to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs...

  11. Significance of satellite DNA revealed by conservation of a widespread repeat DNA sequence among angiosperms.

    Science.gov (United States)

    Mehrotra, Shweta; Goel, Shailendra; Raina, Soom Nath; Rajpal, Vijay Rani

    2014-08-01

    The analysis of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of plant nuclear DNA. In the present study, we analyzed the nature of pCtKpnI-I and pCtKpnI-II tandem repeated sequences, reported earlier in Carthamus tinctorius. Interestingly, a homolog of the pCtKpnI-I repeat sequence was also found to be present in widely divergent families of angiosperms. pCtKpnI-I showed high sequence similarity but low copy number among various taxa of different families of angiosperms analyzed. In comparison, pCtKpnI-II was specific to the genus Carthamus and was not present in any other taxa analyzed. The molecular structure of pCtKpnI-I was analyzed in various unrelated taxa of angiosperms to decipher the evolutionary conserved nature of the sequence and its possible functional role.

  12. Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.)

    Science.gov (United States)

    Background: Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed S...

  13. Unstable microsatellite repeats facilitate rapid evolution of coding and regulatory sequences.

    Science.gov (United States)

    Jansen, A; Gemayel, R; Verstrepen, K J

    2012-01-01

    Tandem repeats are intrinsically highly variable sequences since repeat units are often lost or gained during replication or following unequal recombination events. Because of their low complexity and their instability, these repeats, which are also called satellite repeats, are often considered to be useless 'junk' DNA. However, recent findings show that tandem repeats are frequently found within promoters of stress-induced genes and within the coding regions of genes encoding cell-surface and regulatory proteins. Interestingly, frequent changes in these repeats often confer phenotypic variability. Examples include variation in the microbial cell surface, rapid tuning of internal molecular clocks in flies, and enhanced morphological plasticity in mammals. This suggests that instead of being useless junk DNA, some variable tandem repeats are useful functional elements that confer 'evolvability', facilitating swift evolution and rapid adaptation to changing environments. Since changes in repeats are frequent and reversible, repeats provide a unique type of mutation that bridges the gap between rare genetic mutations, such as single nucleotide polymorphisms, and highly unstable but reversible epigenetic inheritance.

  14. Evolutionary conservation of sequence and secondary structures in CRISPR repeats

    Energy Technology Data Exchange (ETDEWEB)

    Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip

    2006-09-01

    Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in ~40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.

  15. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data.

    Science.gov (United States)

    Gelfond, Jonathan A L; Gupta, Mayetri; Ibrahim, Joseph G

    2009-12-01

    We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP-enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity.

  16. Survey and analysis of simple sequence repeats in the Ustilaginoidea virens genome and the development of microsatellite markers.

    Science.gov (United States)

    Yu, Mina; Yu, Junjie; Li, Huanhuan; Wang, Yahui; Yin, Xiaole; Bo, Huiwen; Ding, Hui; Zhou, Yuxin; Liu, Yongfeng

    2016-07-01

    Ustilaginoidea virens is the causal agent of rice false smut, causing quantitative and qualitative losses in the rice industry. However, the development and application of simple sequence repeat (SSR) markers for genetic diversity studies in U. virens have been limited. This study is the first to perform large-scale development of SSR markers of this pathogen at the genome level, to (1) compare these SSR markers with those of other fungi, (2) analyze the pattern of the SSRs, and (3) obtain more informative genetic markers. U. virens is rich in SSRs, and 13,778 SSRs were identified with a relative abundance of 349.7 SSRs/Mb. The most common motifs in the genome or in noncoding regions were mononucleotides, whereas trinucleotides were the most common in coding sequences. A total of 6 out of 127 primers were randomly selected to be used to analyze 115 isolates, and these 6 primers showed high polymorphism in U. virens. This study may serve as an important resource for molecular genetic studies in U. virens.
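
    A genome-wide SSR survey of this kind comes down to scanning for perfect tandem runs of short units and normalizing the count by sequence length. The sketch below is a minimal illustration: the minimum repeat counts are assumed thresholds (the abstract does not state the ones used), overlaps between motif classes are not resolved, and all names are hypothetical.

```python
# Assumed minimum repeat counts per unit length (mono- to hexanucleotide).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k)
                   for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, unit, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if set(unit) <= set("ACGT") and is_primitive(unit):
                j = i + k
                while seq[j:j + k] == unit:  # extend the run unit by unit
                    j += k
                repeats = (j - i) // k
                if repeats >= min_rep:
                    hits.append((i, unit, repeats))
                    i = j
                    continue
            i += 1
    return sorted(hits)


def ssrs_per_mb(n_ssrs, sequence_bp):
    """Relative abundance expressed as SSRs per megabase."""
    return n_ssrs / (sequence_bp / 1e6)
```

    As a consistency check on the reported figures, 13,778 SSRs at 349.7 SSRs/Mb corresponds to roughly 39.4 Mb of analyzed sequence; that length is back-calculated here, not quoted from the abstract.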

  17. Peptomics, identification of novel cationic Arabidopsis peptides with conserved sequence motifs

    DEFF Research Database (Denmark)

    Olsen, Addie Nina; Mundy, John; Skriver, Karen

    2002-01-01

    Few plant peptides involved in intercellular communication have been experimentally isolated. Sequence analysis of the Arabidopsis thaliana genome has revealed numerous transmembrane receptors predicted to bind proteinaceous ligands, emphasizing the importance of identifying peptides with signali...

  18. Sequences spanning the leader-repeat junction mediate CRISPR adaptation to phage in Streptococcus thermophilus.

    Science.gov (United States)

    Wei, Yunzhou; Chesne, Megan T; Terns, Rebecca M; Terns, Michael P

    2015-02-18

    CRISPR-Cas systems are RNA-based immune systems that protect prokaryotes from invaders such as phages and plasmids. In adaptation, the initial phase of the immune response, short foreign DNA fragments are captured and integrated into host CRISPR loci to provide heritable defense against encountered foreign nucleic acids. Each CRISPR contains a ∼100-500 bp leader element that typically includes a transcription promoter, followed by an array of captured ∼35 bp sequences (spacers) sandwiched between copies of an identical ∼35 bp direct repeat sequence. New spacers are added immediately downstream of the leader. Here, we have analyzed adaptation to phage infection in Streptococcus thermophilus at the CRISPR1 locus to identify cis-acting elements essential for the process. We show that the leader and a single repeat of the CRISPR locus are sufficient for adaptation in this system. Moreover, we identified a leader sequence element capable of stimulating adaptation at a dormant repeat. We found that sequences within 10 bp of the site of integration, in both the leader and repeat of the CRISPR, are required for the process. Our results indicate that information at the CRISPR leader-repeat junction is critical for adaptation in this Type II-A system and likely other CRISPR-Cas systems.
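
    The locus layout described above (leader, then repeat-spacer units, with new spacers added next to the leader) can be modelled with a small data structure. This is an illustrative sketch only; the class and method names are hypothetical, and the toy sequences in any usage are not real CRISPR1 sequences.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """Leader followed by repeat-spacer units and a terminal repeat."""
    leader: str
    repeat: str
    spacers: list = field(default_factory=list)

    def acquire(self, spacer):
        """Add a new spacer at the leader-proximal end of the array."""
        self.spacers.insert(0, spacer)

    def sequence(self):
        """Linear sequence: leader, (repeat + spacer) units, final repeat."""
        units = "".join(self.repeat + s for s in self.spacers)
        return self.leader + units + self.repeat
```

    With no spacers the model reduces to a leader plus a single repeat, the minimal substrate the abstract reports as sufficient for adaptation; each acquisition then adds one spacer and one repeat copy.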

  19. Chromatin structure of repeating CTG/CAG and CGG/CCG sequences in human disease.

    Science.gov (United States)

    Wang, Yuh-Hwa

    2007-05-01

    In eukaryotic cells, chromatin structure organizes genomic DNA in a dynamic fashion, and results in regulation of many DNA metabolic processes. The CTG/CAG and CGG/CCG repeating sequences involved in several neuromuscular degenerative diseases display differential abilities for the binding of histone octamers. The effect of the repeating DNA on nucleosome assembly could be amplified as the number of repeats increases. Also, CpG methylation, and sequence interruptions within the triplet repeats exert an impact on the formation of nucleosomes along these repeating DNAs. The two most common triplet expansion human diseases, myotonic dystrophy 1 and fragile X syndrome, are caused by the expanded CTG/CAG and CGG/CCG repeats, respectively. In addition to the expanded repeats and CpG methylation, histone modifications, chromatin remodeling factors, and noncoding RNA have been shown to coordinate the chromatin structure at both myotonic dystrophy 1 and fragile X loci. Alterations in chromatin structure at these two loci can affect transcription of these disease-causing genes, leading to disease symptoms. These observations have brought a new appreciation that a full understanding of disease gene expression requires a knowledge of the structure of the chromatin domain within which the gene resides.

  20. A Novel Signal Processing Measure to Identify Exact and Inexact Tandem Repeat Patterns in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Ravi Gupta

    2007-03-01

    Full Text Available The identification and analysis of repetitive patterns are active areas of biological and computational research. Tandem repeats in telomeres play a role in cancer and hypervariable trinucleotide tandem repeats are linked to over a dozen major neurodegenerative genetic disorders. In this paper, we present an algorithm to identify the exact and inexact repeat patterns in DNA sequences based on orthogonal exactly periodic subspace decomposition technique. Using the new measure our algorithm resolves the problems like whether the repeat pattern is of period P or its multiple (i.e., 2P, 3P, etc.), and several other problems that were present in previous signal-processing-based algorithms. We present an efficient algorithm of O(N Lw log Lw), where N is the length of DNA sequence and Lw is the window length, for identifying repeats. The algorithm operates in two stages. In the first stage, each nucleotide is analyzed separately for periodicity, and in the second stage, the periodic information of each nucleotide is combined together to identify the tandem repeats. Datasets having exact and inexact repeats were taken up for the experimental purpose. The experimental result shows the effectiveness of the approach.
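
    The two-stage idea (analyze each nucleotide separately, then combine) can be illustrated with a much simpler stand-in than the paper's subspace decomposition: a lag-match autocorrelation per base. The sketch below is not the published algorithm; the 0.9 threshold and all names are assumptions chosen for illustration.

```python
def base_autocorrelation(seq, base, lag):
    """Stage 1: count positions where `base` occurs at both i and i + lag."""
    return sum(1 for i in range(len(seq) - lag)
               if seq[i] == base and seq[i + lag] == base)


def period_scores(seq, max_period):
    """Stage 2: sum the four per-base counts and normalize for each lag."""
    seq = seq.upper()
    scores = {}
    for lag in range(1, min(max_period, len(seq) - 1) + 1):
        total = sum(base_autocorrelation(seq, b, lag) for b in "ACGT")
        scores[lag] = total / (len(seq) - lag)
    return scores


def smallest_period(seq, max_period, threshold=0.9):
    """Return the smallest lag scoring above threshold, or None.

    Taking the smallest qualifying lag separates a true period P from its
    multiples 2P, 3P, which also score highly.
    """
    for lag, score in sorted(period_scores(seq, max_period).items()):
        if score >= threshold:
            return lag
    return None
```

    A threshold below 1.0 tolerates a few substitutions, so inexact repeats are still assigned their underlying period.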

  1. Isolation and Characterization of Simple Sequence Repeats (SSR) Markers from the Moss Genus Orthotrichum Using a Small Throughput Pyrosequencing Machine

    Directory of Open Access Journals (Sweden)

    Vítězslav Plášek

    2012-06-01

    Full Text Available Here, we report the results of next-generation sequencing on the GS Junior system to identify a large number of microsatellites from the epiphytic moss Orthotrichum speciosum. Using a combination of a total (non-enrichment) genomic library and small-scale 454 pyrosequencing, we determined 5382 contigs whose length ranged from 103 to 5445 bp. In this dataset we identified 92 SSR (simple sequence repeat) motifs in 89 contigs. Forty-six of these had flanking regions suitable for primer design. We tested PCR amplification, reproducibility, and the level of polymorphism of 46 primer pairs for Orthotrichum speciosum using 40 individuals from two populations. As a result, the designed primers revealed 35 polymorphic loci with more than two alleles detected. This method is cost- and time-effective in comparison with traditional approaches involving cloning and sequencing.

  2. De novo computational identification of stress-related sequence motifs and microRNA target sites in untranslated regions of a plant translatome

    Science.gov (United States)

    Munusamy, Prabhakaran; Zolotarov, Yevgen; Meteignier, Louis-Valentin; Moffett, Peter; Strömvik, Martina V.

    2017-01-01

    Gene regulation at the transcriptional and translational level leads to diversity in phenotypes and function in organisms. Regulatory DNA or RNA sequence motifs adjacent to the gene coding sequence act as binding sites for proteins that in turn enable or disable expression of the gene. Whereas the known DNA and RNA binding proteins range in the thousands, only a few motifs have been examined. In this study, we have predicted putative regulatory motifs in groups of untranslated regions from genes regulated at the translational level in Arabidopsis thaliana under normal and stressed conditions. The test group of sequences was divided into random subgroups and subjected to three de novo motif finding algorithms (Seeder, Weeder and MEME). In addition to identifying sequence motifs, using an in silico tool we have predicted microRNA target sites in the 3′ UTRs of the translationally regulated genes, as well as identified upstream open reading frames located in the 5′ UTRs. Our bioinformatics strategy and the knowledge generated contribute to understanding gene regulation during stress, and can be applied to disease and stress resistant plant development. PMID:28276452

  3. Analysis of simple sequence repeats in the Gaeumannomyces graminis var. tritici genome and the development of microsatellite markers.

    Science.gov (United States)

    Li, Wei; Feng, Yanxia; Sun, Haiyan; Deng, Yuanyu; Yu, Hanshou; Chen, Huaigu

    2014-11-01

    Understanding the genetic structure of Gaeumannomyces graminis var. tritici is essential for the establishment of efficient disease control strategies. It is becoming clear that microsatellites, or simple sequence repeats (SSRs), play an important role in genome organization and phenotypic diversity, and are a large source of genetic markers for population genetics and meiotic maps. In this study, we examined the G. graminis var. tritici genome (1) to analyze its pattern of SSRs, (2) to compare it with other plant pathogenic filamentous fungi, such as Magnaporthe oryzae and M. poae, and (3) to identify new polymorphic SSR markers for genetic diversity. The G. graminis var. tritici genome was rich in SSRs; a total of 13,650 SSRs were identified, with mononucleotides being the most common motifs. In coding regions, the densities of tri- and hexanucleotides were significantly higher than in noncoding regions. The di-, tri-, tetra-, penta-, and hexanucleotide repeats in the G. graminis var. tritici genome were more abundant than the same repeats in M. oryzae and M. poae. Of 115 primers designed, 39 SSRs were polymorphic among G. graminis var. tritici isolates, and 8 primers were randomly selected to analyze 116 isolates from China. The number of alleles varied from 2 to 7 and the expected heterozygosity (He) from 0.499 to 0.837. In conclusion, the SSRs developed in this study were highly polymorphic, and our analysis indicated that G. graminis var. tritici is a species with high genetic diversity. The results provide a pioneering report for several applications, such as the assessment of population structure and genetic diversity of G. graminis var. tritici.
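
    As a rough illustration of the kind of scan summarized above, the sketch below finds perfect mono- to hexanucleotide SSRs with regular expressions and computes expected heterozygosity as He = 1 - sum(p_i^2). The minimum repeat counts in MIN_REPEATS, the function names, and the toy sequence are assumptions made for the example, not values from the study, and the study may have used a bias-corrected He estimator.

```python
import re

# Assumed minimum repeat counts per motif length (not taken from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "AA" reported as a dinucleotide
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def expected_heterozygosity(allele_counts):
    """He = 1 - sum of squared allele frequencies (no small-sample correction)."""
    total = sum(allele_counts)
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)


toy = "TTG" + "A" * 12 + "CGT" + "AC" * 7 + "GGT" + "CAG" * 5 + "T"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
print(expected_heterozygosity([5, 5]))
```

    Only perfect repeats are reported; compound or interrupted SSRs would need a more tolerant matcher.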

  4. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2007-02-01

    Full Text Available Background: This work addresses the problem of detecting conserved transcription factor binding sites, and regulatory regions in general, through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results: We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Unlike the most commonly used methods, the approach we present neither needs nor computes an alignment of the sequences investigated, nor does it resort to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation than the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion: Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performance than other available methods for the same task. We also show examples of how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.

  5. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.

    2010-07-12

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.
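
    The abstract does not spell out its repeat-assembly model, so the following is only a crude proxy for the underlying intuition: a stretch of sequence that occurs more than once and is at least as long as a read cannot be placed unambiguously by that read. The sketch assumes a linear, single-stranded sequence (no reverse complements, no circularity), and the function name is hypothetical.

```python
from collections import Counter


def repeated_kmer_fraction(genome, read_length):
    """Fraction of read-length windows whose sequence occurs more than once."""
    n = len(genome) - read_length + 1
    if n <= 0:
        return 0.0
    counts = Counter(genome[i:i + read_length] for i in range(n))
    # Sum the occurrences of every window sequence seen at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n


toy_genome = "ACGTACGT"
for read_length in (4, 5):
    print(read_length, repeated_kmer_fraction(toy_genome, read_length))
```

    In the toy sequence the 4-base windows include a duplicate (ACGT), while every 5-base window is unique, which mirrors the point that longer reads leave fewer ambiguous placements.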

  6. Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Matt J Cahill

    Full Text Available BACKGROUND: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. METHODOLOGY/PRINCIPAL FINDINGS: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. CONCLUSIONS: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

  7. Tandem repeats and G-rich sequences are enriched at human CNV breakpoints.

    Directory of Open Access Journals (Sweden)

    Promita Bose

    Full Text Available Chromosome breakage in germline and somatic genomes gives rise to copy number variation (CNV) responsible for genomic disorders and tumorigenesis. DNA sequence is known to play an important role in breakage at chromosome fragile sites; however, the sequences susceptible to double-strand breaks (DSBs) underlying CNV formation are largely unknown. Here we analyze 140 germline CNV breakpoints from 116 individuals to identify DNA sequences enriched at breakpoint loci compared to 2800 simulated control regions. We find that, overall, CNV breakpoints are enriched in tandem repeats and sequences predicted to form G-quadruplexes. G-rich repeats are overrepresented at terminal deletion breakpoints, which may be important for the addition of a new telomere. Interstitial deletions and duplication breakpoints are enriched in Alu repeats that in some cases mediate non-allelic homologous recombination (NAHR) between the two sides of the rearrangement. CNV breakpoints are enriched in certain classes of repeats that may play a role in DNA secondary structure, DSB susceptibility and/or DNA replication errors.

  8. Efficient multiplex simple sequence repeat genotyping of the oomycete plant pathogen Phytophthora infestans

    NARCIS (Netherlands)

    Li, Y.; Cooke, D.E.L.; Jacobsen, E.; Lee, van der T.A.J.

    2013-01-01

    Genotyping is fundamental to population analysis. To accommodate fast, accurate and cost-effective genotyping, a one-step multiplex PCR method employing twelve simple sequence repeat (SSR) markers was developed for high-throughput screening of Phytophthora infestans populations worldwide. The SSR

  9. Expression of a new chimeric protein with a highly repeated sequence in tobacco cells.

    Science.gov (United States)

    Saumonneau, Amélie; Rottier, Karine; Conrad, Udo; Popineau, Yves; Guéguen, Jacques; Francin-Allami, Mathilde

    2011-07-01

    In wheat, the high-molecular weight (HMW) glutenin subunits are known to contribute to gluten viscoelasticity, and show some similarities to elastomeric animal proteins such as elastin. Combining the sequence of a glutenin with that of elastin is a way to create new chimeric functional proteins that could be expressed in plants. The sequence of a glutenin subunit was modified by the insertion of several hydrophobic and elastic motifs derived from elastin (elastin-like peptide, ELP) into the hydrophilic repetitive domain of the glutenin subunit to create a triblock protein, the objective being to improve the mechanical (elastomeric) properties of this wheat storage protein. In this study, we investigated an expression model system to analyze the expression and trafficking of the wild-type HMW glutenin subunit (GS(W)) and an HMW glutenin subunit mutated by the insertion of elastin motifs (GS(M)-ELP). For this purpose, a series of constructs was made to express wild-type subunits and subunits mutated by insertion of elastin motifs in fusion with green fluorescent protein (GFP) in tobacco BY-2 cells. Our results showed for the first time the expression of HMW glutenin fused with GFP in tobacco protoplasts. We also expressed and localized the chimeric protein composed of plant glutenin and animal elastin-like peptides (ELP) in BY-2 protoplasts, and demonstrated its presence in protein body-like structures in the endoplasmic reticulum. This work, therefore, provides a basis for heterologous production of the glutenin-ELP triblock protein to characterize its mechanical properties.

  10. Sequence analysis of trinucleotide repeat microsatellites from an enrichment library of the equine genome.

    Science.gov (United States)

    Tozaki, T; Inoue, S; Mashima, S; Ohta, M; Miura, N; Tomita, M

    2000-04-01

    Microsatellites are useful tools for the construction of a linkage map and parentage testing of equines, but only a limited number of equine microsatellites have been elucidated. Thus, we constructed an equine genomic library enriched for DNA fragments containing (CAG)n repeats. The enrichment method includes hybridization-capture of repeat regions using biotin-conjugated oligonucleotides, a nucleotide substrate-biased polymerase reaction with the oligonucleotides, and subsequent PCR amplification, because these procedures are useful for the cloning of less abundant trinucleotide microsatellites. Microsatellites containing (CAG)n repeats were obtained at a ratio of one per 3-4 clones, indicating an enrichment of about 10^4-fold and reducing the time and cost of cloning. In this study, 66 different microsatellites with (CAG)n repeats were identified. The number of complete simple CAG repeats in our clones ranged from 4 to 33, with an average repeat length of 8.8 units. The microsatellites were useful as sequence-tagged site (STS) markers. In addition, some clones containing (CAG)n repeats showed homology to human (CAG)n-containing genes, which have been previously mapped. These results indicate that the clones might be a useful tool for chromosome comparison between equines and humans.

  11. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    Science.gov (United States)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  12. Characterization and compilation of polymorphic simple sequence repeat (SSR) markers of peanut from public database

    Directory of Open Access Journals (Sweden)

    Zhao Yongli

    2012-07-01

    Full Text Available Background: There are several reports describing thousands of SSR markers in the peanut (Arachis hypogaea L.) genome. There is a need to integrate various research reports of peanut DNA polymorphism into a single platform. Further, because of a lack of uniformity in the labeling of these markers across publications, there is some confusion about the identities of many markers. We describe below an effort to develop a central comprehensive database of polymorphic SSR markers in peanut. Findings: We compiled 1,343 SSR markers as detecting polymorphism (14.5%) within a total of 9,274 markers. Amongst all polymorphic SSRs examined, we found that the AG motif (36.5%) was the most abundant, followed by AAG (12.1%), AAT (10.9%), and AT (10.3%). The mean length of SSR repeats in dinucleotide SSRs was significantly longer than that in trinucleotide SSRs. Dinucleotide SSRs showed higher polymorphism frequency for genomic SSRs when compared to trinucleotide SSRs, while for EST-SSRs, the frequency of polymorphic SSRs was higher in trinucleotide SSRs than in dinucleotide SSRs. The correlation of the length of SSR and the frequency of polymorphism revealed that the frequency of polymorphism decreased as motif repeat number increased. Conclusions: The assembled polymorphic SSRs would enhance the density of the existing genetic maps of peanut, and could also be a useful source of DNA markers suitable for high-throughput QTL mapping and marker-assisted selection in peanut improvement, and thus would be of value to breeders.

  13. Expressed sequence tags (ESTs) and simple sequence repeat (SSR) markers from octoploid strawberry (Fragaria × ananassa)

    Directory of Open Access Journals (Sweden)

    Bies Dawn H

    2005-06-01

    Full Text Available Background: Cultivated strawberry (Fragaria × ananassa) represents one of the most valued fruit crops in the United States. Despite its economic importance, the octoploid genome presents a formidable barrier to efficient study of genome structure and molecular mechanisms that underlie agriculturally-relevant traits. Many potentially fruitful research avenues, especially large-scale gene expression surveys and development of molecular genetic markers, have been limited by a lack of sequence information in public databases. As a first step to remedy this discrepancy, a cDNA library has been developed from salicylate-treated, whole-plant tissues and over 1800 expressed sequence tags (ESTs) have been sequenced and analyzed. Results: A putative unigene set of 1304 sequences (133 contigs and 1171 singlets) has been developed, and the transcripts have been functionally annotated. Homology searches indicate that 89.5% of sequences share significant similarity to known/putative proteins or Rosaceae ESTs. The ESTs have been functionally characterized and genes relevant to specific physiological processes of economic importance have been identified. A set of tools useful for SSR development and mapping is presented. Conclusion: Sequences derived from this effort may be used to speed gene discovery efforts in Fragaria and the Rosaceae in general and also open avenues of comparative mapping. This report represents a first step in expanding molecular-genetic analyses in strawberry and demonstrates how computational tools can be used to optimally mine a large body of useful information from a relatively small data set.

  14. Alignment of capsid protein VP1 sequences of all human rhinovirus prototype strains: conserved motifs and functional domains.

    Science.gov (United States)

    Laine, Pia; Blomqvist, Soile; Savolainen, Carita; Andries, Koen; Hovi, Tapani

    2006-01-01

    An alignment was made of the deduced amino acid sequences of the entire capsid protein VP1 of all human rhinovirus (HRV) prototype strains to examine conserved motifs in the primary structure. A set of previously proposed crucially important amino acids in the footprints of the two known receptor molecules was not conserved in a receptor group-specific way. In contrast, VP1 and VP3 amino acids in the minor receptor-group strains corresponding to most of the predicted ICAM-1 footprint definitely differed from those of the ICAM-1-using major receptor-group strains. Previous antiviral-sensitivity classification showed an almost-complete agreement with the species classification and a fair correlation with amino acids aligning in the antiviral pocket. It was concluded that systematic alignment of sequences of related virus strains can be used to test hypotheses derived from molecular studies of individual model viruses and to generate ideas for future studies on virus structure and replication.

  15. Peptide sequences identified by phage display are immunodominant functional motifs of Pet and Pic serine proteases secreted by Escherichia coli and Shigella flexneri.

    Science.gov (United States)

    Ulises, Hernández-Chiñas; Tatiana, Gazarian; Karlen, Gazarian; Guillermo, Mendoza-Hernández; Juan, Xicohtencatl-Cortes; Carlos, Eslava

    2009-12-01

    Plasmid-encoded toxin (Pet) and protein involved in colonization (Pic), are serine protease autotransporters of Enterobacteriaceae (SPATEs) secreted by enteroaggregative Escherichia coli (EAEC), which display the GDSGSG sequence or the serine motif. Our research was directed to localize functional sites in both proteins using the phage display method. From a 12mer linear and a 7mer cysteine-constrained (C7C) libraries displayed on the M13 phage pIII protein we selected different mimotopes using IgG purified from sera of children naturally infected with EAEC producing Pet and Pic proteins, and anti-Pet and anti-Pic IgG purified from rabbits immunized with each one of these proteins. Children IgG selected a homologous group of sequences forming the consensus sequence, motif, PQPxK, and the motifs PGxI/LN and CxPDDSSxC were selected by the rabbit anti-Pet and anti-Pic IgGs, respectively. Analysis of the amino terminal region of a panel of SPATEs showed the presence in all of them of sequences matching the PGxI/LN or CxPDDSSxC motifs, and in a three-dimensional model (Modeller 9v2) designed for Pet, both these motifs were found in the globular portion of the protein, close to the protease active site GDSGSG. Antibodies induced in mice by mimotopes carrying the three aforementioned motifs were reactive with Pet, Pic, and with synthetic peptides carrying the immunogenic mimotope sequences TYPGYINHSKA and LLPQPPKLLLP, thus confirming that the peptide moiety of the selected phages induced the antibodies specific for the toxins. The antibodies induced in mice to the PGxI/LN and CxPDDSSxC mimotopes inhibited fodrin proteolysis and macrophage chemotaxis biological activities of Pet. Our results showed that we were able to generate, by a phage display procedure, mimotopes with sequence motifs PGxI/LN and CxPDDSSxC, and to identify them as functional motifs of the Pet, Pic and other SPATEs involved in their biological activities.
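
    The consensus motifs named in this abstract can be checked against the two synthetic peptide sequences it quotes with a short pattern search. The sketch below assumes that "x" stands for any residue and that "I/L" means either isoleucine or leucine at that position; the regular expressions and the function name are illustrative, not part of the original study.

```python
import re

# Motifs from the abstract, translated to regular expressions under the
# assumptions stated above ("x" = any residue, "I/L" = I or L).
MOTIFS = {
    "PQPxK": r"PQP.K",
    "PGxI/LN": r"PG.[IL]N",
    "CxPDDSSxC": r"C.PDDSS.C",
}


def find_motifs(peptide):
    """Return (motif_name, 0-based start, matched residues) for each hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for m in re.finditer(pattern, peptide):
            hits.append((name, m.start(), m.group(0)))
    return hits


for peptide in ("TYPGYINHSKA", "LLPQPPKLLLP"):
    print(peptide, find_motifs(peptide))
```

    Run on the two peptides, the search reports PGYIN in TYPGYINHSKA and PQPPK in LLPQPPKLLLP, consistent with the motifs the abstract assigns to them.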

  16. Analysis of simple sequence repeats in rice bean (Vigna umbellata) using an SSR-enriched library

    Institute of Scientific and Technical Information of China (English)

    Lixia Wang; Kyung Do Kim; Dongying Gao; Honglin Chen; Suhua Wang; Suk Ha Lee; Scott A. Jackson; Xuzhen Cheng

    2016-01-01

    Rice bean (Vigna umbellata Thunb.), a warm-season annual legume, is grown in Asia mainly for dried grain or fodder and plays an important role in human and animal nutrition because the grains are rich in protein and some essential fatty acids and minerals. With the aim of expediting the genetic improvement of rice bean, we initiated a project to develop genomic resources and tools for molecular breeding in this little-known but important crop. Here we report the construction of an SSR-enriched genomic library from DNA extracted from pooled young leaf tissues of 22 rice bean genotypes and the development of SSR markers. In 433,562 reads generated by a Roche 454 GS-FLX sequencer, we identified 261,458 SSRs, of which 48.8% were of compound form. Dinucleotide repeats were predominant with an absolute proportion of 81.6%, followed by trinucleotides (17.8%). Other types together accounted for 0.6%. The motif AC/GT accounted for 77.7% of the total, followed by AAG/CTT (14.3%), and all others accounted for 12.0%. Among the flanking sequences, 2928 matched putative genes or gene models in the protein database of Arabidopsis thaliana, corresponding with 608 non-redundant Gene Ontology terms. Of these sequences, 11.2% were involved in cellular components, 24.2% were involved in molecular functions, and 64.6% were associated with biological processes. Based on homolog analysis, 1595 flanking sequences were similar to mung bean and 500 to common bean genomic sequences. Comparative mapping was conducted using 350 sequences homologous to both mung bean and common bean sequences. Finally, a set of primer pairs was designed, and a validation test showed that 58 of 220 new primers can be used in rice bean and 53 can be transferred to mung bean. However, only 11 were polymorphic when tested on 32 rice bean varieties. We propose that this study lays the groundwork for developing novel SSR markers and will enhance the mapping of qualitative and quantitative traits and marker-assisted selection in rice

  17. An isoform of Taiman that contains a PRD-repeat motif is indispensable for transducing the vitellogenic juvenile hormone signal in Locusta migratoria.

    Science.gov (United States)

    Wang, Zhiming; Yang, Libin; Song, Jiasheng; Kang, Le; Zhou, Shutang

    2017-03-01

    Taiman (Tai) has been recently identified as the dimerizing partner of the juvenile hormone (JH) receptor, Methoprene-tolerant (Met). However, the role of Tai isoforms in transducing the vitellogenic signal of JH has not been determined. In this study, we show that the migratory locust Locusta migratoria has two Tai isoforms, which differ in an INDEL-1 domain with the PRD-repeat motif rich in histidine and proline at the C-terminus. Tai-A with the INDEL-1 is expressed at levels about 50-fold higher than Tai-B without the INDEL-1 in the fat body of vitellogenic adult females. Knockdown of Tai-A but not Tai-B results in a substantial reduction of vitellogenin expression in the fat body accompanied by the arrest of ovarian development and oocyte maturation, similar to that caused by depletion of both Tai isoforms. Either Tai-A or Tai-B combined with Met can induce target gene transcription in response to JH, but Tai-A appears to mediate a significantly higher transactivation. Our data suggest that the INDEL-1 domain plays a critical role in Tai function during reproduction, as Tai-A appears to be more active than Tai-B in transducing the vitellogenic JH signal in L. migratoria.

  18. Multilocus sequence evaluation for differentiating species of the trematode Family Gastrothylacidae, with a note on the utility of mitochondrial COI motifs in species identification.

    Science.gov (United States)

    Ghatani, Sudeep; Shylla, Jollin Andrea; Roy, Bishnupada; Tandon, Veena

    2014-09-15

    Amphistomiasis, a neglected trematode infectious disease of ruminants, is caused by numerous species of amphistomes belonging to six families under the Superfamily Paramphistomoidea. In the present study, four frequently used DNA markers, viz. nuclear ribosomal 28S (D1-D3 regions), 18S and ITS2 and mitochondrial COI genes, as well as sequence motifs from these genes were evaluated for their utility in species characterization of members of the amphistomes' Family Gastrothylacidae commonly prevailing in Northeast India. In sequence and phylogenetic analyses the COI gene turned out to be the most useful marker in identifying the gastrothylacid species, with the exception of Gastrothylax crumenifer, which showed a high degree of intraspecific variations among its isolates. The sequence analysis data also showed the ITS2 region to be effective for interspecies characterization, though the 28S and 18S genes were found unsuitable for the purpose. On the other hand, sequence motif analysis data revealed the motifs from the COI gene to be highly conserved and specific for their target species which allowed accurate in silico identification of the gastrothylacid species irrespective of their intraspecific differences. We propose the use of COI motifs generated in the study as a potential tool for identification of these species.

  19. Chromosomal localization of a tandemly repeated DNA sequence in Trifolium repens L.

    Institute of Scientific and Technical Information of China (English)

    Zhu JM; Ellison NW; et al.

    1996-01-01

    A karyotype of Trifolium repens constructed from mitotic cells revealed 13 pairs of metacentric and 3 pairs of submetacentric chromosomes, including a pair of satellites located at the end of the short arm of chromosome 16. C-bands were identified around the centromeric regions of 8 pairs of chromosomes. A 350 bp tandemly repeated DNA sequence from T. repens labelled with digoxigenin hybridized to the proximal centromeric regions of 12 chromosome pairs. Some correlation between the distribution of the repeat sequence and the distribution of C-banding was demonstrated.

  20. Application of inter simple sequence repeat (ISSR) markers to plant genetics.

    Science.gov (United States)

    Godwin, I D; Aitken, E A; Smith, L W

    1997-08-01

    Microsatellites or simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Single-locus SSR markers have been developed for a number of species, although there is a major bottleneck in developing SSR markers whereby flanking sequences must be known to design 5'-anchors for polymerase chain reaction (PCR) primers. Inter SSR (ISSR) fingerprinting was developed such that no sequence knowledge was required. Primers based on a repeat sequence, such as (CA)n, can be made with a degenerate 3'-anchor, such as (CA)8RG or (AGC)6TY. The resultant PCR reaction amplifies the sequence between two SSRs, yielding a multilocus marker system useful for fingerprinting, diversity analysis and genome mapping. PCR products are radiolabelled with 32P or 33P via end-labelling or PCR incorporation, and separated on a polyacrylamide sequencing gel prior to autoradiographic visualisation. A typical reaction yields 20-100 bands per lane depending on the species and primer. We have used ISSR fingerprinting in a number of plant species, and report here some results on two important tropical species, sorghum and banana. Previous investigators have demonstrated that ISSR analysis usually detects a higher level of polymorphism than that detected with restriction fragment length polymorphism (RFLP) or random amplified polymorphic DNA (RAPD) analyses. Our data indicate that this is not a result of greater polymorphism genetically, but rather technical reasons related to the detection methodology used for ISSR analysis.
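
    To make the anchored-primer notation concrete, the sketch below expands specifications such as (CA)8RG and (AGC)6TY into the explicit primer sequences they stand for, using the IUPAC codes R = A or G and Y = C or T. The function name and the restriction to R and Y are choices made for this example; other degenerate codes would need to be added to the table.

```python
import re
from itertools import product

# IUPAC codes needed for the anchors quoted above (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_primer(spec):
    """Expand e.g. '(CA)8RG' into the list of explicit primer sequences."""
    m = re.fullmatch(r"\(([ACGT]+)\)(\d+)([ACGTRY]*)", spec)
    if m is None:
        raise ValueError("unsupported primer specification: %r" % spec)
    core = m.group(1) * int(m.group(2))           # repeat unit x copy number
    choices = [IUPAC[ch] for ch in m.group(3)]    # options at each anchor base
    return [core + "".join(combo) for combo in product(*choices)]


for spec in ("(CA)8RG", "(AGC)6TY"):
    print(spec, expand_primer(spec))
```

    Each degenerate position doubles the number of sequences in the primer mix, so (CA)8RG corresponds to two 18-mers and (AGC)6TY to two 20-mers.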

  1. Modification of cyclic NGR tumor neovasculature-homing motif sequence to human plasminogen kringle 5 improves inhibition of tumor growth.

    Directory of Open Access Journals (Sweden)

    Weiwei Jiang

    Full Text Available BACKGROUND: Blood vessels in tumors express higher levels of aminopeptidase N (APN) than normal tissues. Evidence suggests that the CNGRC motif is an APN ligand which targets tumor vasculature. Increased expression of APN in tumor vascular endothelium, therefore, offers an opportunity for targeted delivery of NGR peptide-linked drugs to tumors. METHODS/PRINCIPAL FINDINGS: To determine whether an additional cyclic CNGRC sequence could improve endothelial cell homing and antitumor effect, human plasminogen kringle 5 (hPK5) was modified genetically to introduce a CNGRC motif (NGR-hPK5) and was subsequently expressed in yeast. The biological activity of NGR-hPK5 was assessed and compared with that of wild-type hPK5, in vitro and in vivo. NGR-hPK5 showed more potent antiangiogenic activity than wild-type hPK5: the former had a stronger inhibitory effect on proliferation, migration and cord formation of vascular endothelial cells, and produced a stronger antiangiogenic response in the CAM assay. To evaluate the tumor-targeting ability, both wild-type hPK5 and NGR-hPK5 were (99m)Tc-labeled for tracking biodistribution in the in vivo tumor model. By planar imaging and biodistribution analyses of major organs, NGR-hPK5 was found localized to tumor tissues at a higher level than wild-type hPK5 (approximately 3-fold). Finally, the effects of wild-type hPK5 and NGR-modified hPK5 on tumor growth were investigated in two tumor model systems. NGR modification improved tumor localization and, as a consequence, effectively inhibited the growth of mouse Lewis lung carcinoma (LLC) and human colorectal adenocarcinoma (Colo 205) cells in tumor-bearing mice. CONCLUSIONS/SIGNIFICANCE: These studies indicated that the addition of an APN targeting peptide NGR sequence could improve the ability of hPK5 to inhibit angiogenesis and tumor growth.

  2. Automated discovery of single nucleotide polymorphism and simple sequence repeat molecular genetic markers.

    Science.gov (United States)

    Batley, Jacqueline; Jewell, Erica; Edwards, David

    2007-01-01

    Molecular genetic markers represent one of the most powerful tools for the analysis of genomes. Molecular marker technology has developed rapidly over the last decade, and two forms of sequence-based markers, simple sequence repeats (SSRs), also known as microsatellites, and single nucleotide polymorphisms (SNPs), now predominate applications in modern genetic analysis. The availability of large sequence data sets permits mining for SSRs and SNPs, which may then be applied to genetic trait mapping and marker-assisted selection. Here, we describe Web-based automated methods for the discovery of these SSRs and SNPs from sequence data. SSRPrimer enables the real-time discovery of SSRs within submitted DNA sequences, with the concomitant design of PCR primers for SSR amplification. Alternatively, users may browse the SSR Taxonomy Tree to identify predetermined SSR amplification primers for any species represented within the GenBank database. SNPServer uses a redundancy-based approach to identify SNPs within DNA sequence data. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences, and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms.

  3. Flow Cytometry-assisted Cloning of Specific Sequence Motifs from Complex 16S ribosomal RNA Gene Libraries.

    Energy Technology Data Exchange (ETDEWEB)

    Nielsen, J.L.; Schramm, A.; Bernhard, A.E.; van den Engh, G.J.; Stahl, D.A.

    2004-07-21

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Retrieval and sequence analysis of phylogenetically informative genes has become a standard cultivation-independent technique to investigate microbial diversity in nature (7, 18). Genes encoding the 16S rRNA, because of the relative ease of their selective amplification, have been most frequently employed for general diversity surveys (16). Environmental studies have also focused on specific subpopulations affiliated with a phylogenetic group or identified by genes encoding specific metabolic functions (e.g., ammonia oxidation, sulfate respiration, and nitrate reduction) (8, 15, 20). However, specific populations may be of low abundance (1, 23), or the genes encoding specific metabolic functions may be insufficiently conserved to provide priming sites for general PCR amplification. Three general approaches have been used to obtain 16S rRNA sequence information from low-abundance populations: screening hundreds to thousands of clones in a general 16S rRNA gene library (21), flow cytometric sorting of a subpopulation of environmentally derived cells labeled by fluorescent in situ hybridization (FISH) (27), or selective PCR amplification using primers specific for the subpopulation (2, 23). While the first approach is simply time-consuming and tedious, the second has been restricted to fairly large and strongly fluorescent cells from aquatic samples (5, 27). The third approach often generates fragments of only a few hundred bases due to the limited number of specific priming sites. Partial sequence information often degrades analysis, obscuring or distorting the phylogenetic placement of the new sequences (11, 20). A more robust characterization of environ

  4. The oligodeoxynucleotide sequences corresponding to never-expressed peptide motifs are mainly located in the non-coding strand

    Directory of Open Access Journals (Sweden)

    Bickis Mik

    2010-07-01

    Full Text Available Abstract Background We study the usage of specific peptide platforms in protein composition. Using the pentapeptide as a unit of length, we find that in the universal proteome many pentapeptides are heavily repeated (even thousands of times), whereas some are quite rare, and a small number do not appear at all. To understand the physico-chemical-biological basis underlying peptide usage at the proteomic level, in this study we analyse the energetic costs for the synthesis of rare and never-expressed versus frequent pentapeptides. In addition, we explore residue bulkiness, hydrophobicity, and codon number as factors able to modulate specific peptide frequencies. Then, the possible influence of amino acid composition is investigated in zero- and high-frequency pentapeptide sets by analysing the frequencies of the corresponding inverse-sequence pentapeptides. As a final step, we analyse the pentadecamer oligodeoxynucleotide sequences corresponding to the never-expressed pentapeptides. Results We find that only DNA context-dependent constraints (such as oligodeoxynucleotide sequence location in the minus strand, introns, pseudogenes, frameshifts, etc.) provide a coherent mechanistic platform to explain the occurrence of never-expressed versus frequent pentapeptides in the protein world. Conclusions This study is of importance in cell biology. Indeed, the rarity (or lack of expression) of specific 5-mer peptide modules implies the rarity (or lack of expression) of the corresponding n-mer peptide sequences (with n
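    The counting step behind this kind of analysis can be sketched as a sliding-window k-mer tally. The code below is an illustration under stated assumptions, not the authors' pipeline: the toy "proteome" is made up, the function names are hypothetical, and only the 20 standard amino acids are considered.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def count_kmers(proteins, k=5):
    """Count every overlapping k-residue window across all sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def absent_kmers(counts, k=5):
    """Yield k-mers over the standard alphabet that were never observed."""
    for combo in product(AMINO_ACIDS, repeat=k):
        kmer = "".join(combo)
        if kmer not in counts:
            yield kmer

def inverse_frequency(counts, kmer):
    """Frequency of the inverse-sequence (reversed) k-mer."""
    return counts[kmer[::-1]]

# Hypothetical toy input; a real analysis would read a full proteome.
toy_proteome = ["AAAAAAA", "ACDEFG", "GFEDCA"]
pentamers = count_kmers(toy_proteome, k=5)
print(pentamers.most_common(3))
print(inverse_frequency(pentamers, "ACDEF"))
```

    For k = 5 the full alphabet gives 20^5 = 3,200,000 possible pentapeptides, which is why the absent set is produced lazily by a generator.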

  5. Genotyping of simple sequence repeats--factors implicated in shadow band generation revisited.

    Science.gov (United States)

    Olejniczak, Marta; Krzyzosiak, Wlodzimierz J

    2006-10-01

    PCR amplification of microsatellite sequences generates, besides the main product corresponding to allele size, also additional, undesired products usually shorter by multiples of the repeated unit. These extra products known as shadow bands or stutter products may complicate genotyping. The mechanism by which these artifacts are formed is not well understood and so no effective remedy has been found to cope with these spurious products. In this study, using the DNA templates containing the CAG/CTG repeats flanked by gene-specific sequences and universal priming sites, we analyzed the effects of many PCR variables on the shadow band generation. The most important result was that at the decreased temperature of the denaturation step during PCR cycling the shadow bands were either not formed or were strongly suppressed. Several possible sources of this effect are discussed.

  6. In silico analysis of Simple Sequence Repeats from chloroplast genomes of Solanaceae species

    Directory of Open Access Journals (Sweden)

    Evandro Vagner Tambarussi

    2009-01-01

    Full Text Available The availability of chloroplast genome (cpDNA) sequences of Atropa belladonna, Nicotiana sylvestris, N. tabacum, N. tomentosiformis, Solanum bulbocastanum, S. lycopersicum and S. tuberosum, which are Solanaceae species, allowed us to analyze the organization of cpSSRs in their genic and intergenic regions. In general, the number of cpSSRs in cpDNA ranged from 161 in S. tuberosum to 226 in N. tabacum, and the number of intergenic cpSSRs was higher than genic cpSSRs. The mononucleotide repeats were the most frequent in studied species, but we also identified di-, tri-, tetra-, penta- and hexanucleotide repeats. Multiple alignments of all cpSSRs sequences from Solanaceae species made the identification of nucleotide variability possible and the phylogeny was estimated by maximum parsimony. Our study showed that the plastome database can be exploited for phylogenetic analysis and biotechnological approaches.
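    A simple way to locate and classify perfect SSRs by repeat-unit length (mono- to hexanucleotide) is a back-referencing regular expression. The sketch below is illustrative only: the minimum repeat counts are assumed thresholds, not values reported in the abstract, and the test sequence is invented.

```python
import re

# Assumed minimum number of repeat units per class (not from the abstract).
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}

def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n)
    )

def find_ssrs(seq: str):
    """Return sorted (start, class, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "AA" reported as a dinucleotide
                hits.append((m.start(), CLASS_NAMES[k], motif, len(m.group(0)) // k))
    return sorted(hits)

# Invented test sequence containing one mono-, one di- and one trinucleotide SSR.
toy = "GGC" + "A" * 10 + "CGC" + "AT" * 6 + "GCC" + "AAG" * 4 + "T"
for hit in find_ssrs(toy):
    print(hit)
```

    Splitting hits into genic and intergenic sets would additionally require gene coordinates from an annotation file, which this sketch does not attempt.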

  7. Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species

    Science.gov (United States)

    Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha

    2011-01-01

    Computational genomics is an important tool for understanding the distribution of sequence features, including simple sequence repeats (SSRs), in closely related genomes, which gives valuable information regarding genetic variations. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions among different human Brucella species which are involved in a range of pathological disorders. Computational analysis of the SSRs in Brucella indicates few deviations from expected random models. Statistical analysis also reveals that tri-nucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. From the data, it can be suggested that overrepresented tri-nucleotide SSRs in genomic and coding regions might be responsible for the generation of functional variation of expressed proteins, which in turn may lead to different pathogenicity, virulence determinants, stress response genes, transcription regulators and host adaptation proteins of Brucella genomes. Abbreviations SSRs - Simple Sequence Repeats, ORFs - Open Reading Frames. PMID:21738309

  8. Comparison of highly repeated DNA sequences in some Lemuridae and taxonomic implications.

    Science.gov (United States)

    Montagnon, D; Crovella, S; Rumpler, Y

    1993-01-01

    Highly repeated DNA sequences of Eulemur fulvus mayottensis, E. coronatus, Lemur catta, and Hapalemur griseus griseus have been identified and compared. Sequence analysis of highly repeated DNA fragments isolated from L. catta and Hapalemur showed a high percentage of similarity (nearly 95%), as did fragments isolated from the two very close Eulemur species, whereas comparison of the DNA fragments isolated from the two Eulemur species and the L. catta/Hapalemur group showed a very low percentage (approximately 40%) of identity, as might be expected for distant species. These results confirm our previous data, obtained by Southern blot hybridization techniques on the same species, and strongly support the existence of a common trunk between L. catta and Hapalemur, different from the one leading to the Eulemur species.

  9. Cytogenetic analysis of Populus trichocarpa--ribosomal DNA, telomere repeat sequence, and marker-selected BACs.

    Science.gov (United States)

    Islam-Faridi, M N; Nelson, C D; DiFazio, S P; Gunter, L E; Tuskan, G A

    2009-01-01

    The 18S-28S rDNA and 5S rDNA loci in Populus trichocarpa were localized using fluorescent in situ hybridization (FISH). Two 18S-28S rDNA sites and one 5S rDNA site were identified and located at the ends of 3 different chromosomes. FISH signals from the Arabidopsis-type telomere repeat sequence were observed at the distal ends of each chromosome. Six BAC clones selected from 2 linkage groups based on genome sequence assembly (LG-I and LG-VI) were localized on 2 chromosomes, as expected. BACs from LG-I hybridized to the longest chromosome in the complement. All BAC positions were found to be concordant with sequence assembly positions. BAC-FISH will be useful for delineating each of the Populus trichocarpa chromosomes and improving the sequence assembly of this model angiosperm tree species.

  10. Cytogenetic Analysis of Populus trichocarpa - Ribosomal DNA, Telomere Repeat Sequence, and Marker-selected BACs

    Energy Technology Data Exchange (ETDEWEB)

    Tuskan, Gerald A [ORNL; Gunter, Lee E [ORNL; DiFazio, Stephen P [West Virginia University

    2009-01-01

    The 18S-28S rDNA and 5S rDNA loci in Populus trichocarpa were localized using fluorescent in situ hybridization (FISH). Two 18S-28S rDNA sites and one 5S rDNA site were identified and located at the ends of 3 different chromosomes. FISH signals from the Arabidopsis-type telomere repeat sequence were observed at the distal ends of each chromosome. Six BAC clones selected from 2 linkage groups based on genome sequence assembly (LG-I and LG-VI) were localized on 2 chromosomes, as expected. BACs from LG-I hybridized to the longest chromosome in the complement. All BAC positions were found to be concordant with sequence assembly positions. BAC-FISH will be useful for delineating each of the Populus trichocarpa chromosomes and improving the sequence assembly of this model angiosperm tree species.

  11. Localization and trafficking of an isoform of the AtPRA1 family to the Golgi apparatus depend on both N- and C-terminal sequence motifs.

    Science.gov (United States)

    Jung, Chan Jin; Lee, Myoung Hui; Min, Myung Ki; Hwang, Inhwan

    2011-02-01

    Prenylated Rab acceptors (PRAs) bind to prenylated Rab proteins and possibly aid in targeting Rabs to their respective compartments. In Arabidopsis, 19 isoforms of PRA1 have been identified and, depending upon the isoforms, they localize to the endoplasmic reticulum (ER), Golgi apparatus and endosomes. Here, we investigated the localization and trafficking of AtPRA1.B6, an isoform of the Arabidopsis PRA1 family. In colocalization experiments with various organellar markers, AtPRA1.B6 tagged with hemagglutinin (HA) at the N-terminus localized to the Golgi apparatus in protoplasts and transgenic plants. The valine residue at the C-terminal end and an EEE motif in the C-terminal cytoplasmic domain were critical for anterograde trafficking from the ER to the Golgi apparatus. The N-terminal region contained a sequence motif for retention of AtPRA1.B6 at the Golgi apparatus. In addition, anterograde trafficking of AtPRA1.B6 from the ER to the Golgi apparatus was highly sensitive to the HA:AtPRA1.B6 level. The region that contains the sequence motif for Golgi retention also conferred the abundance-dependent trafficking inhibition. On the basis of these results, we propose that AtPRA1.B6 localizes to the Golgi apparatus and its ER-to-Golgi trafficking and localization to the Golgi apparatus are regulated by multiple sequence motifs in both the C- and N-terminal cytoplasmic domains.

  12. Genetic Diversity Assessment and Identification of New Sour Cherry Genotypes Using Intersimple Sequence Repeat Markers

    OpenAIRE

    Roghayeh Najafzadeh; Kazem Arzani; Naser Bouzari; Ali Saei

    2014-01-01

    Iran is one of the chief origins of subgenus Cerasus germplasm. In this study, the genetic variation of new Iranian sour cherries (which had such superior growth characteristics and fruit quality as to be considered for the introduction of new cultivars) was investigated and identified using 23 intersimple sequence repeat (ISSR) markers. Results indicated a high level of polymorphism of the genotypes based on these markers. According to these results, primers tested in this study specially IS...

  13. Inter simple sequence repeat fingerprints for assess genetic diversity of tunisian garlic populations

    OpenAIRE

    Jabbes, Naouel; Geoffriau, Emmanuel; Le Clerc, Valérie; Dridi, Boutheina; Hannechi, Chérif

    2011-01-01

    Garlic (Allium sativum L.) that is cultivated in Tunisia is heterogeneous and unclassified with no registered local cultivars. At present, the level of genetic diversity in Tunisian garlic is almost unknown. Inter Simple Sequence Repeats (ISSR) genetic markers were therefore used to assess the genetic diversity and its distribution in 31 Tunisian garlic accessions with 4 French classified clones used as control. It was the first time that ISSR markers were used to detect diversity in garlic. ...

  14. A tandem sequence motif acts as a distance-dependent enhancer in a set of genes involved in translation by binding the proteins NonO and SFPQ

    Directory of Open Access Journals (Sweden)

    Roepcke Stefan

    2011-12-01

    Full Text Available Abstract Background Bioinformatic analyses of expression control sequences in promoters of co-expressed or functionally related genes enable the discovery of common regulatory sequence motifs that might be involved in co-ordinated gene expression. By studying promoter sequences of the human ribosomal protein genes we recently identified a novel highly specific Localized Tandem Sequence Motif (LTSM. In this work we sought to identify additional genes and LTSM-binding proteins to elucidate potential regulatory mechanisms. Results Genome-wide analyses allowed finding a considerable number of additional LTSM-positive genes, the products of which are involved in translation, among them, translation initiation and elongation factors, and 5S rRNA. Electromobility shift assays then showed specific signals demonstrating the binding of protein complexes to LTSM in ribosomal protein gene promoters. Pull-down assays with LTSM-containing oligonucleotides and subsequent mass spectrometric analysis identified the related multifunctional nucleotide binding proteins NonO and SFPQ in the binding complex. Functional characterization then revealed that LTSM enhances the transcriptional activity of the promoters in dependency of the distance from the transcription start site. Conclusions Our data demonstrate the power of bioinformatic analyses for the identification of biologically relevant sequence motifs. LTSM and the here found LTSM-binding proteins NonO and SFPQ were discovered through a synergistic combination of bioinformatic and biochemical methods and are regulators of the expression of a set of genes of the translational apparatus in a distance-dependent manner.

  15. Dendrimeric template of Plasmodium falciparum histidine rich protein II repeat motifs bearing Asp→Asn mutation exhibits heme binding and β-hematin formation.

    Directory of Open Access Journals (Sweden)

    Pinky Kumari

    Full Text Available Plasmodium falciparum (Pf) employs a crucial PfHRPII catalyzed reaction that converts toxic heme into hemozoin. Understanding the heme polymerization mechanism is the first step for rational design of new drugs targeting this pathway. Heme binding and hemozoin formation have been ascribed to PfHRPII aspartate carboxylate-heme metal ionic interactions. To investigate whether this ionic interaction is indeed pivotal, we examined the comparative heme binding and β-hematin forming abilities of a wild type dendrimeric peptide BNT1 {harboring the native sequence motif of PfHRPII (AHHAHHAADA)} versus a mutant dendrimeric peptide BNTM {in which ionic Aspartate residues have been replaced by the neutral Asparaginyl residues (AHHAHHAANA)}. UV and IR data reported here reveal that at pH 5, both BNT1 and BNTM exhibit comparable heme binding as well as β-hematin forming abilities, thus questioning the role of PfHRPII aspartate carboxylate-heme metal ionic interactions in heme binding and β-hematin formation. Based on our data and information in the literature we suggest the possible role of weak dispersive interactions like N-H···π and lone-pair···π in heme binding and hemozoin formation.

  16. Repeated-Sprint Sequences During Female Soccer Matches Using Fixed and Individual Speed Thresholds.

    Science.gov (United States)

    Nakamura, Fábio Y; Pereira, Lucas A; Loturco, Irineu; Rosseti, Marcelo; Moura, Felipe A; Bradley, Paul S

    2017-07-01

    Nakamura, FY, Pereira, LA, Loturco, I, Rosseti, M, Moura, FA, and Bradley, PS. Repeated-sprint sequences during female soccer matches using fixed and individual speed thresholds. J Strength Cond Res 31(7): 1802-1810, 2017. The main objective of this study was to characterize the occurrence of single sprint and repeated-sprint sequences (RSS) during elite female soccer matches, using fixed (20 km·h⁻¹) and individually based speed thresholds (>90% of the mean speed from a 20-m sprint test). Eleven elite female soccer players from the same team participated in the study. All players performed a 20-m linear sprint test, and were assessed in up to 10 official matches using Global Positioning System technology. Magnitude-based inferences were used to test for meaningful differences. Results revealed that irrespective of adopting fixed or individual speed thresholds, female players produced only a few RSS during matches (2.3 ± 2.4 sequences using the fixed threshold and 3.3 ± 3.0 sequences using the individually based threshold), with most sequences comprising just 2 sprints. Additionally, central defenders performed fewer sprints (10.2 ± 4.1) than other positions (fullbacks: 28.1 ± 5.5; midfielders: 21.9 ± 10.5; forwards: 31.9 ± 11.1; with the differences being likely to almost certainly associated with effect sizes ranging from 1.65 to 2.72), and sprinting ability declined in the second half. The data do not support the notion that RSS occurs frequently during soccer matches in female players, irrespective of using fixed or individual speed thresholds to define sprint occurrence. However, repeated-sprint ability development cannot be ruled out from soccer training programs because of its association with match-related performance.

  17. Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains : a web-based resource

    Directory of Open Access Journals (Sweden)

    Vergnaud Gilles

    2004-01-01

    Full Text Available Abstract Background Polymorphic tandem repeat typing is a new generic technology which has been proved to be very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila, Y. pestis. The previously developed tandem repeats database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. The development of an assay then requires the evaluation of tandem repeat polymorphism on well-selected sets of isolates. In the case of major human pathogens, such as S. aureus, more than one strain is being sequenced, so that tandem repeats most likely to be polymorphic can now be selected in silico based on genome sequence comparison. Results In addition to the previously described general Tandem Repeats Database, we have developed a tool to automatically identify tandem repeats of a different length in the genome sequence of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The results of the comparisons are parsed in a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length, predicted size difference, etc. Comparisons are available for 16 bacterial species, and the orthopox viruses, including the variola virus and three of its close neighbors. Conclusions We are presenting an internet-based resource to help develop and perform tandem repeats based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats differing between different genome sequences from the same species. The "Blast in the Tandem Repeats Database" facilitates the search for a known tandem repeat and the prediction of amplification product sizes.
The "Bacterial

  18. Long CAG repeat sequence and protein expression of androgen receptor considered as prognostic indicators in male breast carcinoma.

    Directory of Open Access Journals (Sweden)

    Yan-Ni Song

    Full Text Available BACKGROUND: The androgen receptor (AR) expression and the CAG repeat length within the AR gene appear to be involved in the carcinogenesis of male breast carcinoma (MBC). Although phenotypic differences have been observed between MBC and normal control group in AR gene, there is a lack of correlation analysis between AR expression and CAG repeat length in MBC. The purpose of the study was to investigate the prognostic value of CAG repeat lengths and AR protein expression. METHODS: 81 tumor tissues were used for immunostaining for AR expression and CAG repeat length determination and 80 normal controls were analyzed with CAG repeat length in AR gene. The CAG repeat length and AR expression were analyzed in relation to clinicopathological factors and prognostic indicators. RESULTS: AR gene in many MBCs has long CAG repeat sequence compared with that in control group (P = 0.001), and controls are more likely to exhibit short CAG repeat sequence than MBCs. There was a statistically significant difference in long CAG repeat sequence between AR status for MBC patients (P = 0.004). The presence of long CAG repeat sequence and AR-positive expression were associated with shorter survival of MBC patients (CAG repeat: P = 0.050 for 5y-OS, P = 0.035 for 5y-DFS; AR status: P = 0.048 for 5y-OS, P = 0.029 for 5y-DFS, respectively). CONCLUSION: The CAG repeat length within the AR gene might be one useful molecular biomarker to identify males at increased risk of breast cancer development. The presence of long CAG repeat sequence and AR protein expression were in relation to survival of MBC patients. The CAG repeat length and AR expression were two independent prognostic indicators in MBC patients.

  19. Analysis of the genome sequence of the pathogenic Muscovy duck parvovirus strain YY reveals a 14-nucleotide-pair deletion in the inverted terminal repeats.

    Science.gov (United States)

    Wang, Jianye; Huang, Yu; Zhou, Mingxu; Zhu, Guoqiang

    2016-09-01

    Genomic information about Muscovy duck parvovirus is still limited. In this study, the genome of the pathogenic MDPV strain YY was sequenced. The full-length genome of YY is 5075 nucleotides (nt) long, 57 nt shorter than that of strain FM. Sequence alignment indicates that the 5' and 3' inverted terminal repeats (ITR) of strain YY contain a 14-nucleotide-pair deletion in the stem of the palindromic hairpin structure in comparison to strain FM and FZ91-30. The deleted region contains one "E-box" site and one repeated motif with the sequence "TTCCGGT" or "ACCGGAA". Phylogenetic trees constructed based on the protein coding genes concordantly showed that YY, together with nine other MDPV isolates from various places, clustered in a separate branch, distinct from the branch formed by goose parvovirus (GPV) strains. These results demonstrate that, despite the distinctive deletion, the YY strain still belongs to the classical MDPV group. Moreover, the deletion of ITR may contribute to the genome evolution of MDPV under immunization pressure.
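    The two motif spellings quoted in this abstract, "TTCCGGT" and "ACCGGAA", are reverse complements of each other, so they describe the same site read from opposite strands. The short check below illustrates this and also works out the strain FM genome length implied by the stated 57 nt difference; the variable names are illustrative.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

motif = "TTCCGGT"
print(reverse_complement(motif))  # ACCGGAA

# Genome lengths: YY is stated to be 5075 nt, 57 nt shorter than FM.
yy_length = 5075
fm_length = yy_length + 57
print(fm_length)  # 5132
```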

  20. Diversity Analysis in Cannabis sativa Based on Large-Scale Development of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers

    OpenAIRE

    Chunsheng Gao; Pengfei Xin; Chaohua Cheng; Qing Tang; Ping Chen; Changbiao Wang; Gonggu Zang; Lining Zhao

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SS...

  1. Effective DNA fragmentation technique for simple sequence repeat detection with a microsatellite-enriched library and high-throughput sequencing.

    Science.gov (United States)

    Tanaka, Keisuke; Ohtake, Rumi; Yoshida, Saki; Shinohara, Takashi

    2017-04-01

    Two different techniques for genomic DNA fragmentation before microsatellite-enriched library construction, restriction enzyme (NlaIII and MseI) digestion and sonication, were compared to examine their effects on simple sequence repeat (SSR) detection using high-throughput sequencing. Tens of thousands of SSR regions from 5 species of the plant family Myrtaceae were detected when the output of individual samples was >1 million paired-end reads. Comparison of the two DNA fragmentation techniques showed that restriction enzyme digestion was superior to sonication for identification of heterozygous genotypes, whereas sonication was superior for detection of various SSR flanking regions with both species-specific and common characteristics. Therefore, choosing the most suitable DNA fragmentation method depends on the type of analysis that is planned.
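    To make the restriction-digestion arm concrete, the sketch below performs a simple in silico double digest. It assumes the commonly cited cut sites CATG^ for NlaIII and T^TAA for MseI (supplied here, not stated in the abstract), treats the DNA as a single top strand, ignores overhangs, and uses an invented test sequence.

```python
import re

# Recognition site and cut offset on the top strand.
# Assumed from standard enzyme notation: NlaIII = CATG^, MseI = T^TAA.
ENZYMES = {"NlaIII": ("CATG", 4), "MseI": ("TTAA", 1)}

def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted top-strand cut coordinates for all enzymes."""
    cuts = set()
    for site, offset in enzymes.values():
        # Zero-width lookahead so that overlapping sites are all reported.
        for m in re.finditer("(?=%s)" % site, seq):
            cuts.add(m.start() + offset)
    return sorted(cuts)

def digest(seq, enzymes=ENZYMES):
    """Split the sequence at every cut position and return the fragments."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

# Invented sequence: one NlaIII site, a (CA)6 microsatellite, one MseI site.
toy = "GG" + "CATG" + "AAAA" + "CA" * 6 + "TTAA" + "GG"
for fragment in digest(toy):
    print(len(fragment), fragment)
```

    Because both recognition sites are palindromic, scanning one strand is enough to find every site. Sonication, by contrast, breaks DNA at essentially arbitrary positions and has no equivalent site table.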

  2. Simple Sequence Repeat Polymorphisms (SSRPs) for Evaluation of Molecular Diversity and Germplasm Classification of Minor Crops

    Directory of Open Access Journals (Sweden)

    Nam-Soo Kim

    2009-11-01

    Full Text Available Evaluation of the genetic diversity among populations is an essential prerequisite for the preservation of endangered species. Thousands of new accessions are introduced into germplasm institutes each year, thereby necessitating assessment of their molecular diversity before elimination of the redundant genotypes. Of the protocols that facilitate the assessment of molecular diversity, SSRPs (simple sequence repeat polymorphisms), or microsatellite variation, is the preferred system since it detects a large number of DNA polymorphisms with relatively simple technical complexity. The paucity of information on DNA sequences has limited their widespread utilization in the assessment of genetic diversity of minor or neglected crop species. However, recent advancements in DNA sequencing and PCR technologies in conjunction with sophisticated computer software have facilitated the development of SSRP markers in minor crops. This review examines the development and molecular nature of SSR markers, and their utilization in many aspects of plant genetics and ecology.

  3. Genomic and polyploid evolution in genus Avena as revealed by RFLPs of repeated DNA sequences.

    Science.gov (United States)

    Morikawa, Toshinobu; Nishihara, Miho

    2009-06-01

    Phylogenetic relationships and genome affinities were investigated by utilizing all the biological Avena species consisting of 11 diploid species (15 accessions), 8 tetraploid species (9 accessions) and 4 hexaploid species (5 accessions). Genomic DNA regions of As120a, avenin, and globulin were amplified by PCR. A total of 130 polymorphic fragments were detected out of 156 fragments generated by digesting the PCR-amplified fragments with 11 restriction enzymes. The number of fragments generated by PCR-amplification followed by digestion with restriction enzymes was almost the same as those among the three repeated DNA sequences. A high level of genetic distance was detected between A. damascena (Ad) and A. canariensis (Ac) genomes, which reflected their different morphology and reproductive isolation. The A. longiglumis (Al) and A. prostrata (Ap) genomes were closely related to the As genome group. The AB genome species formed a cluster with the AsAs genome artificial autotetraploid and the As genome diploids indicating near-autotetraploid origin. The A. macrostachya is an outbreeding autotetraploid closely related with the C genome diploid and the AC genome tetraploid species. The differences of genetic distances estimated from the repeated DNA sequence divergence among the Avena species were consistent with genome divergences and it was possible to compare the genetic intra- and inter-ploidy relationships produced by RFLPs. These results suggested that the PCR-mediated analysis of repeated DNA polymorphism can be used as a tool to examine genomic relationships of polyploidy species.

  4. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Full Text Available Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (PDB 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions.
Sequences matching PERCAL were detected in all kingdoms of
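
    The consensus quoted in this record, G-x-D-G-x-x-[GN]-[TN]-x-D-D, can be read as a regular expression in which x is any residue and a bracket lists the residues allowed at that position. The following is a minimal sketch in Python, not the authors' search procedure; the function name find_percal and the toy sequence are hypothetical and serve only to illustrate the pattern.

```python
import re

# Consensus from the abstract: G-x-D-G-x-x-[GN]-[TN]-x-D-D
# 'x' = any residue; [GN] and [TN] = one of the listed residues.
# The lookahead wrapper lets overlapping occurrences be reported too.
PERCAL_RE = re.compile(r"(?=(G.DG..[GN][TN].DD))")

def find_percal(protein_seq):
    """Return (0-based start, matched text) for every consensus match."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in PERCAL_RE.finditer(seq)]

# Hypothetical toy sequence (not a real protein) containing two matches.
toy = "MKAGSDGTAGTLDDQQGADGKLNNPDDW"
for start, hit in find_percal(toy):
    print(start, hit)
```

    A pattern scan of this kind only flags candidate sites; whether a match actually binds calcium would still need the structural or calorimetric evidence described in the abstract.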

  5. Sequence and structural analysis of the Asp-box motif and Asp-box beta-propellers; a widespread propeller-type characteristic of the Vps10 domain family and several glycoside hydrolase families

    Directory of Open Access Journals (Sweden)

    Quistgaard Esben M

    2009-07-01

    Full Text Available Abstract Background The Asp-box is a short sequence and structure motif that folds as a well-defined β-hairpin. It is present in different folds, but occurs most prominently as repeats in β-propellers. Asp-box β-propellers are known to be characteristically irregular and to occur in many medically important proteins, most of which are glycosidase enzymes, but they are otherwise not well characterized and are only rarely treated as a distinct β-propeller family. We have analyzed the sequence, structure, function and occurrence of the Asp-box and s-Asp-box, a related shorter variant, and provide a comprehensive classification and computational analysis of the Asp-box β-propeller family. Results We find that all conserved residues of the Asp-box support its structure, whereas the residues in variable positions are generally used for other purposes. The Asp-box clearly has a structural role in β-propellers and is highly unlikely to be involved in ligand binding. Sequence analysis of the Asp-box β-propeller family reveals it to be very widespread especially in bacteria and suggests a wide functional range. Disregarding the Asp-boxes, sequence conservation of the propeller blades is very low, but a distinct pattern of residues with specific properties has been identified. Interestingly, Asp-boxes are occasionally found very close to other propeller-associated repeats in extensive mixed-motif stretches, which strongly suggests the existence of a novel class of hybrid β-propellers. Structural analysis reveals that the top and bottom faces of Asp-box β-propellers have striking and consistently different loop properties; the bottom is structurally conserved whereas the top shows great structural variation. Interestingly, only the top face is used for functional purposes in known structures. A structural analysis of the 10-bladed β-propeller fold, which has so far only been observed in the Asp-box family, reveals that the inner strands of the

  6. Strategy To Characterize the Number and Type of Repeating EPIYA Phosphorylation Motifs in the Carboxyl Terminus of CagA Protein in Helicobacter pylori Clinical Isolates

    OpenAIRE

    Panayotopoulou, Effrosini G.; Sgouras, Dionyssios N.; Papadakos, Konstantinos; Kalliaropoulos, Antonios; Papatheodoridis, George; Mentis, Andreas F; Archimandritis, Athanasios J

    2006-01-01

    Cytotoxin-associated gene A (CagA) diversity with regard to EPIYA-A, -B, -C, or -D phosphorylation motifs may play an important role in Helicobacter pylori pathogenesis, and therefore determination of these motifs in H. pylori clinical isolates can become a useful prognostic tool. We propose a strategy for the accurate determination of CagA EPIYA motifs in clinical strains, based upon one-step PCR amplification using primers that flank the EPIYA coding region. We thus analyzed 135 H. pylori i...

  7. Nuclear Receptor HNF4α Binding Sequences are Widespread in Alu Repeats

    Directory of Open Access Journals (Sweden)

    Bolotin Eugene

    2011-11-01

    Full Text Available Abstract Background Alu repeats, which account for ~10% of the human genome, were originally considered to be junk DNA. Recent studies, however, suggest that they may contain transcription factor binding sites and hence possibly play a role in regulating gene expression. Results Here, we show that binding sites for a highly conserved member of the nuclear receptor superfamily of ligand-dependent transcription factors, hepatocyte nuclear factor 4alpha (HNF4α, NR2A1), are highly prevalent in Alu repeats. We employ high throughput protein binding microarrays (PBMs) to show that HNF4α binds > 66 unique sequences in Alu repeats that are present in ~1.2 million locations in the human genome. We use chromatin immunoprecipitation (ChIP) to demonstrate that HNF4α binds Alu elements in the promoters of target genes (ABCC3, APOA4, APOM, ATPIF1, CANX, FEMT1A, GSTM4, IL32, IP6K2, PRLR, PRODH2, SOCS2, TTR) and luciferase assays to show that at least some of those Alu elements can modulate HNF4α-mediated transactivation in vivo (APOM, PRODH2, TTR, APOA4). HNF4α-Alu elements are enriched in promoters of genes involved in RNA processing and a sizeable fraction are in regions of accessible chromatin. Comparative genomics analysis suggests that there may have been a gain in HNF4α binding sites in Alu elements during evolution and that non-Alu repeats, such as Tiggers, also contain HNF4α sites. Conclusions Our findings suggest that HNF4α, in addition to regulating gene expression via high affinity binding sites, may also modulate transcription via low affinity sites in Alu repeats.

  8. BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations.

    Science.gov (United States)

    Bahr, A; Thompson, J D; Thierry, J C; Poch, O

    2001-01-01

    BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, transmembrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.

  9. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Full Text Available Abstract Background Expressed Sequence Tag (EST) has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 expressed sequence tag (EST) sequences with high quality. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047), among which 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and low in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.

  10. Inter-simple sequence repeat (ISSR) loci mapping in the genome of perennial ryegrass

    DEFF Research Database (Denmark)

    Pivorienė, O; Pašakinskienė, I; Brazauskas, G;

    2008-01-01

    The aim of this study was to identify and characterize new ISSR markers and their loci in the genome of perennial ryegrass. A subsample of the VrnA F2 mapping family of perennial ryegrass comprising 92 individuals was used to develop a linkage map including inter-simple sequence repeat markers...... demonstrated a 70% similarity to the Hordeum vulgare germin gene GerA. Inter-SSR mapping will provide useful information for gene targeting, quantitative trait loci mapping and marker-assisted selection in perennial ryegrass....

  11. Localization of a new highly repeated DNA sequence of Lemur catta (Lemuridae, Strepsirhini).

    Science.gov (United States)

    Boniotto, Michele; Ventura, Mario; Cardone, Maria Francesca; Boaretto, Francesca; Archidiacono, Nicoletta; Rocchi, Mariano; Crovella, Sergio

    2002-10-01

    We have isolated and cloned an 800-bp highly repeated DNA (HRDNA) sequence from Lemur catta (LCA) and described its localization on LCA chromosomes. Lemur catta HRDNA sequences were localized by performing FISH experiments on standard and elongated metaphasic chromosomes using an LCA HRDNA probe (LCASAT). A complex hybridization pattern was detected. A strong pericentromeric hybridization signal was observed on most LCA chromosomes. Chromosomes 7 and 13 were lit in pericentromeric regions, as well as in the interspersed heterochromatin. Chromosomes 1, 3, 4, 17, 19, X, and microchromosomes (20, 25, 26, and 27) showed no signals in the pericentromeric region, but chromosomes 3 and 4 showed a positive hybridization in heterochromatic regions. The 800-bp L. catta HRDNA was species specific. We performed FISH experiments with the LCASAT probe on Eulemur macaco macaco (EMA) and Eulemur fulvus fulvus (EFU) metaphases and no positive signal of hybridization was detected. These findings were also confirmed by Southern blot analysis and PCR.

  12. Rhoptry-associated protein (rap-1) genes in the sheep pathogen Babesia sp. Xinjiang: Multiple transcribed copies differing by 3' end repeated sequences.

    Science.gov (United States)

    Niu, Qingli; Marchand, Jordan; Yang, Congshan; Bonsergent, Claire; Guan, Guiquan; Yin, Hong; Malandrin, Laurence

    2015-07-30

    Sheep babesiosis occurs mainly in tropical and subtropical areas. The sheep parasite Babesia sp. Xinjiang is widespread in China, and our goal is to characterize rap-1 (rhoptry-associated protein 1) gene diversity and expression as a first step of a long term goal aiming at developing a recombinant subunit vaccine. Seven different rap-1a genes were amplified in Babesia sp. Xinjiang, using degenerate primers designed from conserved motifs. Rap-1b and rap-1c gene types could not be identified. In all seven rap-1a genes, the 5' regions exhibited identical sequences over 936 nt, and the 3' regions differed at 28 positions over 147 nt, defining two types of genes designated α and β. The remaining 3' part varied from 72 to 360 nt in length, depending on the gene. This region consists of a succession of two to ten 36 nt repeats, which explains the size differences. Even if the nucleotide sequences varied, 6 repeats encoded the same stretch of amino acids. Transcription of at least four α and two β genes was demonstrated by standard RT-PCR. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. Behavior of Repeating Earthquake Sequences in Central California and the Implications for Subsurface Fault Creep

    Energy Technology Data Exchange (ETDEWEB)

    Templeton, D C; Nadeau, R; Burgmann, R

    2007-07-09

    Repeating earthquakes (REs) are sequences of events that have nearly identical waveforms and are interpreted to represent fault asperities driven to failure by loading from aseismic creep on the surrounding fault surface at depth. We investigate the occurrence of these REs along faults in central California to determine which faults exhibit creep and the spatio-temporal distribution of this creep. At the juncture of the San Andreas and southern Calaveras-Paicines faults, both faults as well as a smaller secondary fault, the Quien Sabe fault, are observed to produce REs over the observation period of March 1984-May 2005. REs in this area reflect a heterogeneous creep distribution along the fault plane with significant variations in time. Cumulative slip over the observation period at individual sequence locations is determined to range from 5.5-58.2 cm on the San Andreas fault, 4.8-14.1 cm on the southern Calaveras-Paicines fault, and 4.9-24.8 cm on the Quien Sabe fault. Creep at depth appears to mimic the behaviors seen of creep on the surface in that evidence of steady slip, triggered slip, and episodic slip phenomena are also observed in the RE sequences. For comparison, we investigate the occurrence of REs west of the San Andreas fault within the southern Coast Range. Events within these RE sequences only occurred minutes to weeks apart from each other and then did not repeat again over the observation period, suggesting that REs in this area are not produced by steady aseismic creep of the surrounding fault surface.

  14. Target genes of microsatellite sequences in head and neck squamous cell carcinoma: mononucleotide repeats are not detected.

    Science.gov (United States)

    Wang, Yimin; Liu, Xuejuan; Li, Yulin

    2012-09-10

    Microsatellite instability (MSI) is detected in a wide variety of tumors. It is thought that mismatch repair gene mutation or inactivation is the major cause of MSI. Microsatellite sequences are predominantly distributed in intergenic or intronic DNA. However, MSI is found in the exonic sequences of some genes, causing their inactivation. In this report, we searched GenBank for candidate genes containing potential MSI sequences in exonic regions. Twenty seven target genes were selected for MSI analysis. Instability was found in 70% of these genes (14/20) with head and neck squamous cell carcinoma (HNSCC). Interestingly, no instability was detected in mononucleotide repeats in genes or in intergenic sequences. We conclude that instability of mononucleotide repeats is a rare event in HNSCC. High MSI phenotype in young HNSCC patients is limited to noncoding regions only. MSI percentage in HNSCC tumor is closely related to the repeat type, repeat location and patient's age.

  15. The impact of CRISPR repeat sequence on structures of a Cas6 protein-RNA complex

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Ruiying; Zheng, Han; Preamplume, Gan; Shao, Yaming; Li, Hong [FSU]

    2012-03-15

    The repeat-associated mysterious proteins (RAMPs) comprise the most abundant family of proteins involved in prokaryotic immunity against invading genetic elements conferred by the clustered regularly interspaced short palindromic repeat (CRISPR) system. Cas6 is one of the first characterized RAMP proteins and is a key enzyme required for CRISPR RNA maturation. Despite a strong structural homology with other RAMP proteins that bind hairpin RNA, Cas6 distinctly recognizes single-stranded RNA. Previous structural and biochemical studies show that Cas6 captures the 5' end while cleaving the 3' end of the CRISPR RNA. Here, we describe three structures and complementary biochemical analysis of a noncatalytic Cas6 homolog from Pyrococcus horikoshii bound to CRISPR repeat RNA of different sequences. Our study confirms the specificity of the Cas6 protein for single-stranded RNA and further reveals the importance of the bases at Positions 5-7 in Cas6-RNA interactions. Substitutions of these bases result in structural changes in the protein-RNA complex including its oligomerization state.

  16. Simple sequence repeats in Neurospora crassa: distribution, polymorphism and evolutionary inference

    Directory of Open Access Journals (Sweden)

    Park Jongsun

    2008-01-01

    Full Text Available Abstract Background Simple sequence repeats (SSRs) have been successfully used for various genetic and evolutionary studies in eukaryotic systems. The eukaryotic model organism Neurospora crassa is an excellent system to study evolution and biological function of SSRs. Results We identified and characterized 2749 SSRs of 963 SSR types in the genome of N. crassa. The distribution of tri-nucleotide (nt) SSRs, the most common SSRs in N. crassa, was significantly biased in exons. We further characterized the distribution of 19 abundant SSR types (AST), which account for 71% of total SSRs in the N. crassa genome, using a Poisson log-linear model. We also characterized the size variation of SSRs among natural accessions using Polymorphic Index Content (PIC) and ANOVA analyses and found that there are genome-wide, chromosome-dependent and local-specific variations. Using polymorphic SSRs, we have built linkage maps from three line-cross populations. Conclusion Taking our computational, statistical and experimental data together, we conclude that (1) the distributions of the SSRs in the sequenced N. crassa genome differ systematically between chromosomes as well as between SSR types, (2) the size variation of tri-nt SSRs in exons might be an important mechanism in generating functional variation of proteins in N. crassa, (3) there are different levels of evolutionary forces in variation of amino acid repeats, and (4) SSRs are stable molecular markers for genetic studies in N. crassa.
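
    The abstract does not give the formula behind its PIC values. As a hedged illustration, the sketch below uses the widely cited textbook definition (expected heterozygosity minus a correction for uninformative matings), which is an assumption here and may differ from what the authors computed. The function names and the allele frequencies are hypothetical.

```python
from itertools import combinations

def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) for allele frequencies that sum to 1."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Textbook PIC: H minus sum over allele pairs of 2 * p_i^2 * p_j^2."""
    correction = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - correction

# Hypothetical SSR locus with three length alleles.
allele_freqs = [0.5, 0.3, 0.2]
print(round(expected_heterozygosity(allele_freqs), 4))
print(round(pic(allele_freqs), 4))
```

    Both measures are 0 for a monomorphic locus and grow as alleles become more numerous and more evenly distributed, which is why they are used to rank markers by informativeness.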

  17. Simple sequence repeat marker development and genetic mapping in quinoa (Chenopodium quinoa Willd.)

    Indian Academy of Sciences (India)

    D. E. Jarvis; O. R. Kopp; E. N. Jellen; M. A. Mallory; J. Pattee; A. Bonifacio; C. E. Coleman; M. R. Stevens; D. J. Fairbanks; P. J. Maughan

    2008-04-01

    Quinoa is a regionally important grain crop in the Andean region of South America. Recently quinoa has gained international attention for its high nutritional value and tolerances of extreme abiotic stresses. DNA markers and linkage maps are important tools for germplasm conservation and crop improvement programmes. Here we report the development of 216 new polymorphic SSR (simple sequence repeats) markers from libraries enriched for GA, CAA and AAT repeats, as well as 6 SSR markers developed from bacterial artificial chromosome-end sequences (BES-SSRs). Heterozygosity (H) values of the SSR markers range from 0.12 to 0.90, with an average value of 0.57. A linkage map was constructed for a newly developed recombinant inbred lines (RIL) population using these SSR markers. Additional markers, including amplified fragment length polymorphisms (AFLPs), two 11S seed storage protein loci, and the nucleolar organizing region (NOR), were also placed on the linkage map. The linkage map presented here is the first SSR-based map in quinoa and contains 275 markers, including 200 SSR. The map consists of 38 linkage groups (LGs) covering 913 cM. Segregation distortion was observed in the mapping population for several marker loci, indicating possible chromosomal regions associated with selection or gametophytic lethality. As this map is based primarily on simple and easily-transferable SSR markers, it will be particularly valuable for research in laboratories in Andean regions of South America.

  18. Simple sequence repeats provide a substrate for phenotypic variation in the Neurospora crassa circadian clock.

    Directory of Open Access Journals (Sweden)

    Todd P Michael

    Full Text Available BACKGROUND: WHITE COLLAR-1 (WC-1) mediates interactions between the circadian clock and the environment by acting as both a core clock component and as a blue light photoreceptor in Neurospora crassa. Loss of the amino-terminal polyglutamine (NpolyQ) domain in WC-1 results in an arrhythmic circadian clock; this data is consistent with this simple sequence repeat (SSR) being essential for clock function. METHODOLOGY/PRINCIPAL FINDINGS: Since SSRs are often polymorphic in length across natural populations, we reasoned that investigating natural variation of the WC-1 NpolyQ may provide insight into its role in the circadian clock. We observed significant phenotypic variation in the period, phase and temperature compensation of circadian regulated asexual conidiation across 143 N. crassa accessions. In addition to the NpolyQ, we identified two other simple sequence repeats in WC-1. The sizes of all three WC-1 SSRs correlated with polymorphisms in other clock genes, latitude and circadian period length. Furthermore, in a cross between two N. crassa accessions, the WC-1 NpolyQ co-segregated with period length. CONCLUSIONS/SIGNIFICANCE: Natural variation of the WC-1 NpolyQ suggests a mechanism by which period length can be varied and selected for by the local environment that does not deleteriously affect WC-1 activity. Understanding natural variation in the N. crassa circadian clock will facilitate an understanding of how fungi exploit their environments.

  19. Simple sequence repeat marker development and genetic mapping in quinoa (Chenopodium quinoa Willd.).

    Science.gov (United States)

    Jarvis, D E; Kopp, O R; Jellen, E N; Mallory, M A; Pattee, J; Bonifacio, A; Coleman, C E; Stevens, M R; Fairbanks, D J; Maughan, P J

    2008-04-01

    Quinoa is a regionally important grain crop in the Andean region of South America. Recently quinoa has gained international attention for its high nutritional value and tolerances of extreme abiotic stresses. DNA markers and linkage maps are important tools for germplasm conservation and crop improvement programmes. Here we report the development of 216 new polymorphic SSR (simple sequence repeats) markers from libraries enriched for GA, CAA and AAT repeats, as well as 6 SSR markers developed from bacterial artificial chromosome-end sequences (BES-SSRs). Heterozygosity (H) values of the SSR markers range from 0.12 to 0.90, with an average value of 0.57. A linkage map was constructed for a newly developed recombinant inbred lines (RIL) population using these SSR markers. Additional markers, including amplified fragment length polymorphisms (AFLPs), two 11S seed storage protein loci, and the nucleolar organizing region (NOR), were also placed on the linkage map. The linkage map presented here is the first SSR-based map in quinoa and contains 275 markers, including 200 SSR. The map consists of 38 linkage groups (LGs) covering 913 cM. Segregation distortion was observed in the mapping population for several marker loci, indicating possible chromosomal regions associated with selection or gametophytic lethality. As this map is based primarily on simple and easily-transferable SSR markers, it will be particularly valuable for research in laboratories in Andean regions of South America.

  20. Molecular characterization of long terminal repeat sequences from Brazilian human immunodeficiency virus type 1 isolates.

    Science.gov (United States)

    Ferraro, Geraldo A; Monteiro-Cunha, Joana P; Fernandes, Flora M C; Mota-Miranda, Aline C A; Brites, Carlos; Alcantara, Luiz C J; Galvão-Castro, Bernardo; Morgado, Mariza G

    2013-05-01

    HIV-1 provirus activation is under control of the long terminal repeat (LTR)-5' viral promoter region, which presents remarkable genetic variation among HIV-1 subtypes. It is possible that molecular features of the LTR contribute to the unusual profile of the subtype C epidemic in the Brazilian Southern region. To characterize the LTR of Brazilian HIV isolates, we analyzed sequences from 21 infected individuals from Porto Alegre and Salvador cities. Sequences were compared with subtype B and C reference strains from different countries. Phylogenetic analysis showed that 17 (81%) samples were subtype B and four (19%) were subtype C. Common patterns of transcription factor binding sites (TFBS) in subtypes B and C sequences were confirmed and other potential TFBS specific for subtype C were found. Brazilian subtype C sequences contained an additional NF-κB binding site, as previously described for the majority of subtype C isolates. The high level of LTR polymorphisms identified in this study might be important for viral fitness.

  1. Detection of sequence variability of the collagen type IIalpha 1 3' variable number of tandem repeat.

    Science.gov (United States)

    van Meurs, J B; Arp, P P; Fang, Y; Slagboom, P E; Meulenbelt, I; van Leeuwen, J P; Pols, H A; Uitterlinden, A G

    2000-11-01

    The variable number of tandem repeat (VNTR) 3' of the collagen type II (COL2A1) gene has been shown to be highly variable with a complex molecular structure. In a previous pilot experiment we observed discordance between methods to genotype this informative marker. To further investigate the extent and molecular nature of this discordance, we genotyped a random sample of 207 Caucasian individuals with two genotyping methods and sequenced new alleles. We compared single-strand (SS) analysis, which is based on detection of size differences between the different alleles, and heteroduplex analysis (HA), which is sensitive to both size and sequence differences. Overall, 26% of discordance between the two methods was detected. Approximately two thirds of this discordance was caused by subdivision of SS-alleles 13R1 and 14R2 into HA-alleles 4A + 4B and 3B + 3C, respectively. Sequence analysis of the COL2A1 VNTR alleles 4B and 3C showed that these alleles differed in sequence, but not in size, from already described SS-alleles, which explains why they escape detection by SS. The 4B allele is a frequent allele in the population (14%) and is, therefore, important to distinguish in association studies. We conclude that HA is a reliable method when the described optimized electrophoretic conditions are used. HA is a sensitive genotyping method to document allelic diversity at this locus, which can distinguish more alleles compared to the SS method.

  2. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active...... sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally...... valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects...

  3. Sequence-structure-function relations of the mosquito leucine-rich repeat immune proteins

    Directory of Open Access Journals (Sweden)

    Povelones Michael

    2010-09-01

    Full Text Available Abstract Background The discovery and characterisation of factors governing innate immune responses in insects has driven the elucidation of many immune system components in mammals and other organisms. Focusing on the immune system responses of the malaria mosquito, Anopheles gambiae, has uncovered an array of components and mechanisms involved in defence against pathogen infections. Two of these immune factors are LRIM1 and APL1C, which are leucine-rich repeat (LRR) containing proteins that activate complement-like defence responses against malaria parasites. In addition to their LRR domains, these leucine-rich repeat immune (LRIM) proteins share several structural features including signal peptides, patterns of cysteine residues, and coiled-coil domains. Results The identification and characterisation of genes related to LRIM1 and APL1C revealed putatively novel innate immune factors and furthered the understanding of their likely molecular functions. Genomic scans using the shared features of LRIM1 and APL1C identified more than 20 LRIM-like genes exhibiting all or most of their sequence features in each of three disease-vector mosquitoes with sequenced genomes: An. gambiae, Aedes aegypti, and Culex quinquefasciatus. Comparative sequence analyses revealed that this family of mosquito LRIM-like genes is characterised by a variable number of 6 to 14 LRRs of different lengths. The "Long" LRIM subfamily, with 10 or more LRRs, and the "Short" LRIMs, with 6 or 7 LRRs, also share the signal peptide, cysteine residue patterning, and coiled-coil sequence features of LRIM1 and APL1C. The "TM" LRIMs have a predicted C-terminal transmembrane region, and the "Coil-less" LRIMs exhibit the characteristic LRIM sequence signatures but lack the C-terminal coiled-coil domains.
Conclusions The evolutionary plasticity of the LRIM LRR domains may provide templates for diverse recognition properties, while their coiled-coil domains could be involved in the formation

  4. Network motifs provide signatures that characterize metabolism

    OpenAIRE

    Shellman, Erin R.; Burant, Charles F.; Schnell, Santiago

    2013-01-01

    Motifs are repeating patterns that determine the local properties of networks. In this work, we characterized all 3-node motifs using enzyme commission numbers of the International Union of Biochemistry and Molecular Biology to show that motif abundance is related to biochemical function. Further, we present a comparative analysis of motif distributions in the metabolic networks of 21 species across six kingdoms of life. We found the distribution of motif abundances to be similar between spec...

  5. Survey and analysis of simple sequence repeats (SSRs) in three genomes of Candida species.

    Science.gov (United States)

    Jia, Dongmei

    2016-06-15

    Simple sequence repeats (SSRs), or microsatellites, which are composed of tandemly repeated short units of 1-6 bp, have attracted continuous attention. Here, the distribution, composition and polymorphism of microsatellites and compound microsatellites were analyzed in three available genomes of Candida species (Candida dubliniensis, Candida glabrata and Candida orthopsilosis). The results show that there were 118,047, 66,259 and 61,119 microsatellites in the genomes of C. dubliniensis, C. glabrata and C. orthopsilosis, respectively. The SSRs covered more than 1/3 of the genome length in the three species. Microsatellites consisting only of the bases A and/or T, such as (A)n, (T)n, (AT)n, (TA)n, (AAT)n, (TAA)n, (TTA)n, (ATA)n, (ATT)n and (TAT)n, were predominant in the three genomes. Microsatellite lengths were concentrated at 6 bp and 9 bp, both in the three genomes and in their coding sequences. Moreover, the relative abundance (19.89/kbp) and relative density (167.87 bp/kbp) of SSRs in the mitochondrial sequence of C. glabrata were significantly greater than those in any of the genomes or chromosomes of the three species. In addition, the distance between two adjacent microsatellites was an important factor influencing the formation of compound microsatellites. The analysis may be helpful for further study of the roles of microsatellites in the origin, organization and evolution of Candida genomes. Copyright © 2016 Elsevier B.V. All rights reserved.
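
    The two summary statistics quoted in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions (relative abundance = number of SSRs per kbp of sequence, relative density = total SSR length in bp per kbp of sequence), which match the units reported but are not spelled out in the record. The function names, the regular expression and the `min_len` threshold are hypothetical and are not the search criteria used by the authors.

```python
import re

# Assumed pattern: a unit of 1-6 bases followed by at least one exact copy.
# The lazy quantifier makes the scan prefer the shortest repeating unit.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1+")


def find_ssrs(seq, min_len=6):
    """Return (start, unit, total_length) for perfect tandem repeats.

    min_len is an illustrative threshold, not the paper's criterion.
    """
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        tract = match.group(0)
        if len(tract) >= min_len:
            hits.append((match.start(), match.group(1), len(tract)))
    return hits


def ssr_stats(seq, min_len=6):
    """Return (relative abundance per kbp, relative density in bp per kbp)."""
    hits = find_ssrs(seq, min_len)
    kbp = len(seq) / 1000.0
    total_bp = sum(length for _, _, length in hits)
    return len(hits) / kbp, total_bp / kbp


if __name__ == "__main__":
    toy = "CGATATATATGCAAAAAAAAGC"  # made-up sequence, 22 bp
    print(find_ssrs(toy))
    print(ssr_stats(toy))
```

    This scan only finds perfect repeats and reads left to right without backtracking over earlier matches, so it is a simplification of what dedicated SSR-mining tools do.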

  6. Performance and physiological responses to repeated-sprint and jump sequences.

    Science.gov (United States)

    Buchheit, Martin

    2010-11-01

    In this study, the performance and selected physiological responses to team-sport specific repeated-sprint and jump sequences were investigated. On four occasions, 13 team-sport players (22 ± 3 years) performed alternately six repeated maximal straight-line or shuttle-sprints interspersed with a jump ([RS(+j), 6 × 25 m] or [RSS(+j), 6 × (2 × 12.5 m)]) or not ([RS, 6 × 25 m] or [RSS, 6 × (2 × 12.5 m)]) within each recovery period. Mean running time, rate of perceived exertion (RPE), pulmonary oxygen uptake (V(O)₂), blood lactate ([La](b)), and vastus lateralis deoxygenation ([HHb]) were obtained for each condition. Mean sprint times were greater for RS(+j) versus RS (4.14 ± 0.17 vs. 4.09 ± 0.16 s, with the qualitative analysis revealing an 82% chance of RS(+j) times to be greater than RS) and for RSS(+j) versus RSS (5.43 ± 0.18 vs. 5.29 ± 0.17 s; 99% chance of RSS(+j) to be >RSS). The correlations between sprint and jump abilities were large-to-very-large, but below 0.71 for RSSs. Jumps increased RPE (Cohen's d ± 90% CL: +0.7 ± 0.5; 95% chance for RS(+j) > RS and +0.7 ± 0.5; 96% for RSS(+j) > RSS), V(O)₂ (+0.4 ± 0.5; 80% for RS(+j) > RS and +0.5 ± 0.5; 86% for RSS(+j) > RSS), [La](b) (+0.5 ± 0.5; 59% for RS(+j) > RS and +0.2 ± 0.5; unclear for RSS(+j) > RSS), and [HHb] (+0.5 ± 0.5; 86% for RS(+j) > RS and +0.5 ± 0.5; 85% for RSS(+j) > RSS). To conclude, repeated-sprint and jump abilities could be considered as specific qualities. The addition of a jump within the recovery periods during repeated-sprint running sequences impairs sprinting performance and might be an effective training practice for eliciting both greater systemic and vastus lateralis physiological loads.

  7. Insertion sequence inversions mediated by ectopic recombination between terminal inverted repeats.

    Science.gov (United States)

    Ling, Alison; Cordaux, Richard

    2010-12-20

    Transposable elements are widely distributed and diverse in both eukaryotes and prokaryotes, as exemplified by DNA transposons. As a result, they represent a considerable source of genomic variation, for example through ectopic (i.e. non-allelic homologous) recombination events between transposable element copies, resulting in genomic rearrangements. Ectopic recombination may also take place between homologous sequences located within transposable element sequences. DNA transposons are typically bounded by terminal inverted repeats (TIRs). Ectopic recombination between TIRs is expected to result in DNA transposon inversions. However, such inversions have barely been documented. In this study, we report natural inversions of the most common prokaryotic DNA transposons: insertion sequences (IS). We identified natural TIR-TIR recombination-mediated inversions in 9% of IS insertion loci investigated in Wolbachia bacteria, which suggests that recombination between IS TIRs may be a quite common, albeit largely overlooked, source of genomic diversity in bacteria. We suggest that inversions may impede IS survival and proliferation in the host genome by altering transpositional activity. They may also alter genomic instability by modulating the outcome of ectopic recombination events between IS copies in various orientations. This study represents the first report of TIR-TIR recombination within bacterial IS elements and it thereby uncovers a novel mechanism of structural variation for this class of prokaryotic transposable elements.
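
    The geometry behind a TIR-TIR inversion can be sketched in a few lines of Python. This is an illustration only, assuming perfect (mismatch-free) terminal inverted repeats of known length; the function names and the toy element are hypothetical and do not come from the study.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element, tir_len):
    """True if the last tir_len bases are the reverse complement of the first."""
    element = element.upper()
    return element[-tir_len:] == reverse_complement(element[:tir_len])


def invert_between_tirs(element, tir_len):
    """Model recombination between the two TIRs.

    The internal segment is flipped while the TIRs themselves are unchanged.
    """
    element = element.upper()
    left = element[:tir_len]
    core = element[tir_len:-tir_len]
    right = element[-tir_len:]
    return left + reverse_complement(core) + right


if __name__ == "__main__":
    toy_is = "GGCTA" + "AAACCC" + "TAGCC"  # made-up element with 5-bp TIRs
    print(has_terminal_inverted_repeats(toy_is, 5))
    print(invert_between_tirs(toy_is, 5))
```

    With perfect TIRs, the inverted product is identical to the reverse complement of the whole element, which is why such an event changes the orientation of the element relative to its flanking sequence without altering its ends.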

  8. Nucleotide sequence, DNA damage location and protein stoichiometry influence base excision repair outcome at CAG/CTG repeats

    Science.gov (United States)

    Goula, Agathi-Vasiliki; Pearson, Christopher E.; Della Maria, Julie; Trottier, Yvon; Tomkinson, Alan E.; Wilson, David M.; Merienne, Karine

    2012-01-01

    Expansion of CAG/CTG repeats is the underlying cause of more than fourteen genetic disorders, including Huntington’s disease (HD) and myotonic dystrophy. The mutational process is ongoing, with increases in repeat size enhancing the toxicity of the expansion in specific tissues. In many repeat diseases the repeats exhibit high instability in the striatum, whereas instability is minimal in the cerebellum. We provide molecular insights as to how base excision repair (BER) protein stoichiometry may contribute to the tissue-selective instability of CAG/CTG repeats by using specific repair assays. Oligonucleotide substrates with an abasic site were mixed with either reconstituted BER protein stoichiometries mimicking the levels present in HD mouse striatum or cerebellum, or with protein extracts prepared from HD mouse striatum or cerebellum. In both cases, repair efficiency at CAG/CTG repeats and at control DNA sequences was markedly reduced under the striatal conditions, likely due to the lower level of APE1, FEN1 and LIG1. Damage located towards the 5’ end of the repeat tract was poorly repaired, accumulating incompletely processed intermediates, as compared to an AP lesion in the centre or at the 3’ end of the repeats or within a control sequence. Moreover, repair of lesions at the 5’ end of CAG or CTG repeats involved multinucleotide synthesis, particularly under the cerebellar stoichiometry, suggesting that long-patch BER processes lesions at sequences susceptible to hairpin formation. Our results show that BER stoichiometry, nucleotide sequence and DNA damage position modulate repair outcome, and suggest that a suboptimal LP-BER activity promotes CAG/CTG repeat instability. PMID:22497302

  9. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution

    Directory of Open Access Journals (Sweden)

    Purves Joanne

    2012-09-01

    Full Text Available Abstract Background Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis.

  10. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution.

    Science.gov (United States)

    Purves, Joanne; Blades, Matthew; Arafat, Yasrab; Malik, Salman A; Bayliss, Christopher D; Morrissey, Julie A

    2012-09-28

    Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis.

  11. Localization of Proteins to the 1,2-Propanediol Utilization Microcompartment by Non-native Signal Sequences Is Mediated by a Common Hydrophobic Motif

    Science.gov (United States)

    Jakobson, Christopher M.; Kim, Edward Y.; Slininger, Marilyn F.; Chien, Alex; Tullman-Ercek, Danielle

    2015-01-01

    Various bacteria localize metabolic pathways to proteinaceous organelles known as bacterial microcompartments (MCPs), enabling the metabolism of carbon sources to enhance survival and pathogenicity in the gut. There is considerable interest in exploiting bacterial MCPs for metabolic engineering applications, but little is known about the interactions between MCP signal sequences and the protein shells of different MCP systems. We found that the N-terminal sequences from the ethanolamine utilization (Eut) and glycyl radical-generating protein MCPs are able to target reporter proteins to the 1,2-propanediol utilization (Pdu) MCP, and that this localization is mediated by a conserved hydrophobic residue motif. Recapitulation of this motif by the addition of a single amino acid conferred targeting function on an N-terminal sequence from the ethanol utilization MCP system that previously did not act as a Pdu signal sequence. Moreover, the Pdu-localized signal sequences competed with native Pdu targeting sequences for encapsulation in the Pdu MCP. Salmonella enterica natively possesses both the Pdu and Eut operons, and our results suggest that Eut proteins might be localized to the Pdu MCP in vivo. We further demonstrate that S. enterica LT2 retained the ability to grow on 1,2-propanediol as the sole carbon source when a Pdu enzyme was replaced with its Eut homolog. Although the relevance of this finding to the native system remains to be explored, we show that the Pdu-localized signal sequences described herein allow control over the ratio of heterologous proteins encapsulated within Pdu MCPs. PMID:26283792

  12. Localization of proteins to the 1,2-propanediol utilization microcompartment by non-native signal sequences is mediated by a common hydrophobic motif.

    Science.gov (United States)

    Jakobson, Christopher M; Kim, Edward Y; Slininger, Marilyn F; Chien, Alex; Tullman-Ercek, Danielle

    2015-10-02

    Various bacteria localize metabolic pathways to proteinaceous organelles known as bacterial microcompartments (MCPs), enabling the metabolism of carbon sources to enhance survival and pathogenicity in the gut. There is considerable interest in exploiting bacterial MCPs for metabolic engineering applications, but little is known about the interactions between MCP signal sequences and the protein shells of different MCP systems. We found that the N-terminal sequences from the ethanolamine utilization (Eut) and glycyl radical-generating protein MCPs are able to target reporter proteins to the 1,2-propanediol utilization (Pdu) MCP, and that this localization is mediated by a conserved hydrophobic residue motif. Recapitulation of this motif by the addition of a single amino acid conferred targeting function on an N-terminal sequence from the ethanol utilization MCP system that previously did not act as a Pdu signal sequence. Moreover, the Pdu-localized signal sequences competed with native Pdu targeting sequences for encapsulation in the Pdu MCP. Salmonella enterica natively possesses both the Pdu and Eut operons, and our results suggest that Eut proteins might be localized to the Pdu MCP in vivo. We further demonstrate that S. enterica LT2 retained the ability to grow on 1,2-propanediol as the sole carbon source when a Pdu enzyme was replaced with its Eut homolog. Although the relevance of this finding to the native system remains to be explored, we show that the Pdu-localized signal sequences described herein allow control over the ratio of heterologous proteins encapsulated within Pdu MCPs. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  13. Complete genome sequence of a recombinant Marek's disease virus field strain with one reticuloendotheliosis virus long terminal repeat insert.

    Science.gov (United States)

    Su, Shuai; Cui, Ning; Cui, Zhizhong; Zhao, Peng; Li, Yanpeng; Ding, Jiabo; Dong, Xuan

    2012-12-01

    Marek's disease virus (MDV) Chinese strain GX0101, isolated in 2001 from a vaccinated flock of layer chickens with severe tumors, was the first reported recombinant MDV field strain with one reticuloendotheliosis virus (REV) long terminal repeat (LTR) insert. GX0101 belongs to very virulent MDV (vvMDV) but has higher horizontal transmission ability than the vvMDV strain Md5. The complete genome sequence of GX0101 is 178,101 nucleotides (nt) and contains only one REV-LTR insert at a site 267 nt upstream of the sorf2 gene. Moreover, GX0101 has 5 repeats of a 217-nt fragment in its terminal repeat short (TRS) region and 3 repeats in internal repeat short (IRS) region, compared to the other 10 strains with only 1 or 2 repeats in both TRS and IRS.

  14. Individual and population variation in invertebrates revealed by Inter-simple Sequence Repeats (ISSRs)

    Directory of Open Access Journals (Sweden)

    Patrick Abbot

    2001-08-01

    Full Text Available PCR-based molecular markers are well suited for questions requiring large scale surveys of plant and animal populations. Inter-simple Sequence Repeats or ISSRs are analyzed by a recently developed technique based on the amplification of the regions between inverse-oriented microsatellite loci with oligonucleotides anchored in microsatellites themselves. ISSRs have shown much promise for the study of the population biology of plants, but have not yet been explored for similar studies of animals. The value of ISSRs is demonstrated for the study of animal species with low levels of within-population variation. Sets of primers are identified which reveal variation in two aphid species, Acyrthosiphon pisum and Pemphigus obesinymphae, in the yellow fever mosquito Aedes aegypti, and in a rotifer in the genus Philodina.

  15. Hitchcock's Motifs

    NARCIS (Netherlands)

    Walker, Michael

    2005-01-01

    Among the abundant Alfred Hitchcock literature, Hitchcock's Motifs has found a fresh angle. Starting from recurring objects, settings, character-types and events, Michael Walker tracks some forty motifs, themes and clusters across the whole of Hitchcock's oeuvre, including not only all his 52 extant

  16. Genome-wide analysis of simple sequence repeats in the model medicinal mushroom Ganoderma lucidum.

    Science.gov (United States)

    Qian, Jun; Xu, Haibin; Song, Jingyuan; Xu, Jiang; Zhu, Yingjie; Chen, Shilin

    2013-01-10

    Simple sequence repeats (SSRs) or microsatellites are one of the most popular sources of genetic markers and play a significant role in gene function and genome organization. We identified SSRs in the genome of Ganoderma lucidum and analyzed their frequency and distribution in different genomic regions. We also compared the SSRs in G. lucidum with six other Agaricomycetes genomes: Coprinopsis cinerea, Laccaria bicolor, Phanerochaete chrysosporium, Postia placenta, Schizophyllum commune and Serpula lacrymans. Based on our search criteria, the total number of SSRs found ranged from 1206 to 6104 and covered from 0.04% to 0.15% of the fungal genomes. The SSR abundance was not correlated with the genome size, and mono- to tri-nucleotide repeats outnumbered other SSR categories in all of the species examined. In G. lucidum, a repertoire of 2674 SSRs was detected, with mono-nucleotides being the most abundant. SSRs were found in all genomic regions and were more abundant in non-coding regions than coding regions. The highest SSR relative abundance was found in introns (108 SSRs/Mb), followed by intergenic regions (84 SSRs/Mb). A total of 684 SSRs were found in the protein-coding sequences (CDSs) of 588 gene models, with 81.4% of them being tri- or hexa-nucleotides. After scanning for InterPro domains, 280 of these genes were successfully annotated, and 215 of them could be assigned to Gene Ontology (GO) terms. SSRs were also identified in 28 bioactive compound synthesis-related gene models, including one 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), three polysaccharide biosynthesis genes and 24 cytochrome P450 monooxygenases (CYPs). Primers were designed for the identified SSR loci, providing the basis for the future development of SSR markers of this medicinal fungus.

  17. Evidence for integration of retroviral vectors in a novel human repeat sequence

    Energy Technology Data Exchange (ETDEWEB)

    Kurdi-Haidar, B.; Friedmann, T. [UCSD School of Medicine, La Jolla, CA (United States)

    1994-09-01

    Retroviruses have become attractive vehicles for the introduction of foreign genes into mammalian cells not only for gene therapy but also to serve as anchor points for long-range mapping purposes. The information relating to retroviral integration in mammalian cells is derived mostly from studies of rodent genomes. The absence of information regarding integration sites of murine-based retroviral vectors in human cells has prompted us to investigate the characteristics of integration sites in the human genome. We have constructed a Moloney murine leukemia virus-based retroviral vector that carries the pUC8 origin of replication and the chloramphenicol resistance gene to allow the rescue of the flanking genomic sequences in plasmid form. We have infected human primary fibroblasts and myoblasts with this retroviral vector and isolated independently transduced clones. Genomic DNA was obtained from independent clones and the genomic fragment carrying the provirus-host sequence boundary was isolated after digestion of the genomic DNA, circularization, and transformation by electroporation of E. coli C cells to chloramphenicol resistance. Restriction map and nucleotide sequence analysis of the rescued plasmids showed that a number of the clones shared the same integration site within the human genome. We have used the nucleotide sequence information about the human DNA adjacent to the 3′ LTR to design a PCR-based assay diagnostic for this common integration site. Analysis revealed the presence of the same integration site in four out of twelve human primary fibroblast clones infected with this specific retroviral vector, and in one out of twelve human primary myoblast clones infected with a second retroviral vector. Further analysis revealed the common integration site to be a previously unreported primate repeat present in monkey and human genomes and absent from rodent, bovine and avian genomes.

  18. The functional glycosyltransferase signature sequence of the human beta 1,3-glucuronosyltransferase is a XDD motif.

    Science.gov (United States)

    Gulberti, Sandrine; Fournel-Gigleux, Sylvie; Mulliert, Guillermo; Aubry, André; Netter, Patrick; Magdalou, Jacques; Ouzzine, Mohamed

    2003-08-22

    The human beta 1,3-glucuronosyltransferase I (GlcAT-I) is the key enzyme responsible for the completion of glycosaminoglycan-protein linkage tetrasaccharide of proteoglycans (GlcA beta 1,3Gal beta 1,3Gal beta 1,4Xyl beta 1-O-serine). We have investigated the role of aspartate residues Asp194-Asp195-Asp196 corresponding to the glycosyltransferase DXD signature motif, in GlcAT-I function by UDP binding experiments, kinetic analyses, and site-directed mutagenesis. We presented the first evidence that Mn2+ is not only essential for GlcAT-I activity but is also required for cosubstrate binding. In agreement, kinetic studies were consistent with a metal-activated enzyme model whereby activation probably occurs via binding of a Mn2+.UDP-GlcA complex to the enzyme. Mutational analysis showed that the Asp194-Asp195-Asp196 motif is a major element of the UDP/Mn2+ binding site. Furthermore, determination of the individual role of each aspartate showed that substitution of Asp195 as well as Asp196 to alanine strongly impaired GlcAT-I activity, whereas Asp194 replacement produced only a moderate alteration of the enzyme activity. These findings along with molecular modeling and three-dimensional structure comparison of the GlcAT-I catalytic center with that of the Bacillus subtilis glycosyltransferase SpsA provided evidence that the interactions of Asp195 with the ribose moiety of UDP and of Asp196 with the metal cation Mn2+ were crucial for GlcAT-I function. Altogether, these results indicated that, similarly to the SpsA enzyme, the nucleotide binding site of GlcAT-I contains a XDD motif rather than a DXD motif.
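
    Why sequence alone cannot settle the DXD-versus-XDD question for an Asp-Asp-Asp triplet can be shown with a short pattern scan. The sketch below is a hypothetical illustration: the helper `find_motif`, the convention that X means any residue, and the toy peptide are assumptions, not the authors' method (which relied on binding, kinetic and mutagenesis experiments).

```python
import re


def find_motif(protein, pattern):
    """Return 1-based (position, match) pairs for a motif such as 'DXD'.

    'X' is treated as any residue; a lookahead is used so that
    overlapping occurrences are all reported.
    """
    regex = re.compile("(?=(" + pattern.upper().replace("X", "[A-Z]") + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    toy_peptide = "MKADDDLG"  # made-up sequence containing an Asp triplet
    print(find_motif(toy_peptide, "DXD"))
    print(find_motif(toy_peptide, "XDD"))
```

    On a DDD triplet both patterns match the same residues, so deciding which aspartates are functionally essential requires experimental evidence of the kind described in the abstract.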

  19. Linear array of conserved sequence motifs to discriminate protein subfamilies: study on pyridine nucleotide-disulfide reductases

    Directory of Open Access Journals (Sweden)

    De Las Rivas Javier

    2007-03-01

    Full Text Available Abstract Background The pyridine nucleotide disulfide reductase (PNDR) is a large and heterogeneous protein family divided into two classes (I and II), which reflect the divergent evolution of its characteristic disulfide redox active site. However, not all the PNDR members fit into these categories and this suggests the need of further studies to achieve a more comprehensive classification of this complex family. Results A workflow to improve the clusterization of protein families based on the array of linear conserved motifs is designed. The method is applied to the PNDR large family finding two main groups, which correspond to PNDR classes I and II. However, two other separate protein clusters, previously classified as class I in most databases, are outgrouped: the peroxide reductases (NAOX, NAPE) and the type II NADH dehydrogenases (NDH-2). In this way, two novel PNDR classes III and IV for NAOX/NAPE and NDH-2 respectively are proposed. By knowledge-driven biochemical and functional data analyses done on the new class IV, a linear array of motifs putatively related to Cu(II)-reductase activity is detected in a specific subset of NDH-2. Conclusion The results presented are a novel contribution to the classification of the complex and large PNDR protein family, supporting its reclusterization into four classes. The linear array of motifs detected within the class IV PNDR subfamily could be useful as a signature for a particular subgroup of NDH-2.

  20. The First Molecular Identification of an Olive Collection Applying Standard Simple Sequence Repeats and Novel Expressed Sequence Tag Markers

    Directory of Open Access Journals (Sweden)

    Soraya Mousavi

    2017-07-01

    Full Text Available Germplasm collections of tree crop species represent fundamental tools for conservation of diversity and key steps for its characterization and evaluation. For the olive tree, several collections were created all over the world, but only few of them have been fully characterized and molecularly identified. The olive collection of Perugia University (UNIPG), established in the 1960s, represents one of the first attempts to gather and safeguard olive diversity, keeping together cultivars from different countries. In the present study, a set of 370 previously uncharacterized olive trees was screened with 10 standard simple sequence repeats (SSRs) and nine new EST-SSR markers, to correctly and thoroughly identify all genotypes, verify their representativeness of the entire cultivated olive variation, and validate the effectiveness of new markers in comparison to standard genotyping tools. The SSR analysis revealed the presence of 59 genotypes, corresponding to 72 well known cultivars, 13 of which are present exclusively in this collection. The new EST-SSRs have shown values of diversity parameters quite similar to those of the best standard SSRs. When compared to hundreds of Mediterranean cultivars, the UNIPG olive accessions were split among the three main populations (East, Center and West Mediterranean), confirming that the collection has a good representativeness of the entire olive variability. Furthermore, Bayesian analysis, performed on the 59 genotypes of the collection by the use of both sets of markers, has demonstrated their splitting into four clusters, with a well balanced membership obtained with EST-SSRs relative to standard SSRs. The new OLEST (Olea expressed sequence tags) SSR markers proved as effective as the best standard markers. The information obtained from this study represents a highly valuable tool for ex situ conservation and management of olive genetic resources, useful to build a common database from worldwide olive

  1. MIDDAS-M: motif-independent de novo detection of secondary metabolite gene clusters through the integration of genome sequencing and transcriptome data.

    Science.gov (United States)

    Umemura, Myco; Koike, Hideaki; Nagano, Nozomi; Ishii, Tomoko; Kawano, Jin; Yamane, Noriko; Kozone, Ikuko; Horimoto, Katsuhisa; Shin-ya, Kazuo; Asai, Kiyoshi; Yu, Jiujiang; Bennett, Joan W; Machida, Masayuki

    2013-01-01

    Many bioactive natural products are produced as "secondary metabolites" by plants, bacteria, and fungi. During the middle of the 20th century, several secondary metabolites from fungi revolutionized the pharmaceutical industry, for example, penicillin, lovastatin, and cyclosporine. They are generally biosynthesized by enzymes encoded by clusters of coordinately regulated genes, and several motif-based methods have been developed to detect secondary metabolite biosynthetic (SMB) gene clusters using the sequence information of typical SMB core genes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS). However, no detection method exists for SMB gene clusters that are functional and do not include core SMB genes at present. To advance the exploration of SMB gene clusters, especially those without known core genes, we developed MIDDAS-M, a motif-independent de novo detection algorithm for SMB gene clusters. We integrated virtual gene cluster generation in an annotated genome sequence with highly sensitive scoring of the cooperative transcriptional regulation of cluster member genes. MIDDAS-M accurately predicted 38 SMB gene clusters that have been experimentally confirmed and/or predicted by other motif-based methods in 3 fungal strains. MIDDAS-M further identified a new SMB gene cluster for ustiloxin B, which was experimentally validated. Sequence analysis of the cluster genes indicated a novel mechanism for peptide biosynthesis independent of NRPS. Because it is fully computational and independent of empirical knowledge about SMB core genes, MIDDAS-M allows a large-scale, comprehensive analysis of SMB gene clusters, including those with novel biosynthetic mechanisms that do not contain any functionally characterized genes.

  2. Simple sequence repeat markers useful for sorghum downy mildew (Peronosclerospora sorghi) and related species

    Directory of Open Access Journals (Sweden)

    Odvody Gary N

    2008-11-01

    Full Text Available Abstract Background A recent outbreak of sorghum downy mildew in Texas has led to the discovery of both metalaxyl resistance and a new pathotype in the causal organism, Peronosclerospora sorghi. These observations and the difficulty in resolving among phylogenetically related downy mildew pathogens dramatically point out the need for simply scored markers in order to differentiate among isolates and species, and to study the population structure within these obligate oomycetes. Here we present the initial results from the use of a biotin capture method to discover, clone and develop PCR primers that permit the use of simple sequence repeats (microsatellites) to detect differences at the DNA level. Results Among the 55 primer pairs designed from clones from pathotype 3 of P. sorghi, 36 flanked microsatellite loci containing simple repeats, including 28 (55%) with dinucleotide repeats and 6 (11%) with trinucleotide repeats. A total of 22 microsatellites with CA/AC or GT/TG repeats were the most abundant (40%), and GA/AG or CT/TC types contributed 15% of our collection. When used to amplify DNA from 19 isolates from P. sorghi, as well as from 5 related species that cause downy mildew on other hosts, the number of different bands detected for each SSR primer pair using a LI-COR DNA Analyzer ranged from two to eight. Successful cross-amplification for 12 primer pairs studied in detail using DNA from downy mildews that attack maize (P. maydis & P. philippinensis), sugar cane (P. sacchari), pearl millet (Sclerospora graminicola) and rose (Peronospora sparsa) indicates that the flanking regions are conserved in all these species. A total of 15 SSR amplicons unique to P. philippinensis (one of the potential threats to US maize production) were detected, and these have potential for development of diagnostic tests. A total of 260 alleles were obtained using 54 microsatellite primer combinations, with an average of 4.8 polymorphic markers per SSR across 34

  3. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

    DEFF Research Database (Denmark)

    Foulk, M. S.; Urban, J. M.; Casella, Cinzia;

    2015-01-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (lambda-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent...... are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na+ instead of K...

  4. Differential distribution and association of repeat DNA sequences in the lateral element of the synaptonemal complex in rat spermatocytes.

    Science.gov (United States)

    Hernández-Hernández, Abrahan; Rincón-Arano, Héctor; Recillas-Targa, Félix; Ortiz, Rosario; Valdes-Quezada, Christian; Echeverría, Olga M; Benavente, Ricardo; Vázquez-Nin, Gerardo H

    2008-02-01

    The synaptonemal complex (SC) is an evolutionarily conserved structure that mediates synapsis of homologous chromosomes during meiotic prophase I. Previous studies have established that the chromatin of homologous chromosomes is organized in loops that are attached to the lateral elements (LEs) of the SC. The characterization of the genomic sequences associated with LEs of the SC represents an important step toward understanding meiotic chromosome organization and function. To isolate these genomic sequences, we performed chromatin immunoprecipitation assays in rat spermatocytes using an antibody against SYCP3, a major structural component of the LEs of the SC. Our results demonstrated the reproducible and exclusive isolation of repeat deoxyribonucleic acid (DNA) sequences, in particular long interspersed elements, short interspersed elements, long terminal direct repeats, satellite, and simple repeats. The association of these repeat sequences to the LEs of the SC was confirmed by in situ hybridization of meiotic nuclei shown by both light and electron microscopy. Signals were also detected over the chromatin surrounding SCs and in small loops protruding from the lateral elements into the SC central region. We propose that genomic repeat DNA sequences play a key role in anchoring the chromosome to the protein scaffold of the SC.

  5. Long terminal repeat sequences from virulent and attenuated equine infectious anemia virus demonstrate distinct promoter activities.

    Science.gov (United States)

    Zhou, Tao; Yuan, Xiu-Fang; Hou, Shao-Hua; Tu, Ya-Bin; Peng, Jin-Mei; Wen, Jian-Xin; Qiu, Hua-Ji; Wu, Dong-Lai; Chen, Huan-Chun; Wang, Xiao-Jun; Tong, Guang-Zhi

    2007-09-01

    In the early 1970s, the Chinese Equine Infectious Anemia Virus (EIAV) vaccine, EIAV(DLA), was developed through successive passages of a wild-type virulent virus (EIAV(L)) in donkeys in vivo and then in donkey macrophages in vitro. EIAV attenuation and cell tropism adaptation are associated with changes in both the envelope and the long terminal repeat (LTR). However, specific LTR changes during Chinese EIAV attenuation have not been demonstrated. In this study, we compared LTR sequences from both virulent and attenuated EIAV strains and documented the diversity of LTR sequences from in vivo and in vitro infections. We found that the LTRs of virulent strains were homologous, whereas those of the EIAV vaccine were variable. Interestingly, experimental inoculation of EIAV(DLA) into a horse resulted in a restriction of the LTR variation. Furthermore, LTRs from EIAV(DLA) showed higher Tat-transactivated activity than LTRs from virulent strains. By using chimeric clones of wild-type LTR and vaccine LTR, the main difference in activity was mapped to changes in the R region rather than the U3 region.

  6. Transcriptome characterisation and simple sequence repeat marker discovery in the seagrass Posidonia oceanica

    Science.gov (United States)

    D’Esposito, D.; Orrù, L.; Dattolo, E.; Bernardo, L.; Lamontara, A.; Orsini, L.; Serra, I.A; Mazzuca, S.; Procaccini, G.

    2016-01-01

    Posidonia oceanica is an endemic seagrass in the Mediterranean Sea, where it provides important ecosystem services and sustains a rich and diverse ecosystem. P. oceanica meadows extend from the surface to 40 meters depth. With the aim of boosting research in this iconic species, we generated a comprehensive RNA-Seq data set for P. oceanica by sequencing specimens collected at two depths and two times during the day. With this approach we attempted to capture the transcriptional diversity associated with change in light and other depth-related environmental factors. Using this extensive data set we generated gene predictions and identified an extensive catalogue of potential Simple Sequence Repeats (SSR) markers. The data generated here will open new avenues for the analysis of population genetic features and functional variation in P. oceanica. In total, 79,235 contigs were obtained by the assembly of 70,453,120 paired end reads. 43,711 contigs were successfully annotated. A total of 17,436 SSR were identified within 13,912 contigs. PMID:27996971
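
    The record above does not say which tool or thresholds were used to call SSRs in the assembled contigs. As a minimal sketch of the general idea only, perfect tandem repeats can be located with a back-referencing regular expression. The function name `find_ssrs` and the defaults (unit length 2 to 6 bp, at least 4 copies) are assumptions made for illustration, not the authors' settings.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Illustrative only: thresholds are assumed, not taken from the study.
    """
    seq = seq.upper()
    # Lazy unit match so the shortest repeating unit is tried first,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits

contig = "GG" + "CA" * 5 + "TTT" + "GAG" * 4 + "CC"
for start, unit, copies in find_ssrs(contig):
    print(start, unit, copies)
```

    Imperfect or compound repeats are not handled here; dedicated SSR callers treat those cases explicitly.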

  7. The Motif Tracking Algorithm

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The search for patterns or motifs in data represents a problem area of key interest to finance and economic researchers. In this paper, we introduce the motif tracking algorithm (MTA), a novel immune inspired (IS) pattern identification tool that is able to identify unknown motifs of unspecified length which repeat within time series data. The power of the algorithm comes from the fact that it uses a small number of parameters with minimal assumptions regarding the data being examined or the underlying motifs. Our interest lies in applying the algorithm to financial time series data to identify unknown patterns that exist. The algorithm is tested using three separate data sets. Particular suitability to financial data is shown by applying it to oil price data. In all cases, the algorithm identifies the presence of a motif population in a fast and efficient manner due to the utilization of an intuitive symbolic representation. The resulting population of motifs is shown to have considerable potential value for other applications such as forecasting and algorithm seeding.
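
    The abstract credits part of the algorithm's speed to a symbolic representation of the time series but gives no details. The sketch below is not the MTA: it has no immune-inspired component and it searches for words of one fixed length. It only illustrates how a numeric series might be turned into symbols and scanned for repeats. The three-letter alphabet, the `threshold` parameter and both function names are assumptions made for this example.

```python
from collections import defaultdict

def symbolize(series, threshold=0.5):
    """Map each step-to-step change to 'u' (up), 'd' (down) or 's' (steady)."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        change = cur - prev
        if change > threshold:
            symbols.append("u")
        elif change < -threshold:
            symbols.append("d")
        else:
            symbols.append("s")
    return "".join(symbols)

def repeated_words(symbols, length):
    """Return {word: [start positions]} for words seen more than once."""
    positions = defaultdict(list)
    for i in range(len(symbols) - length + 1):
        positions[symbols[i:i + length]].append(i)
    return {word: pos for word, pos in positions.items() if len(pos) > 1}

prices = [1, 3, 2, 2, 4, 3, 3, 5, 4]
word = symbolize(prices)
print(word, repeated_words(word, 3))
```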

  8. The Motif Tracking Algorithm

    CERN Document Server

    Wilson, William; Aickelin, Uwe; 10.1007/s11633.008.0032.0

    2010-01-01

    The search for patterns or motifs in data represents a problem area of key interest to finance and economic researchers. In this paper we introduce the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs of a non specified length which repeat within time series data. The power of the algorithm comes from the fact that it uses a small number of parameters with minimal assumptions regarding the data being examined or the underlying motifs. Our interest lies in applying the algorithm to financial time series data to identify unknown patterns that exist. The algorithm is tested using three separate data sets. Particular suitability to financial data is shown by applying it to oil price data. In all cases the algorithm identifies the presence of a motif population in a fast and efficient manner due to the utilisation of an intuitive symbolic representation. The resulting population of motifs is shown to have considerable potential value for other ap...

  9. Creation and structure determination of an artificial protein with three complete sequence repeats

    Energy Technology Data Exchange (ETDEWEB)

    Adachi, Motoyasu, E-mail: adachi.motoyasu@jaea.go.jp; Shimizu, Rumi; Kuroki, Ryota [Japan Atomic Energy Agency, Shirakatashirane 2-4, Nakagun Tokaimura, Ibaraki 319-1195 (Japan); Blaber, Michael [Japan Atomic Energy Agency, Shirakatashirane 2-4, Nakagun Tokaimura, Ibaraki 319-1195 (Japan); Florida State University, Tallahassee, FL 32306-4300 (United States)

    2013-11-01

    An artificial protein with three complete sequence repeats was created and the structure was determined by X-ray crystallography. The structure showed threefold symmetry even though there are amino and carboxy termini. The artificial protein with threefold symmetry may be useful as a scaffold to capture small materials with C3 symmetry. Symfoil-4P is a de novo protein exhibiting the threefold symmetrical β-trefoil fold designed based on the human acidic fibroblast growth factor. The first three asparagine–glycine sequences of Symfoil-4P were replaced with glutamine–glycine (Symfoil-QG) or serine–glycine (Symfoil-SG) sequences to protect against deamidation, and His-Symfoil-II was prepared by introducing a protease digestion site into Symfoil-QG so that Symfoil-II has three complete repeats after removal of the N-terminal histidine tag. The Symfoil-QG and SG and His-Symfoil-II proteins were expressed in Escherichia coli as soluble protein, and purified by nickel affinity chromatography. Symfoil-II was further purified by anion-exchange chromatography after removing the HisTag by proteolysis. Both Symfoil-QG and Symfoil-II were crystallized in 0.1 M Tris-HCl buffer (pH 7.0) containing 1.8 M ammonium sulfate as precipitant at 293 K; several crystal forms were observed for Symfoil-QG and II. The maximum diffraction of Symfoil-QG and II crystals was to 1.5 and 1.1 Å resolution, respectively. The Symfoil-II without histidine tag diffracted better than Symfoil-QG with N-terminal histidine tag. Although the crystal packing of Symfoil-II is slightly different from Symfoil-QG and other crystals of Symfoil derivatives having the N-terminal histidine tag, the refined crystal structure of Symfoil-II showed pseudo-threefold symmetry as expected from other Symfoils. Since the removal of the unstructured N-terminal histidine tag did not affect the threefold structure of Symfoil, the improvement of diffraction quality of Symfoil-II may be caused by molecular characteristics of

  10. Intergenic regions of Borrelia plasmids contain phylogenetically conserved RNA secondary structure motifs

    Directory of Open Access Journals (Sweden)

    Delihas Nicholas

    2009-03-01

    Full Text Available Abstract Background Borrelia species are unusual in that they contain a large number of linear and circular plasmids. Many of these plasmids have long intergenic regions. These regions have many fragmented genes and repeated sequences and appear to be in a state of flux, but they may serve as reservoirs for evolutionary change and/or maintain stable motifs such as small RNA genes. Results In an in silico study, intergenic regions of Borrelia plasmids were scanned for phylogenetically conserved stem loop structures that may represent functional units at the RNA level. Five repeat sequences were found that could fold into stable RNA-type stem loop structures, three of which are closely linked to protein genes, one of which is a member of the Borrelia lipoprotein_1 super family genes and another is the complement regulator-acquiring surface protein_1 (CRASP-1) family. Modeled secondary structures of repeat sequences display numerous base-pair compensatory changes in stem regions, including C-G→A-U transversions when orthologous sequences are compared. Base-pair compensatory changes constitute strong evidence for phylogenetic conservation of secondary structure. Conclusion Intergenic regions of Borrelia species carry evolutionarily stable RNA secondary structure motifs. Of major interest is that some motifs are associated with protein genes that show large sequence variability. The cell may conserve these RNA motifs while allowing a large flux in amino acid sequence, possibly to create new virulence factors but with associated RNA motifs intact.

  11. Characterization of comparative genome-derived simple sequence repeats for acanthopterygian fishes.

    Science.gov (United States)

    Gotoh, Ryo O; Tamate, Satoshi; Yokoyama, Jun; Tamate, Hidetoshi B; Hanzawa, Naoto

    2013-05-01

    Simple sequence repeats (SSRs) have become one of the most popular molecular markers for population genetic studies. The application of SSR markers has often been limited to source species because SSR loci are too labile to be maintained in even closely related species. However, a few extremely conserved SSR loci have been reported. Here, we tested for the presence of conserved SSR loci in acanthopterygian fishes, which include over 14 000 species, by comparing the genome sequences of four acanthopterygian fishes. We also examined the comparative genome-derived SSRs (CG-SSRs) for their transferability across acanthopterygian fishes and their applicability to population genetic analysis. Forty-six SSR loci with conserved flanking regions were detected and examined for their transferability among seven nonacanthopterygian and 27 acanthopterygian fishes. The PCR amplification success rate in nonacanthopterygian fishes was low, ranging from 2.2% to 21.7%, except for Lophius litulon (Lophiiformes; 80.4%). Conversely, the rate in most acanthopterygian fishes exceeded 70.0%. Sequencing of these 46 loci revealed the presence of SSRs suitable for scoring while fragment analysis of 20 loci revealed polymorphisms in most of the acanthopterygian fishes. Population genetic analysis of Cottus pollux (Scorpaeniformes) and Sphaeramia orbicularis (Perciformes) using CG-SSRs showed that these populations did not deviate from linkage equilibrium or Hardy-Weinberg equilibrium. Furthermore, almost no loci showed evidence of null alleles, suggesting that CG-SSRs have strong resolving power for population genetic analysis. Our findings will facilitate the use of these markers in species in which markers remain to be identified.

  12. Next generation sequencing (NGS) database for tandem repeats with multiple pattern 2°-shaft multicore string matching

    Directory of Open Access Journals (Sweden)

    Chinta Someswara Rao

    2016-03-01

    Full Text Available Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research in recent years. To provide a comprehensive NGS resource for research, in this paper we have considered 10 loci/codi/repeats: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG and TCTA. We then developed the NGS Tandem Repeat Database (TandemRepeatDB) for all the chromosomes of the Homo sapiens, Callithrix jacchus, Chlorocebus sabaeus, Gorilla gorilla, Macaca fascicularis, Macaca mulatta, Nomascus leucogenys, Pan troglodytes, Papio anubis and Pongo abelii genome data sets for all those loci. We find the successive occurrence frequency for all 10 of the above SSRs (simple sequence repeats) in these genome data sets on a chromosome-by-chromosome basis with multiple pattern 2° shaft multicore string matching.
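
    The abstract names the ten tetranucleotide motifs but does not describe the matching algorithm itself. The following is a minimal single-threaded sketch of what counting "successive occurrences" could look like, using Python's `re` module in place of the authors' multicore string matcher. The function names and the choice to tally maximal runs by copy number are assumptions made for illustration.

```python
import re
from collections import Counter

# The ten motifs listed in the abstract.
MOTIFS = ["TAGA", "TCAT", "GAAT", "AGAT", "AGAA",
          "GATA", "TATC", "CTTT", "TCTG", "TCTA"]

def run_length_counts(sequence, motif):
    """Count maximal runs of back-to-back motif copies, keyed by copy number."""
    runs = re.finditer("(?:%s)+" % re.escape(motif), sequence.upper())
    return Counter(len(m.group(0)) // len(motif) for m in runs)

def scan(sequence, motifs=MOTIFS):
    """Return {motif: Counter({copies: number_of_runs})} for one sequence."""
    return {motif: run_length_counts(sequence, motif) for motif in motifs}

chromosome = "CC" + "GATA" * 3 + "CC" + "GATA" + "TT" + "TCTA" * 2
print(scan(chromosome)["GATA"])
```

    In a per-chromosome setting, `scan` would be called once per chromosome sequence; the calls are independent, which is what makes this kind of workload straightforward to spread across cores.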

  13. Next generation sequencing (NGS) database for tandem repeats with multiple pattern 2°-shaft multicore string matching

    Science.gov (United States)

    Someswara Rao, Chinta; Raju, S. Viswanadha

    2016-01-01

    Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research in recent years. To provide a comprehensive NGS resource for research, in this paper we have considered 10 loci/codi/repeats: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG and TCTA. We then developed the NGS Tandem Repeat Database (TandemRepeatDB) for all the chromosomes of the Homo sapiens, Callithrix jacchus, Chlorocebus sabaeus, Gorilla gorilla, Macaca fascicularis, Macaca mulatta, Nomascus leucogenys, Pan troglodytes, Papio anubis and Pongo abelii genome data sets for all those loci. We find the successive occurrence frequency for all 10 of the above SSRs (simple sequence repeats) in these genome data sets on a chromosome-by-chromosome basis with multiple pattern 2° shaft multicore string matching. PMID:26981434

  14. Next generation sequencing (NGS) database for tandem repeats with multiple pattern 2°-shaft multicore string matching.

    Science.gov (United States)

    Someswara Rao, Chinta; Raju, S Viswanadha

    2016-03-01

    Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research in recent years. To provide a comprehensive NGS resource for research, in this paper we have considered 10 loci/codi/repeats: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG and TCTA. We then developed the NGS Tandem Repeat Database (TandemRepeatDB) for all the chromosomes of the Homo sapiens, Callithrix jacchus, Chlorocebus sabaeus, Gorilla gorilla, Macaca fascicularis, Macaca mulatta, Nomascus leucogenys, Pan troglodytes, Papio anubis and Pongo abelii genome data sets for all those loci. We find the successive occurrence frequency for all 10 of the above SSRs (simple sequence repeats) in these genome data sets on a chromosome-by-chromosome basis with multiple pattern 2° shaft multicore string matching.

  15. A novel tRNA variable number tandem repeat at human chromosome 1q23.3 is implicated as a boundary element based on conservation of a CTCF motif in mouse.

    Science.gov (United States)

    Darrow, Emily M; Chadwick, Brian P

    2014-06-01

    The human genome contains numerous large tandem repeats, many of which remain poorly characterized. Here we report a novel transfer RNA (tRNA) tandem repeat on human chromosome 1q23.3 that shows extensive copy number variation with 9-43 repeat units per allele and displays evidence of meiotic and mitotic instability. Each repeat unit consists of a 7.3 kb GC-rich sequence that binds the insulator protein CTCF and bears the chromatin hallmarks of a bivalent domain in human embryonic stem cells. A tRNA-containing tandem repeat composed of at least three 7.6-kb GC-rich repeat units resides within a syntenic region of mouse chromosome 1. However, DNA sequence analysis reveals that, apart from the tRNA genes that account for less than 6% of a repeat unit, the remaining 7.2 kb is not conserved, with the notable exception of a 24 base pair sequence corresponding to the CTCF binding site, suggesting an important role for this protein at the locus.

  16. Chromosomal organizations of major repeat families on potato (Solanum tuberosum) and further exploring in its sequenced genome.

    Science.gov (United States)

    Tang, Xiaomin; Datema, Erwin; Guzman, Myriam Olortegui; de Boer, Jan M; van Eck, Herman J; Bachem, Christian W B; Visser, Richard G F; de Jong, Hans

    2014-12-01

    One of the most powerful technologies in unraveling the organization of a eukaryotic plant genome is high-resolution fluorescence in situ hybridization of repeats and single copy DNA sequences on pachytene chromosomes. This technology allows the integration of physical mapping information with chromosomal positions, including centromeres, telomeres, nucleolar-organizing region, and euchromatin and heterochromatin. In this report, we established chromosomal positions of different repeat fractions of the potato genomic DNA (Cot100, Cot500 and Cot1000) on the chromosomes. We also analysed various repeat elements that are unique to potato, including the moderately repetitive P5 and REP2 elements, where REP2 is part of a larger Gypsy-type LTR retrotransposon and covers most chromosome regions, with some brighter fluorescing spots in the heterochromatin. The most abundant tandem repeat is the potato genomic repeat 1, which covers subtelomeric regions of most chromosome arms. Extensive multiple alignments of these repetitive sequences in the assembled RH89-039-16 potato BACs and the draft assembly of the DM1-3 516 R44 genome shed light on the conservation of these repeats within the potato genome. The consensus sequences thus obtained revealed the native complete transposable elements from which they were derived.

  17. Analysis of sequences involved in IE2 transactivation of a baculovirus immediate-early gene promoter and identification of a new regulatory motif.

    Science.gov (United States)

    Shippam-Brett, C E; Willis, L G; Theilmann, D A

    2001-05-01

    Opep-2 is a unique baculovirus early gene that has only been identified in the Orgyia pseudotsugata multiple capsid nucleopolyhedrovirus (OpMNPV). Previous analyses have shown this gene is expressed at very early times post-infection (p.i.) but is shut down by 36-48 h p.i. The promoter of opep-2 therefore, represents a class of early genes that is temporally regulated. In this study, a detailed analysis of the opep-2 promoter is performed to analyze the role individual motifs play in early gene expression. A new 13 base pair regulatory element was identified and shown to be essential in controlling high-level expression of this gene. In addition, mutational analysis revealed that GATA and CACGTG motifs, which have been shown to bind cellular factors in Sf9 and Ld652Y cells, played minor roles in influencing opep-2 expression in the absence of other viral factors. The OpMNPV transactivator IE2 causes a significant activation of the opep-2 promoter. Cotransfection of an extensive number of promoter deletions and mutations did not show any sequence specificity for IE2 transactivation. This is the first detailed analysis of the sequence requirements for IE2 transactivation, and these results suggest that IE2 does not bind directly to specific elements in the opep-2 promoter.

  18. Associations of homologous RNA-binding motif gene on the X chromosome (RBMX) and its like sequence on chromosome 9 (RBMXL9) with non-obstructive azoospermia

    Institute of Scientific and Technical Information of China (English)

    Akira Tsujimura; Masao Ota; Akihiko Okuyama; Kazutoshi Fujita; Kazuhiko Komori; Phanu Tanjapatkul; Yasushi Miyagawa; Shingo Takada; Kiyomi Matsumiya; Masaharu Sada; Yoshihiko Katsuyama

    2006-01-01

    Aim: To investigate the associations of autosomal and X-chromosome homologs of the RNA-binding-motif (RNA-binding-motif on the Y chromosome, RBMY) gene with non-obstructive azoospermia (NOA), as genetic factors for NOA may map to chromosomes other than the Y chromosome. Methods: Genomic DNA was extracted using a salting-out procedure after treatment of peripheral blood leukocytes with proteinase K from Japanese patients with NOA (n = 67) and normal fertile volunteers (n = 105). The DNA samples were analyzed for RBMX by expressed sequence tag (EST) deletion and for the like sequence on chromosome 9 (RBMXL9) by microsatellite polymorphism. Results: We examined six ESTs in and around RBMX and found a deletion of SHGC31764 in one patient with NOA and a deletion of DXS7491 in one other patient with NOA. No deletions were detected in control subjects. The association study with nine microsatellite markers near RBMXL9 revealed that D9S319 was less prevalent in patients than in control subjects, whereas D9S1853 was detected more frequently in patients than in control subjects. Conclusion: We provide evidence that deletions in or around RBMX may be involved in NOA. In addition, analyses of markers in the vicinity of RBMXL9 on chromosome 9 suggest the possibility that variants of this gene may be associated with NOA. Although further studies are necessary, this is the first report of an association of RBMX and RBMXL9 with NOA.

  19. Identification and characterization of simple sequence repeats (SSRs) for population studies of Puccinia novopanici.

    Science.gov (United States)

    Orquera-Tornakian, Gabriela K; Garrido, Patricia; Kronmiller, Brent; Hunger, Robert; Tyler, Brett M; Garzon, Carla D; Marek, Stephen M

    2017-08-01

    Switchgrass (Panicum virgatum L.) can be severely affected by rust disease. Recently switchgrass rust caused by P. emaculata (now confirmed to be Puccinia novopanici) has received most of the attention from the research community because this pathogen is responsible for reducing the biomass production and biofuel feedstock quality of switchgrass. Microsatellite markers found in the literature were either not informative (no allele frequency) or showed few polymorphisms in the target populations; therefore additional markers are needed for future studies of the genetic variation and population structure of P. novopanici. This study reports the development and characterization of novel simple sequence repeat (SSR) markers from a Puccinia emaculata s.l. microsatellite-enriched library and expressed sequence tags (ESTs). Microsatellites were evaluated for polymorphisms on P. emaculata s.l. urediniospores collected in Iowa (IA), Mississippi (MS), Oklahoma (OK), South Dakota (SD) and Virginia (VA). Puccinia novopanici single spore whole genome amplifications were used as templates to validate the SSR reaction protocol and to obtain preliminary population genetic statistics for the pathogen. Eighteen microsatellite markers were polymorphic (average PIC=0.72) on individual urediniospores, with an average of 8.3 alleles per locus (range 3 to 17). Of the 49 SSR loci initially identified in P. emaculata s.l., 18 were transferable to P. striiformis f. sp. tritici, 23 to P. triticina, 20 to P. sorghi and 31 to P. andropogonis. Thus, these markers could be useful for DNA fingerprinting and population structure analysis for population genetics, epidemiology and ecological studies of P. novopanici and potentially other related Puccinia species. Copyright © 2017 Elsevier B.V. All rights reserved.
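
    The abstract reports an average PIC of 0.72 without stating how PIC was computed. Assuming the widely used definition of polymorphism information content, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a locus, a minimal sketch looks like this. The function name `pic` and the example frequencies are illustrative, not data from the study.

```python
from itertools import combinations

def pic(freqs):
    """Polymorphism information content for allele frequencies summing to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two random alleles are identical.
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings between identical heterozygotes.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction

print(pic([0.5, 0.5]))    # two equally frequent alleles
print(pic([0.25] * 4))    # four equally frequent alleles
```

    PIC rises with both the number of alleles and the evenness of their frequencies, which is why loci with many alleles, as reported above, tend to score high.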

  20. TPRpred: a tool for prediction of TPR-, PPR- and SEL1-like repeats from protein sequences

    Directory of Open Access Journals (Sweden)

    Söding Johannes

    2007-01-01

    Full Text Available Abstract Background Solenoid repeat proteins of the Tetratrico Peptide Repeat (TPR) family are involved as scaffolds in a broad range of protein-protein interactions. Several resources are available for the prediction of TPRs; however, they often fail to detect divergent repeat units. Results We have developed TPRpred, a profile-based method which uses a P-value-dependent score offset to include divergent repeat units and which exploits the tendency of repeats to occur in tandem. TPRpred detects not only TPR-like repeats, but also the related Pentatrico Peptide Repeats (PPRs) and SEL1-like repeats. The corresponding profiles were generated through iterative searches, by varying the threshold parameters for inclusion of repeat units into the profiles, and the best profiles were selected based on their performance on proteins of known structure. We benchmarked the performance of TPRpred in detecting TPR-containing proteins and in delineating the individual repeats therein, against currently available resources. Conclusion TPRpred performs significantly better in detecting divergent repeats in TPR-containing proteins, and finds more individual repeats than the existing methods. The web server is available at http://tprpred.tuebingen.mpg.de, and the C++ and Perl sources of TPRpred along with the profiles can be downloaded from ftp://ftp.tuebingen.mpg.de/ebio/protevo/TPRpred/.

  1. The κB transcriptional enhancer motif and signal sequences of V(D)J recombination are targets for the zinc finger protein HIVEP3/KRC: a site selection amplification binding study

    Directory of Open Access Journals (Sweden)

    Wu Lai-Chu

    2002-08-01

    Full Text Available Abstract Background The ZAS family is composed of proteins that regulate transcription via specific gene regulatory elements. The amino-DNA binding domain (ZAS-N) and the carboxyl-DNA binding domain (ZAS-C) of a representative family member, named κB DNA binding and recognition component (KRC), were expressed as fusion proteins and their target DNA sequences were elucidated by site selection amplification binding assays, followed by cloning and DNA sequencing. The DNA sequences selected by the fusion proteins were analyzed by the MEME and MAST computer programs to obtain consensus motifs and DNA elements bound by the ZAS domains. Results Both fusion proteins selected sequences that were similar to the κB motif or the canonical elements of the V(D)J recombination signal sequences (RSS) from a pool of degenerate oligonucleotides. Specifically, the ZAS-N domain selected sequences similar to the canonical RSS nonamer, while the ZAS-C domain selected sequences similar to the canonical RSS heptamer. In addition, both KRC fusion proteins selected oligonucleotides with sequences identical to heptamer and nonamer sequences within endogenous RSS. Conclusions The RSS are cis-acting DNA motifs which are essential for V(D)J recombination of antigen receptor genes. Due to its specific binding affinity for RSS and κB-like transcription enhancer motifs, we hypothesize that KRC may be involved in the regulation of V(D)J recombination.

  2. Genetic Diversity Assessment and Identification of New Sour Cherry Genotypes Using Intersimple Sequence Repeat Markers

    Directory of Open Access Journals (Sweden)

    Roghayeh Najafzadeh

    2014-01-01

    Full Text Available Iran is one of the chief origins of subgenus Cerasus germplasm. In this study, the genetic variation of new Iranian sour cherries (which had such superior growth characteristics and fruit quality as to be considered for the introduction of new cultivars) was investigated and identified using 23 intersimple sequence repeat (ISSR) markers. Results indicated a high level of polymorphism of the genotypes based on these markers. According to these results, the primers tested in this study, especially ISSR-4, ISSR-6, ISSR-13, ISSR-14, ISSR-16, and ISSR-19, produced good and varied levels of amplification and can be effectively used in genetic studies of the sour cherry. The genetic similarity analysis showed a high diversity among the genotypes. Cluster analysis separated improved cultivars from promising Iranian genotypes, and the PCoA supported the cluster analysis results. Since the Iranian genotypes were superior to the improved cultivars and were separated from them in most groups, these genotypes can be considered as distinct genotypes for further evaluations in the framework of breeding programs and new cultivar identification in cherries. Results also confirmed that ISSR is a reliable DNA marker that can be used for exact genetic studies and in sour cherry breeding programs.

  3. The Cipher Code of Simple Sequence Repeats in “Vampire Pathogens”

    Science.gov (United States)

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W.; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, red blood cell-tropic microbes, such as “vampire pathogens” (VP), succeed in matching scarce nutrients and surviving strong immunity reactions. Here, we found that VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non-vampire pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser), and counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 were significantly higher in VP than in N_VP. Furthermore, significant differences (P < 0.001) in the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
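
    The statement that (AG)n repeats encode all three six-fold degenerate amino acids can be checked by translating the repeat in every reading frame on both strands. The sketch below does this under the assumption of the standard genetic code; the Mycoplasma-specific reassignment of TGA does not touch any codon that occurs in an AG repeat. The helper names are illustrative and this is not the authors' pipeline.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq, frame):
    """Translate seq from the given frame offset, ignoring trailing bases."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

def encoded_residues(unit, copies=6):
    """One-letter residues encoded by a tandem repeat, all six frames."""
    repeat = unit * copies
    residues = set()
    for strand in (repeat, reverse_complement(repeat)):
        for frame in range(3):
            residues.update(translate(strand, frame))
    return residues

print(sorted(encoded_residues("AG")))
```

    On the forward strand the repeat alternates AGA (Arg) and GAG (Glu); on the reverse strand it alternates CTC (Leu) and TCT (Ser), so Arg, Leu and Ser all appear.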

  4. Molecular identification of Aquilaria spp. by using inter-simple sequence repeat (ISSR)

    Science.gov (United States)

    Azhari, Hanif; Mohamad, Azhar; Othman, Roohaida

    2015-09-01

    Aquilaria species are economically very important plants for the production of resin, locally known as gaharu in Malaysia. Five species can be found in Malaysia, and the most important Aquilaria species for gaharu production is A. malaccensis. Molecular markers for Aquilaria species are still insufficient, and more efficient, robust and reproducible molecular markers are required. Inter-simple sequence repeat (ISSR) markers are highly polymorphic and highly reproducible, which makes them useful in the areas of genetic diversity, phylogenetic studies, gene tagging, genome mapping and evolutionary biology in a wide range of crop species. Five selected ISSR primers were used to identify four Aquilaria species commonly found in Malaysia, namely A. malaccensis, A. sub-integra, A. crassna and A. hirta. All the primers showed sufficient polymorphism to distinguish between the four species. Hence, the markers derived from ISSR can be used for molecular identification of Aquilaria spp. to ensure homogeneous species for plantation, which may improve the quality of resin derived from known and certified materials.

  5. Agarose gel electrophoresis and polyacrylamide gel electrophoresis for visualization of simple sequence repeats.

    Science.gov (United States)

    Anderson, James; Wright, Drew; Meksem, Khalid

    2013-01-01

    In the modern age of genetic research there is a constant search for ways to improve the efficiency of plant selection. The most recent technology that can result in a highly efficient means of selection and still be done at a low cost is through plant selection directed by simple sequence repeats (SSRs or microsatellites). The molecular markers are used to select for certain desirable plant traits without relying on ambiguous phenotypic data. The best way to detect these is the use of gel electrophoresis. Gel electrophoresis is a common technique in laboratory settings which is used to separate deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) by size. Loading DNA and RNA onto gels allows for visualization of the size of fragments through the separation of DNA and RNA fragments. This is achieved through the use of the charge in the particles. As the fragments separate, they form into distinct bands at set sizes. We describe the ability to visualize SSRs on slab gels of agarose and polyacrylamide gel electrophoresis.

  6. Genetic characterization of the gypsy moth from China (Lepidoptera, Lymantriidae) using inter simple sequence repeats markers.

    Directory of Open Access Journals (Sweden)

    Fang Chen

    Full Text Available This study provides the first genetic characterization of the gypsy moth from China (Lymantria dispar), one of the most recognized pests of forests and ornamental trees in the world. We assessed genetic diversity and structure in eight geographic populations of gypsy moths from China using five polymorphic inter simple sequence repeat markers, which produced reproducible banding patterns. We observed 102 polymorphic loci across the 176 individuals sampled. Overall genetic diversity (Nei's H) was 0.2357, while the mean genetic diversity within geographic populations was 0.1845 ± 0.0150. The observed genetic distance among the eight populations ranged from 0.0432 to 0.1034. Clustering analysis (using an unweighted pair-group method with arithmetic mean and multidimensional scaling) revealed strong concordance between the strength of genetic relationships among populations and their geographic proximity. Analysis of molecular variance demonstrated that 25.43% of the total variability (F ST = 0.2543, P < 0.001) was attributable to variation among geographic populations. The results of our analyses investigating the degree of polymorphism, genetic diversity (Nei's and Shannon) and genetic structure suggest that individuals from Hebei may be better able to adapt to different environments and to disperse to new habitats. This study provides crucial genetic information needed to assess the distribution and population dynamics of this important pest species of global concern.

  7. Genetic diversity analysis of Lepidium sativum (Chandrasur) using inter simple sequence repeat (ISSR) markers

    Institute of Scientific and Technical Information of China (English)

    Amandeep Kaur; Rakesh Kumar; Suman Rani; Anita Grewal

    2015-01-01

    Lepidium sativum (commonly known as garden cress) belongs to the family Brassicaceae. It is a fast-growing, erect, annual herbaceous plant. Its seeds possess significant fracture healing, anti-asthmatic, anti-diabetic, hypoglycemic, nephrocurative and nephroprotective activities. In the present study, we assessed the genetic diversity of various genotypes of L. sativum using inter-simple sequence repeat (ISSR) markers. Out of 41 ISSR primers screened, 32 primers showed significant, clear and reproducible bands. A total of 510 amplified bands were obtained using 32 ISSR primers, out of which 422 bands were polymorphic and 88 bands were monomorphic. The percentage of polymorphism was found to be 82. A total of 35 unique alleles ranging in size from 200 to 2,900 bp were observed. Cluster analysis based on the unweighted pair-group method with arithmetic mean divided the 18 genotypes into two main clusters, with the first having only the HCS-08 genotype of L. sativum and the other having all of the other 17 genotypes. The Jaccard similarity coefficient revealed a broad range of 32–72% genetic relatedness among the 18 genotypes.
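
    The band counts in this record imply a polymorphism rate of 422/510, and the Jaccard coefficient compares two genotypes by the bands they share. A minimal Python sketch, assuming each genotype is scored as a 0/1 band-presence profile; the function names and the two example profiles are hypothetical and are not data from the study:

```python
def percent_polymorphic(polymorphic_bands, total_bands):
    """Share of scored bands that are polymorphic, in percent."""
    return 100.0 * polymorphic_bands / total_bands


def jaccard_similarity(profile_a, profile_b):
    """Jaccard coefficient for two equal-length 0/1 band profiles.

    Bands absent from both genotypes are ignored, which is what
    distinguishes Jaccard from simple matching.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("band profiles must have the same length")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    return shared / either if either else 0.0


# Band totals reported in the abstract.
polymorphism = percent_polymorphic(422, 510)

# Two made-up band profiles, for illustration only.
similarity = jaccard_similarity([1, 1, 0, 1], [1, 0, 0, 1])
```

    With the reported totals the rate comes out slightly above 82.7%, which agrees with the 82 quoted in the abstract if that figure was truncated rather than rounded.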

  8. Simple Sequence Repeat Genetic Linkage Maps of A-genome Diploid Cotton (Gossypium arboreum)

    Institute of Scientific and Technical Information of China (English)

    Xue-Xia Ma; Bao-Liang Zhou; Yan-Hui Lü; Wang-Zhen Guo; Tian-Zhen Zhang

    2008-01-01

    This study introduces the construction of the first intraspecific genetic linkage map of the A-genome diploid cotton with newly developed simple sequence repeat (SSR) markers, using 189 F2 plants derived from the cross of two Asiatic parents. Polymorphisms between the parents were detected using 6 092 pairs of SSR primers. Two hundred and sixty-eight pairs of SSR primers with better polymorphisms were picked out to analyze the F2 population. In total, 320 polymorphic bands were generated and used to construct a linkage map with JoinMap3.0. Two hundred and sixty-seven loci, including three phenotypic traits, were mapped at a logarithm of odds (LOD) score ≥ 3.0 on 13 linkage groups. The total length of the map was 2 508.71 cM, and the average distance between adjacent markers was 9.40 cM. Chromosome assignments were made according to the association of linkages with our backbone tetraploid specific map using the 89 similar SSR loci. Comparisons among the 13 suites of orthologous linkage groups revealed that the A-genome chromosomes are largely collinear with the At and Dt sub-genome chromosomes. Chromosomes associated with inversions suggested that allopolyploidization was accompanied by homologous chromosomal rearrangement. The inter-chromosomal duplicated loci supply molecular evidence that the A-genome diploid Asiatic cotton is paleopolyploid.
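
    The reported average marker distance can be checked from the map length and locus count given in this record. A short Python sketch of that arithmetic; the variable names are illustrative, and the per-interval variant is shown only as an alternative convention, not as the authors' method:

```python
# Figures quoted in the abstract.
total_map_length_cm = 2508.71
mapped_loci = 267
linkage_groups = 13

# Map length divided by the number of mapped loci.
distance_per_locus = total_map_length_cm / mapped_loci

# Alternative convention: divide by the number of marker intervals,
# which is one fewer than the number of loci in each linkage group.
marker_intervals = mapped_loci - linkage_groups
distance_per_interval = total_map_length_cm / marker_intervals
```

    Dividing by the number of loci reproduces the 9.40 cM in the abstract, which suggests that is the convention the figure follows; the per-interval value is somewhat larger.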

  9. Genetic characterization of autochthonous grapevine cultivars from Eastern Turkey by simple sequence repeats (SSRs)

    Directory of Open Access Journals (Sweden)

    Sadiye Peral Eyduran

    2016-01-01

    Full Text Available In this research, two well-recognized standard grape cultivars, Cabernet Sauvignon and Merlot, together with eight historical autochthonous grapevine cultivars from Eastern Anatolia in Turkey, were genetically characterized by using 12 pairs of simple sequence repeat (SSR) primers in order to evaluate their genetic diversity and relatedness. All of the SSR primers used produced successful amplifications and revealed DNA polymorphisms, which were subsequently utilized to evaluate the genetic relatedness of the grapevine cultivars. Allele richness was implied by the identification of 69 alleles in 8 autochthonous cultivars, with a mean value of 5.75 alleles per locus. The average expected heterozygosity and observed heterozygosity were found to be 0.749 and 0.739, respectively. Taking into account the generated alleles, the highest number was recorded in the VVC2C3 and VVS2 loci (nine and eight alleles per locus, respectively), whereas the lowest number was recorded in VrZAG83 (three alleles per locus). Two main clusters were produced by the unweighted pair-group method with arithmetic mean dendrogram constructed on the basis of the SSR data. Only the Cabernet Sauvignon and Merlot cultivars were included in the first cluster. The second cluster comprised the rest of the autochthonous cultivars. The results obtained during the study illustrated clearly that SSR markers have proven to be an effective tool for fingerprinting grapevine cultivars and carrying out grapevine biodiversity studies. The obtained data are also meaningful references for grapevine domestication.
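
    Expected and observed heterozygosity, as quoted in this record, are per-locus summaries of diploid genotype data. A minimal Python sketch, assuming the common uncorrected estimator He = 1 - sum(p_i^2); the abstract does not state which estimator was used, and the function name and genotype list below are hypothetical:

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (expected, observed) heterozygosity for one locus.

    genotypes: list of (allele_1, allele_2) pairs, one per individual.
    Expected heterozygosity uses He = 1 - sum(p_i ** 2) with no
    small-sample correction (an assumption of this sketch).
    """
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = sum(allele_counts.values())
    expected = 1.0 - sum(
        (count / total_alleles) ** 2 for count in allele_counts.values()
    )
    observed = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return expected, observed


# Made-up allele sizes (bp) for four individuals at one locus.
example_genotypes = [(150, 152), (150, 150), (152, 154), (150, 154)]
expected_h, observed_h = heterozygosity(example_genotypes)

# Mean alleles per locus from the abstract: 69 alleles over 12 SSR loci.
mean_alleles_per_locus = 69 / 12
```

    Averaging the per-locus values over all 12 loci would give study-level figures comparable to the 0.749 and 0.739 reported above.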

  10. Simple sequence repeats and compositional bias in the bipartite Ralstonia solanacearum GMI1000 genome

    Directory of Open Access Journals (Sweden)

    Vandamme Peter

    2003-03-01

    Full Text Available Abstract Background: Ralstonia solanacearum is an important plant pathogen. The genome of R. solanacearum GMI1000 is organised into two replicons (a 3.7-Mb chromosome and a 2.1-Mb megaplasmid) and this bipartite genome structure is characteristic for most R. solanacearum strains. To determine whether the megaplasmid was acquired via recent horizontal gene transfer or is part of an ancestral single chromosome, we compared the abundance, distribution and composition of simple sequence repeats (SSRs) between both replicons and also compared the respective compositional biases. Results: Our data show that both replicons are very similar with respect to the distribution and composition of SSRs and the presence of compositional biases. Minor variations in SSR and compositional biases observed may be attributable to minor differences in gene expression and regulation of gene expression or can be attributed to the small sample numbers observed. Conclusions: The observed similarities indicate that both replicons have shared a similar evolutionary history and thus suggest that the megaplasmid was not recently acquired from other organisms by lateral gene transfer but is a part of an ancestral R. solanacearum chromosome.

  11. Simple Sequence Repeat Analysis of Selected NSIC-registered Coffee Varieties in the Philippines

    Directory of Open Access Journals (Sweden)

    Daisy May C. Santos

    2016-06-01

    Full Text Available Coffee (Coffea sp.) is an important commercial crop worldwide. Three species of coffee are used as beverage, namely Coffea arabica, C. canephora, and C. liberica. Coffea arabica L. is the most cultivated among the three coffee species due to its taste quality, rich aroma, and low caffeine content. Despite its inferior taste and aroma, C. canephora Pierre ex A. Froehner, which has the highest caffeine content, is the second most widely cultivated because of its resistance to coffee diseases. On the other hand, C. liberica W.Bull ex Hiern is characterized by its very strong taste and flavor. The Philippines used to be a leading exporter of coffee until coffee rust destroyed the farms in Batangas, home of the famous Kapeng Barako. The country has been attempting to revive the coffee industry by focusing on the production of specialty coffee with varieties registered with the National Seed Industry Council (NSIC). Correct identification and isolation of pure coffee beans are the main factors that determine coffee’s market value. Local farms usually misidentify and mix coffee beans of different varieties, leading to the depreciation of their value. This study used simple sequence repeat (SSR) markers to evaluate and distinguish Philippine NSIC-registered coffee species and varieties. The neighbor-joining tree generated using PAUP showed high bootstrap support, separating C. arabica, C. canephora, and C. liberica from each other. Among the twenty primer pairs used, seven were able to distinguish C. arabica, nine C. liberica, and one C. canephora.

  12. Revisiting the TALE repeat.

    Science.gov (United States)

    Deng, Dong; Yan, Chuangye; Wu, Jianping; Pan, Xiaojing; Yan, Nieng

    2014-04-01

    Transcription activator-like (TAL) effectors specifically bind to double stranded (ds) DNA through a central domain of tandem repeats. Each TAL effector (TALE) repeat comprises 33-35 amino acids and recognizes one specific DNA base through a highly variable residue at a fixed position in the repeat. Structural studies have revealed the molecular basis of DNA recognition by TALE repeats. Examination of the overall structure reveals that the basic building block of TALE protein, namely a helical hairpin, is one-helix shifted from the previously defined TALE motif. Here we wish to suggest a structure-based re-demarcation of the TALE repeat which starts with the residues that bind to the DNA backbone phosphate and concludes with the base-recognition hyper-variable residue. This new numbering system is consistent with the α-solenoid superfamily to which TALE belongs, and reflects the structural integrity of TAL effectors. In addition, it confers integral number of TALE repeats that matches the number of bound DNA bases. We then present fifteen crystal structures of engineered dHax3 variants in complex with target DNA molecules, which elucidate the structural basis for the recognition of bases adenine (A) and guanine (G) by reported or uncharacterized TALE codes. Finally, we analyzed the sequence-structure correlation of the amino acid residues within a TALE repeat. The structural analyses reported here may advance the mechanistic understanding of TALE proteins and facilitate the design of TALEN with improved affinity and specificity.

  13. Artificial leucine rich repeats as new scaffolds for protein design.

    Science.gov (United States)

    Baabur-Cohen, Hemda; Dayalan, Subashini; Shumacher, Inbal; Cohen-Luria, Rivka; Ashkenasy, Gonen

    2011-04-15

    The leucine rich repeat (LRR) motif that participates in many biomolecular recognition events in cells was suggested as a general scaffold for producing artificial receptors. We describe here the design and first total chemical synthesis of small LRR proteins, and their structural analysis. When evaluating the tertiary structure as a function of different number of repeating units (1-3), we were able to find that the 3-repeats sequence, containing 90 amino acids, folds into the expected structure.

  14. Sequences Characterization of Microsatellite DNA Sequences in Pacific Abalone (Haliotis discus hannai)

    Institute of Scientific and Technical Information of China (English)

    LI Qi; Kijima Akihiro

    2007-01-01

    The microsatellite-enriched library was constructed using the magnetic bead hybridization selection method, and the microsatellite DNA sequences were analyzed in Pacific abalone Haliotis discus hannai. Three hundred and fifty white colonies were screened using a PCR-based technique, and 84 clones were identified as potentially containing a microsatellite repeat motif. The 84 clones were sequenced, and 42 microsatellites and 4 minisatellites with a minimum of five repeats were found (13.1% of white colonies screened). Besides the CA motif contained in the oligoprobe, we also found 16 other types of microsatellite repeats, including a dinucleotide repeat, two tetranucleotide repeats, twelve pentanucleotide repeats and a hexanucleotide repeat. According to Weber (1990), the microsatellite sequences obtained could be categorized structurally into perfect repeats (73.3%), imperfect repeats (13.3%), and compound repeats (13.4%). Among the microsatellite repeats, relatively short arrays (< 20 repeats) were most abundant, accounting for 75.0%. The largest length of microsatellites was 48 repeats, and the average number of repeats was 13.4. The data on the composition and length distribution of microsatellites obtained in the present study can be useful for choosing the repeat motifs for microsatellite isolation in other abalone species.
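
    The "minimum of five repeats" criterion in this record maps naturally onto a backreference search over a sequenced clone. A minimal Python sketch, assuming perfect (uninterrupted) repeats with motifs of 2 to 6 bases; the pattern, the function name, and the test sequence are illustrative assumptions and not the authors' screening procedure:

```python
import re

# A 2-6 base motif followed by at least four more copies of itself,
# i.e. five or more tandem repeats. The lazy quantifier makes the
# shortest possible motif win (CA rather than CACA).
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for each perfect SSR found."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAAAA.
            continue
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Made-up sequence: a (CA)7 array and an (ATCG)5 array with short flanks.
example_sequence = "TTG" + "CA" * 7 + "GGT" + "ATCG" * 5 + "AA"
ssr_hits = find_ssrs(example_sequence)

# Yield quoted in the abstract: 46 repeat-containing clones of 350 colonies.
yield_percent = 100.0 * (42 + 4) / 350
```

    Imperfect and compound repeats, which make up roughly a quarter of the loci described above, would need a more tolerant matcher than this exact-copy pattern.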

  15. Triplet repeat sequences in human DNA can be detected by hybridization to a synthetic (5'-CGG-3')17 oligodeoxyribonucleotide

    DEFF Research Database (Denmark)

    Behn-Krappa, A; Mollenhauer, J; Doerfler, W

    1993-01-01

    The seemingly autonomous amplification of naturally occurring triplet repeat sequences in the human genome has been implicated in the causation of human genetic disease, such as the fragile X (Martin-Bell) syndrome, myotonic dystrophy (Curshmann-Steinert), spinal and bulbar muscular atrophy...

  16. Variability of United States isolates of Macrophomina phaseolina based on simple sequence repeats and cross genus transferability to related Botryosphaeraceae

    Science.gov (United States)

    Twelve simple sequence repeat (SSRs) loci were used to evaluate genetic diversity of 109 isolates of Macrophomina phaseolina collected from different geographical regions and host species throughout the United States (U.S.). Genetic diversity was assessed using Nei’s minimum genetic distance and th...

  17. Distribution and evolution of repeated sequences in genomes of Triatominae (Hemiptera-Reduviidae) inferred from genomic in situ hybridization.

    Directory of Open Access Journals (Sweden)

    Sebastian Pita

    Full Text Available The subfamily Triatominae, vectors of Chagas disease, comprises 140 species characterized by a highly homogeneous chromosome number. We analyzed the chromosomal distribution and evolution of repeated sequences in Triatominae genomes by Genomic in situ Hybridization (GISH) using Triatoma delpontei and Triatoma infestans genomic DNAs as probes. Hybridizations were performed on their own chromosomes and on nine species included in six genera from the two main tribes: Triatomini and Rhodniini. Genomic probes clearly generate two different hybridization patterns, dispersed or accumulated in specific regions or chromosomes. The three used probes generate the same hybridization pattern in each species. However, these patterns are species-specific. In closely related species, the probes strongly hybridized in the autosomal heterochromatic regions, resembling C-banding and DAPI patterns. However, in more distant species these co-localizations are not observed. The heterochromatic Y chromosome is constituted by highly repeated sequences, which are conserved among 10 species of the Triatomini tribe, suggesting that this is an ancestral character for this group. However, the Y chromosome in the Rhodniini tribe is markedly different, supporting the early evolutionary dichotomy between both tribes. In some species, sex chromosomes and autosomes shared repeated sequences, suggesting meiotic chromatin exchanges among these heterologous chromosomes. Our GISH analyses enabled us to acquire not only reliable information about the distribution of autosomal repeated sequences but also an insight into sex chromosome evolution in Triatominae. Furthermore, the differentiation obtained by GISH might be a valuable marker to establish phylogenetic relationships and to test the controversial origin of the Triatominae subfamily.

  18. Effects of GABA[subscript A] Modulators on the Repeated Acquisition of Response Sequences in Squirrel Monkeys

    Science.gov (United States)

    Campbell, Una C.; Winsauer, Peter J.; Stevenson, Michael W.; Moerschbaecher, Joseph M.

    2004-01-01

    The present study investigated the effects of positive and negative GABA[subscript A] modulators under three different baselines of repeated acquisition in squirrel monkeys in which the monkeys acquired a three-response sequence on three keys under a second-order fixed-ratio (FR) schedule of food reinforcement. In two of these baselines, the…

  19. The mammalian Rab family of small GTPases: definition of family and subfamily sequence motifs suggests a mechanism for functional specificity in the Ras superfamily.

    Science.gov (United States)

    Pereira-Leal, J B; Seabra, M C

    2000-08-25

    The Rab/Ypt/Sec4 family forms the largest branch of the Ras superfamily of GTPases, acting as essential regulators of vesicular transport pathways. We used the large amount of information in the databases to analyse the mammalian Rab family. We defined Rab-conserved sequences that we designate Rab family (RabF) motifs using the conserved PM and G motifs as "landmarks". The Rab-specific regions were used to identify new Rab proteins in the databases and suggest rules for nomenclature. Surprisingly, we find that RabF regions cluster in and around switch I and switch II regions, i.e. the regions that change conformation upon GDP or GTP binding. This finding suggests that specificity of Rab-effector interaction cannot be conferred solely through the switch regions as is usually inferred. Instead, we propose a model whereby an effector binds to RabF (switch) regions to discriminate between nucleotide-bound states and simultaneously to other regions that confer specificity to the interaction, possibly Rab subfamily (RabSF) specific regions that we also define here. We discuss structural and functional data that support this model and its general applicability to the Ras superfamily of proteins.

  20. Diversity, population structure, and evolution of local peach cultivars in China identified by simple sequence repeats.

    Science.gov (United States)

    Shen, Z J; Ma, R J; Cai, Z X; Yu, M L; Zhang, Z

    2015-01-15

    The fruit peach originated in China and has a history of domestication of more than 4000 years. Numerous local cultivars were selected during the long course of cultivation, and a great morphological diversity exists. To study the diversity and genetic background of local peach cultivars in China, a set of 158 accessions from different ecological regions, together with 27 modern varieties and 10 wild accessions, were evaluated using 49 simple sequence repeats (SSRs) covering the peach genome. Broad diversity was also observed in local cultivars at the SSR level. A total of 648 alleles were amplified with an average of 13.22 observed alleles per locus. The number of genotypes detected ranged from 9 (UDP96015) to 58 (BPPCT008) with an average of 27.00 genotypes per marker. Eight subpopulations divided by STRUCTURE basically coincided with the dendrogram of genetic relationships and could be explained by the traditional groups. The 8 subpopulations were juicy honey peach, southwestern peach I, wild peach, Buddha peach + southwestern peach II, northern peach, southern crisp peach, ornamental peach, and Prunus davidiana + P. kansuensis. Most modern varieties carried the genetic backgrounds of juicy honey peach and southwestern peach I, while others carried diverse genetic backgrounds, indicating that local cultivars were partly used in modern breeding programs. Based on the traditional evolution pathway, a modified pathway for the development of local peach cultivars in China was proposed using the genetic background of subpopulations that were identified by SSRs. Current status and prospects of utilization of Chinese local peach cultivars were also discussed according to the SSR information.

  1. Genetic Diversity and Structure of Lolium Species Surveyed on Nuclear Simple Sequence Repeat and Cytoplasmic Markers

    Directory of Open Access Journals (Sweden)

    Hongwei Cai

    2017-04-01

    Full Text Available To assess the genetic diversity and population structure of Lolium species, we used 32 nuclear simple sequence repeat (SSR) markers and 7 cytoplasmic gene markers to analyze a total of 357 individuals from 162 accessions of 9 Lolium species. This survey revealed a high level of polymorphism, with an average number of alleles per locus of 23.59 and 5.29 and an average PIC-value of 0.83 and 0.54 for nuclear SSR markers and cytoplasmic gene markers, respectively. Analysis of molecular variance (AMOVA) revealed that 16.27 and 16.53% of the total variation was due to differences among species, with the remaining 56.35 and 83.47% due to differences within species and 27.39 and 0% due to differences within individuals in the 32 nuclear SSR marker set and the 6 chloroplast gene marker set, respectively. The 32 nuclear SSR markers detected three subpopulations among 357 individuals, whereas the 6 chloroplast gene markers revealed three subpopulations among 160 accessions in the STRUCTURE analysis. In the clustering analysis, the three inbred species clustered into a single group, whereas the outbreeding species were clearly divided, especially according to nuclear SSR markers. In addition, almost all Lolium multiflorum populations were clustered into group C4, which could be further divided into three subgroups, whereas Lolium perenne populations primarily clustered into two groups (C2 and C3), with a few lines that instead grouped with L. multiflorum (C4) or Lolium rigidum (C6). Together, these results will be useful for the use of Lolium germplasm for improvement and will increase the effectiveness of ryegrass breeding.
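
    The PIC values quoted in this record summarize how informative a marker is, given its allele frequencies. A minimal Python sketch, assuming the widely used Botstein-style definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the abstract does not say which formula the authors applied, and the function name and example frequencies are hypothetical:

```python
from itertools import combinations


def polymorphism_information_content(allele_frequencies):
    """PIC for one locus from allele frequencies that sum to 1."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p ** 2 for p in allele_frequencies)
    # Correction term over every unordered pair of distinct alleles.
    pair_term = sum(
        2 * (p ** 2) * (q ** 2)
        for p, q in combinations(allele_frequencies, 2)
    )
    return 1.0 - homozygosity - pair_term


# Made-up frequencies for a three-allele locus.
example_pic = polymorphism_information_content([0.5, 0.25, 0.25])
```

    Under this definition PIC is always slightly below the expected heterozygosity for the same frequencies, and it approaches 1 as the number of evenly distributed alleles grows, which fits the higher average reported for the nuclear SSR markers than for the cytoplasmic markers.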

  2. Genetic Diversity of Landraces in Gossypium arboreum L. Race sinense Assessed with Simple Sequence Repeat Markers

    Institute of Scientific and Technical Information of China (English)

    Wang-Zhen Guo; Bao-Liang Zhou; Lu-Ming Yang; Wei Wang; Tian-Zhen Zhang

    2006-01-01

    Asiatic cotton (Gossypium arboreum L.) is an "Old World" cultivated cotton species, the sinense race of which is planted extensively in China. This species is still used in the current tetraploid cotton breeding program as an elite germplasm line, and is also used as a model for genomic research in Gossypium. In the present study, 60 cotton microsatellite markers, averaging 4.6 markers for each A-genome chromosome, were chosen to assess the genetic diversity of 109 accessions. These included 106 G. arboreum landraces, collected from 18 provinces throughout four Asiatic cotton-growing regions in China. A total of 128 alleles were detected, with an average of 2.13 alleles per locus. The largest number of alleles, as well as the maximum number of polymorphic loci, was detected in the A03 linkage group. No polymorphic alleles were detected on chromosome 10. The polymorphism information content for the 22 polymorphic microsatellite loci varied from 0.52 to 0.98, with an average of 0.89. Genetic diversity analysis revealed that the landraces in the Southern region had more genetic variability than those from the other two regions, and no significant difference was detected between landraces in the Yangtze and the Yellow River Valley regions. These findings are consistent with the history of sinense introduction, with the Southern region being the presumed center of origin for Chinese Asiatic cotton, and with subsequent northeastward extension to the Yangtze and Yellow River Valleys. Cluster analysis, based on simple sequence repeat data for 60 microsatellite loci, clearly differentiated Vietnamese and G. herbaceum landraces from the sinense landrace. No relationship between inter-variety similarity and geographical ecological region was observed. The present findings indicate that the Southern region landraces may have been directly introduced into the provinces in the middle and lower Yangtze River Valley, where Asiatic cotton was most extensively grown, and further race

  3. Simple sequence repeat marker associated with a natural leaf defoliation trait in tetraploid cotton.

    Science.gov (United States)

    Abdurakhmonov, I Y; Abdullaev, A A; Saha, S; Buriev, Z T; Arslanov, D; Kuryazov, Z; Mavlonov, G T; Rizaeva, S M; Reddy, U K; Jenkins, J N; Abdullaev, A; Abdukarimov, A

    2005-01-01

    Cotton (Gossypium hirsutum L.) leaf defoliation has a significant ecological and economical impact on cotton production. Thus the utilization of a natural leaf defoliation trait, which exists in wild diploid cotton species, in the development of tetraploid cultivated cotton will not only be cost effective, but will also facilitate production of very high-grade fiber. The primary goal of our research was to tag loci associated with natural leaf defoliation using microsatellite markers in Upland cotton. The F2 populations developed from reciprocal crosses between the two parental cotton lines--AN-Boyovut-2 (2n = 52), a late leaf defoliating type, and Listopad Beliy (2n = 52), a naturally early leaf defoliating type--demonstrated that the naturally early leaf defoliation trait has heritability values of 0.74 and 0.84 in the reciprocal F2 population. The observed phenotypic segregation difference in reciprocal crosses suggested a minor cytoplasmic effect in the phenotypic expression of the naturally early leaf defoliation trait. Results from the Kruskal-Wallis (KW) nonparametric test revealed that JESPR-13 (KW = 6.17), JESPR-153 (KW = 9.97), and JESPR-178 (KW = 13.45) Simple sequence repeat (SSR) markers are significantly associated with natural leaf defoliation in the mapping population having stable estimates at empirically obtained critical thresholds (P < .05-.0001). JESPR-178 revealed the highest estimates (P < .0001) for association with the natural leaf defoliation trait, exceeding maximum empirical threshold values. JESPR-178 was assigned to the short arm of chromosome 18, suggesting indirectly that genes associated with natural leaf defoliation might be located on this chromosome. This microsatellite marker may have the potential for use to introgress the naturally early leaf defoliation quantitative trait loci (QTL) from the donor line Listopad Beliy to commercial varieties of cotton through marker-assisted selection programs.

  4. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    . Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif...... viewer, that allows the display of the likely binding motif for all human class I proteins of the loci HLA A, B, C, and E and for MHC class I molecules from chimpanzee (Pan troglodytes), rhesus monkey (Macaca mulatta), and mouse (Mus musculus). Furthermore, it covers all HLA-DR protein sequences...

  5. Development of novel simple sequence repeat markers in bitter gourd (Momordica charantia L.) through enriched genomic libraries and their utilization in analysis of genetic diversity and cross-species transferability.

    Science.gov (United States)

    Saxena, Swati; Singh, Archana; Archak, Sunil; Behera, Tushar K; John, Joseph K; Meshram, Sudhir U; Gaikwad, Ambika B

    2015-01-01

    Microsatellite or simple sequence repeat (SSR) markers are the preferred markers for genetic analyses of crop plants. The availability of a limited number of such markers in bitter gourd (Momordica charantia L.) necessitates the development and characterization of more SSR markers. These were developed from genomic libraries enriched for three dinucleotide, five trinucleotide, and two tetranucleotide core repeat motifs. Employing the strategy of polymerase chain reaction-based screening, the number of clones to be sequenced was reduced by 81 %, and 93.7 % of the sequenced clones contained microsatellite repeats. Unique primer-pairs were designed for 160 microsatellite loci, and amplicons of expected length were obtained for 151 loci (94.4 %). Evaluation of diversity in 54 bitter gourd accessions at 51 loci indicated that 20 % of the loci were polymorphic, with the polymorphic information content values ranging from 0.13 to 0.77. Fifteen Indian varieties were clearly distinguished, indicative of the usefulness of the developed markers. Markers at 40 loci (78.4 %) were transferable to six species, viz. Momordica cymbalaria, Momordica subangulata subsp. renigera, Momordica balsamina, Momordica dioca, Momordica cochinchinesis, and Momordica sahyadrica. The microsatellite markers reported will be useful in various genetic and molecular genetic studies in bitter gourd, a cucurbit of immense nutritive, medicinal, and economic importance.

  6. A common sequence motif determines the Cajal body-specific localization of box H/ACA scaRNAs.

    Science.gov (United States)

    Richard, Patricia; Darzacq, Xavier; Bertrand, Edouard; Jády, Beáta E; Verheggen, Céline; Kiss, Tamás

    2003-08-15

    Post-transcriptional synthesis of 2'-O-methylated nucleotides and pseudouridines in Sm spliceosomal small nuclear RNAs takes place in the nucleoplasmic Cajal bodies and it is directed by guide RNAs (scaRNAs) that are structurally and functionally indistinguishable from small nucleolar RNAs (snoRNAs) directing rRNA modification in the nucleolus. The scaRNAs are synthesized in the nucleoplasm and specifically targeted to Cajal bodies. Here, mutational analysis of the human U85 box C/D-H/ACA scaRNA, followed by in situ localization, demonstrates that box H/ACA scaRNAs share a common Cajal body-specific localization signal, the CAB box. Two copies of the evolutionarily conserved CAB consensus (UGAG) are located in the terminal loops of the 5' and 3' hairpins of the box H/ACA domains of mammalian, Drosophila and plant scaRNAs. Upon alteration of the CAB boxes, mutant scaRNAs accumulate in the nucleolus. In turn, authentic snoRNAs can be targeted into Cajal bodies by addition of exogenous CAB box motifs. Our results indicate that scaRNAs represent an ancient group of small nuclear RNAs which are localized to Cajal bodies by an evolutionarily conserved mechanism.

  7. Identification of the porcine homologous of human disease causing trinucleotide repeat sequences

    DEFF Research Database (Denmark)

    Madsen, Lone Bruhn; Thomsen, Bo; Sølvsten, Christina Ane Elisabeth

    2007-01-01

    expansion in the repeat number of intragenic trinucleotide repeats (TNRs) is associated with a variety of inherited human neurodegenerative diseases. To study the composition of TNRs in a mammalian species representing an evolutionary intermediate between humans and rodents, we describe in this p...

  8. N-terminal Ile-Orn- and Trp-Orn-motif repeats enhance membrane interaction and increase the antimicrobial activity of apidaecins against Pseudomonas aeruginosa

    Directory of Open Access Journals (Sweden)

    Martina E. C. Bluhm

    2016-05-01

    Full Text Available The Gram-negative bacterium Pseudomonas aeruginosa is a life-threatening nosocomial pathogen due to its generally low susceptibility towards antibiotics. Furthermore, many strains have acquired resistance mechanisms requiring new antimicrobials with novel mechanisms to enhance treatment options. Proline-rich antimicrobial peptides, such as the apidaecin analog Api137, are highly efficient against various Enterobacteriaceae infections in mice, but less active against P. aeruginosa in vitro. Here, we extended our recent work by optimizing lead peptides Api755 (gu-OIORPVYOPRPRPPHPRL-OH; gu = N,N,N’,N’-tetramethylguanidino, O = L-ornithine) and Api760 (gu-OWORPVYOPRPRPPHPRL-OH) by incorporation of Ile-Orn- and Trp-Orn-motifs, respectively. Api795 (gu-O(IO)2RPVYOPRPRPPHPRL-OH) and Api794 (gu-O(WO)3RPVYOPRPRPPHPRL-OH) were highly active against P. aeruginosa with minimal inhibitory concentrations of 8-16 µg/mL and 8-32 µg/mL against E. coli and K. pneumoniae. Assessed using a quartz crystal microbalance, these peptides inserted into a membrane layer and the surface activity increased gradually from Api137, over Api795, to Api794. This mode of action was confirmed by transmission electron microscopy indicating some membrane damage only at the high peptide concentrations. Api794 and Api795 were highly stable against serum proteases (half-life times > 5 h) and non-hemolytic to human erythrocytes at peptide concentrations of 0.6 g/L. At this concentration, Api795 reduced the cell viability of HeLa cells only slightly, whereas the IC50 of Api794 was 0.23 ± 0.09 g/L. Confocal fluorescence microscopy revealed no colocalization of 5(6)-carboxyfluorescein-labeled Api794 or Api795 with the mitochondria, excluding interactions with the mitochondrial membrane. Interestingly, Api795 was localized in endosomes, whereas Api794 was present in endosomes and the cytosol. This was verified using flow cytometry showing a 50 % higher uptake of Api794 in HeLa cells compared

  9. Use of Limited Proteolysis and Mutagenesis To Identify Folding Domains and Sequence Motifs Critical for Wax Ester Synthase/Acyl Coenzyme A:Diacylglycerol Acyltransferase Activity

    Science.gov (United States)

    Villa, Juan A.; Cabezas, Matilde; de la Cruz, Fernando

    2014-01-01

    Triacylglycerols and wax esters are synthesized as energy storage molecules by some proteobacteria and actinobacteria under stress. The enzyme responsible for neutral lipid accumulation is the bifunctional wax ester synthase/acyl-coenzyme A (CoA):diacylglycerol acyltransferase (WS/DGAT). Structural modeling of WS/DGAT suggests that it can adopt an acyl-CoA-dependent acyltransferase fold with the N-terminal and C-terminal domains connected by a helical linker, an architecture demonstrated experimentally by limited proteolysis. Moreover, we found that both domains form an active complex when coexpressed as independent polypeptides. The structural prediction and sequence alignment of different WS/DGAT proteins indicated catalytically important motifs in the enzyme. Their role was probed by measuring the activities of a series of alanine scanning mutants. Our study underscores the structural understanding of this protein family and paves the way for their modification to improve the production of neutral lipids. PMID:24296496

  10. Limb body wall complex, amniotic band sequence, or new syndrome caused by mutation in IQ Motif containing K (IQCK)?

    Science.gov (United States)

    Kruszka, Paul; Uwineza, Annette; Mutesa, Leon; Martinez, Ariel F; Abe, Yu; Zackai, Elaine H; Ganetzky, Rebecca; Chung, Brian; Stevenson, Roger E; Adelstein, Robert S; Ma, Xuefei; Mullikin, James C; Hong, Sung-Kook; Muenke, Maximilian

    2015-01-01

    Limb body wall complex (LBWC) and amniotic band sequence (ABS) are multiple congenital anomaly conditions with craniofacial, limb, and ventral wall defects. LBWC and ABS are considered separate entities by some, and a continuum of severity of the same condition by others. The etiology of LBWC/ABS remains unknown and multiple hypotheses have been proposed. One individual with features of LBWC and his unaffected parents were whole exome sequenced and Sanger sequenced as confirmation of the mutation. Functional studies were conducted using morpholino knockdown studies followed by human mRNA rescue experiments. Using whole exome sequencing, a de novo heterozygous mutation was found in the gene IQCK: c.667C>G; p.Q223E and confirmed by Sanger sequencing in an individual with LBWC. Morpholino knockdown of iqck mRNA in the zebrafish showed ventral defects including failure of ventral fin to develop and cardiac edema. Human wild-type IQCK mRNA rescued the zebrafish phenotype, whereas human p.Q223E IQCK mRNA did not, but worsened the phenotype of the morpholino knockdown zebrafish. This study supports a genetic etiology for LBWC/ABS, or potentially a new syndrome. PMID:26436108

  11. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps.

    Science.gov (United States)

    Drory Retwitzer, Matan; Polishchuk, Maya; Churkin, Elena; Kifer, Ilona; Yakhini, Zohar; Barash, Danny

    2015-07-01

    Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular, no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various backgrounds and levels of proficiency. It also extends RNA Structator and allows a more flexible representation of variable gaps, in addition to analysis of results using energy minimization methods. RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site.
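
    The idea of a query with gaps that have a user-specified upper bound can be illustrated on the sequence level alone. The sketch below is not RNAPattMatch's algorithm or query syntax (neither is described in this record) and ignores the structure component entirely; it simply compiles literal segments and per-gap length limits into a regular expression. The function names and the example sequence are hypothetical.

```python
import re


def compile_gapped_pattern(segments, max_gaps):
    """Build a regex from literal RNA segments separated by bounded gaps.

    segments: list of literal strings over A, C, G, U.
    max_gaps: max_gaps[i] is the largest gap allowed between
              segments[i] and segments[i + 1] (the minimum is 0).
    """
    if len(max_gaps) != len(segments) - 1:
        raise ValueError("need exactly one gap bound between each pair of segments")
    parts = [re.escape(segments[0])]
    for gap, segment in zip(max_gaps, segments[1:]):
        parts.append("[ACGU]{0,%d}" % gap)  # gap of 0..gap nucleotides
        parts.append(re.escape(segment))
    return re.compile("".join(parts))


def find_matches(pattern, sequence):
    """Return (start, end, matched_text) for non-overlapping matches."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(sequence.upper())]


# Hypothetical query: GGAC, then at most 3 arbitrary nucleotides, then GUCC.
query = compile_gapped_pattern(["GGAC", "GUCC"], [3])
hits = find_matches(query, "AAGGACUUGUCCAA")
```

    A dedicated tool additionally has to check base pairing between the matched segments, which a plain regular expression cannot express.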

  12. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins.

    Science.gov (United States)

    Foulk, Michael S; Urban, John M; Casella, Cinzia; Gerbi, Susan A

    2015-05-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na(+) instead of K(+) in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq.
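
    Counting G4 motifs genome-wide requires a sequence-level definition of a motif. The record does not say which definition was used, so the sketch below assumes the widely used pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides, and scans both strands. The names `G4_PATTERN` and `find_g4_motifs` are illustrative only.

```python
import re

# Assumed motif definition (not taken from the record):
# four G-runs of length >= 3 separated by loops of 1-7 nt.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), m.end(), "+") for m in G4_PATTERN.finditer(seq)]
    # A G4 on the reverse strand appears as a C-rich stretch on the forward strand.
    for m in G4_PATTERN.finditer(reverse_complement(seq)):
        hits.append((n - m.end(), n - m.start(), "-"))
    return sorted(hits)


# Hypothetical test sequence with one motif on the forward strand.
example_hits = find_g4_motifs("TTGGGAGGGTGGGAGGGTT")
```

    Overlap statistics such as those reported above would then be computed by intersecting these intervals with the NS-seq peak intervals.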

  13. Structure and organization of the mitochondrial DNA control region with tandemly repeated sequence in the Amazon ornamental fish.

    Science.gov (United States)

    Terencio, Maria Leandra; Schneider, Carlos Henrique; Gross, Maria Claudia; Feldberg, Eliana; Porto, Jorge Ivan Rebelo

    2013-02-01

    Tandemly repeated sequences are a common feature of vertebrate mitochondrial DNA control regions. However, questions still remain about their mode of evolution and function. To better understand patterns of variation in length and to explore the existence of previously described domains, we have characterized the control region structure of the Amazonian ornamental fish Nannostomus eques and Nannostomus unifasciatus. The control region ranged from 1121 to 1142 bp in length and could be separated into three domains: the domain associated with the extended terminal associated sequences, the central conserved domain, and the conserved sequence blocks domain. In the first domain, we encountered a sequence repeated 10 times in tandem (variable number tandem repeat (VNTR)) that could adopt an "inverted repetitions" type structural conformation. The results suggest that the VNTR pattern encountered in both N. eques and N. unifasciatus is consistent with the prerequisites of the illegitimate elongation model in which the unequal pairing of the chains near the 5'-end of the control region favors the formation of repetitions.
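
    A unit repeated in tandem, such as the 10-copy VNTR described above, can be located with a back-referencing regular expression. This is a minimal sketch for exact repeats only; the record does not give the repeat unit or its length, so the 6-bp unit in the example is invented, and real VNTR copies usually diverge enough to need a tolerant method.

```python
import re


def find_tandem_repeats(seq, unit_len, min_copies):
    """Find exact tandem repeats of a fixed unit length.

    Returns (start, unit, copies) for each non-overlapping array with at
    least min_copies consecutive copies (min_copies must be >= 2).
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # ([ACGT]{k}) captures one unit; \1{n-1,} requires n-1 or more further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // unit_len)
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical sequence: an invented 6-bp unit repeated 10 times.
example_seq = "TT" + "ACGTAC" * 10 + "GG"
repeats = find_tandem_repeats(example_seq, unit_len=6, min_copies=10)
```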

  14. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage...

  16. Exploiting the peptidoglycan-binding motif, LysM, for medical and industrial applications

    NARCIS (Netherlands)

    Visweswaran, Ganesh Ram R.; Leenhouts, Kees; van Roosmalen, Maarten; Kok, Jan; Buist, Girbe

    2014-01-01

    The lysin motif (LysM) was first identified by Garvey et al. in 1986 and, in subsequent studies, has been shown to bind noncovalently to peptidoglycan and chitin by interacting with N-acetylglucosamine moieties. The LysM sequence is present singly or repeatedly in a large number of proteins of proka

  17. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats

    OpenAIRE

    Vergnaud Gilles; Grissa Ibtissem; Pourcel Christine

    2007-01-01

    Background: In Archaea and Bacteria, the repeated elements called CRISPRs for "clustered regularly interspaced short palindromic repeats" are believed to participate in the defence against viruses. Short sequences called spacers are stored in-between repeated elements. In the current model, motifs comprising spacers and repeats may target an invading DNA and lead to its degradation through a proposed mechanism similar to RNA interference. Analysis of intra-species polymorphism shows t...

  18. Multilocus variable-number tandem-repeat analysis scheme for Chlamydia felis genotyping: comparison with multilocus sequence typing.

    Science.gov (United States)

    Laroucau, Karine; Di Francesco, Antonietta; Vorimore, Fabien; Thierry, Simon; Pingret, Jean Luc; Bertin, Claire; Willems, Hermann; Bölske, Goran; Harley, Ross

    2012-06-01

    Chlamydia felis is an important ocular pathogen in cats worldwide. A multilocus variable-number tandem-repeat analysis (MLVA) system for the detection of tandem repeats across the whole genome of C. felis strain Fe/C-56 was developed. Nine selected genetic loci were tested by MLVA in 17 C. felis isolates, including the C. felis Baker vaccine strain, and 122 clinical samples from different geographic origins. Analysis of the results identified 25 distinct C. felis MLVA patterns. In parallel, a recently described multilocus sequence typing scheme for the typing of Chlamydia was applied to 13 clinical samples with 12 different C. felis MLVA patterns. Rare sequence differences were observed. Thus, the newly developed MLVA system provides a highly sensitive high-resolution test for the differentiation of C. felis isolates from different origins that is suitable for molecular epidemiological studies.

  19. Minimum length of direct repeat sequences required for efficient homologous recombination induced by zinc finger nuclease in yeast.

    Science.gov (United States)

    Ren, ChongHua; Yan, Qiang; Zhang, ZhiYing

    2014-10-01

    Zinc finger nuclease (ZFN) technology is a powerful molecular tool for targeted genome modifications and genetic engineering. However, screening for specific ZFs and validation of ZFN activity are labor intensive and time consuming. We previously designed a yeast-based ZFN screening and validation system by inserting a ZFN binding site flanked by a 164 bp direct repeat sequence into the middle of a Gal4 transcription factor, disrupting the open reading frame of the yeast Gal4 gene. Expression of the ZFN causes a double-stranded break at its binding site, which prompts the cellular DNA repair system to restore expression of a functional Gal4 transcription factor via homologous recombination. Expression of Gal4 transcription factor leads to activation of three reporter genes in an AH109 yeast two-hybrid strain. However, the 164 bp direct repeat appears to generate spontaneous homologous recombination frequently, resulting in many false positive ZFNs. To overcome this, a series of direct repeats of various lengths (10 to 150 bp in 10 bp increments, plus the 164 bp repeat) flanking the ZFN binding site were designed and constructed. The results demonstrated that the minimum length required for ZFN-induced homologous recombination was 30 bp, which almost eliminated spontaneous recombination. Using the 30 bp direct repeat sequence, ZFN could efficiently induce homologous recombination, while false positive ZFNs resulting from spontaneous homologous recombination were minimized. Thus, this study provided a simple, fast and sensitive ZFN screening and activity validation system in yeast.

  20. Comparative population genetic analysis of bocaccio rockfish Sebastes paucispinis using anonymous and gene-associated simple sequence repeat loci.

    Science.gov (United States)

    Buonaccorsi, Vincent P; Kimbrell, Carol A; Lynn, Eric A; Hyde, John R

    2012-01-01

    Comparative population genetic analyses of traditional and emergent molecular markers aid in determining appropriate use of new technologies. The bocaccio rockfish Sebastes paucispinis is a high gene-flow marine species off the west coast of North America that experienced strong population decline over the past 3 decades. We used 18 anonymous and 13 gene-associated simple sequence repeat (SSR) loci (expressed sequence tag [EST]-SSRs) to characterize range-wide population structure with temporal replicates. No F(ST)-outliers were detected using the LOSITAN program, suggesting that neither balancing nor divergent selection affected the loci surveyed. Consistent hierarchical structuring of populations by geography or year class was not detected regardless of marker class. The EST-SSRs were less variable than the anonymous SSRs, but no correlation between F(ST) and variation or marker class was observed. General linear model analysis showed that low EST-SSR variation was attributable to low mean repeat number. Comparative genomic analysis with Gasterosteus aculeatus, Takifugu rubripes, and Oryzias latipes showed consistently lower repeat number in EST-SSRs than SSR loci that were not in ESTs. Purifying selection likely imposed functional constraints on EST-SSRs resulting in low repeat numbers that affected diversity estimates but did not affect the observed pattern of population structure.

  1. Role of the striatum, cerebellum and frontal lobes in the automatization of a repeated visuomotor sequence of movements.

    Science.gov (United States)

    Doyon, J; Laforce, R; Bouchard, G; Gaudreau, D; Roy, J; Poirier, M; Bédard, P J; Bédard, F; Bouchard, J P

    1998-07-01

    Recently, Doyon et al. [20] demonstrated that lesions to both the striatum and to the cerebellum in humans produce a similar deficit in the learning of a repeated visuomotor sequence, which occurs late in the acquisition process. We now report the results of two experiments that were designed to examine whether this impairment was due to a lack of automatization of the repeating sequence of finger movements by using a dual-task paradigm and by testing for long-term retention of this skill. In Experiment 1, the performance of groups of patients with Parkinson's disease, or with damage to the cerebellum or to the frontal lobes, was compared to that of matched control subjects on the Repeated Sequence Test (primary task) and the Brooks' Matrices Test (secondary task). These two tests were administered concomitantly in both early and late learning phases of the visuomotor sequence. Overall, the groups did not differ in their ability to execute the primary task. By contrast, in accordance with the predictions, patients in Stages 2-3 of Parkinson's disease or with a cerebellar lesion failed to reveal the expected increase in performance on the secondary task seen with learning, suggesting that the latter groups of patients did not have access to the same level of residual cognitive resources to complete the matrices compared to controls. In Experiment 2, the same groups of patients and control subjects were retested again 10-18 months later. They were given four blocks of 100 trials each of the repeating sequence task, followed by a questionnaire and a self-generation task that measured their declarative knowledge of that sequence. The results revealed a long-term retention impairment only in patients who changed from Stage I to Stage II of the disease (suggesting further striatal degeneration) during the one-year interval, or who had a cerebellar lesion. By contrast, performance of the three clinical groups did not differ from controls on declarative memory tests. 
These

  2. Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae).

    Science.gov (United States)

    Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas

    2013-07-01

    The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100-500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridae of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S-5·8S-25S rRNA genes which enable the unequivocal chromosome discrimination in both jute species. The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species.

  3. New type of starch-binding domain: the direct repeat motif in the C-terminal region of Bacillus sp. no. 195 alpha-amylase contributes to starch binding and raw starch degrading.

    Science.gov (United States)

    Sumitani, J; Tottori, T; Kawaguchi, T; Arai, M

    2000-09-01

    The alpha-amylase from Bacillus sp. no. 195 (BAA) consists of two domains: one is the catalytic domain similar to alpha-amylases from animals and Streptomyces in the N-terminal region; the other is the functionally unknown domain composed of an approx. 90-residue direct repeat in the C-terminal region. The gene coding for BAA was expressed in Streptomyces lividans TK24. Three active forms of the gene products were found. The pH and thermal profiles of BAAs, and their catalytic activities for p-nitrophenyl maltopentaoside and soluble starch, showed almost the same behaviours. The largest, 69 kDa, form (BAA-alpha) was of the same molecular mass as that of the mature protein estimated from the nucleotide sequence, and had raw-starch-binding and -degrading abilities. The second largest, 60 kDa, form (BAA-beta), whose molecular mass was the same as that of the natural enzyme from Bacillus sp. no. 195, was generated by proteolytic processing between the two repeat sequences in the C-terminal region, and had lower activities for raw starch binding and degrading than those of BAA-alpha. The smallest, 50 kDa, form (BAA-gamma) contained only the N-terminal catalytic domain as a result of removal of the C-terminal repeat sequence, which led to loss of binding and degradation of insoluble starches. Thus the starch adsorption capacity and raw-starch-degrading activity of BAAs depends on the existence of the repeat sequence in the C-terminal region. BAA-alpha was specifically adsorbed on starch or dextran (alpha-1,4 or alpha-1,6 glucan), and specifically desorbed with maltose or beta-cyclodextrin. These observations indicated that the repeat sequence of the enzyme was functional in the starch-binding domain (SBD). We propose the designation of the homologues to the SBD of glucoamylase from Aspergillus niger as family I SBDs, the homologues to that of glucoamylase from Rhizopus oryzae as family II, and the homologues of this repeat sequence of BAA as family III.

  4. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    OpenAIRE

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzym...

  5. Association of Arabidopsis type-II ROPs with the plasma membrane requires a conserved C-terminal sequence motif and a proximal polybasic domain.

    Science.gov (United States)

    Lavy, Meirav; Yalovsky, Shaul

    2006-06-01

    Plant ROPs (or RACs) are soluble Ras-related small GTPases that are attached to cell membranes by virtue of the post-translational lipid modifications of prenylation and S-acylation. ROPs (RACs) are subdivided into two major subgroups called type-I and type-II. Whereas type-I ROPs terminate with a conserved CaaL box and undergo prenylation, type-II ROPs undergo S-acylation on two or three C-terminal cysteines. In the present work we determined the sequence requirement for association of Arabidopsis type-II ROPs with the plasma membrane. We identified a conserved sequence motif, designated the GC-CG box, in which the modified cysteines are flanked by glycines. The GC-CG box cysteines are separated by five to six mostly non-polar residues. Deletion of this sequence or the introduction of mutations that change its nature disrupted the association of ROPs with the membrane. Mutations that changed the GC-CG box glycines to alanines also interfered with membrane association. Deletion of a polybasic domain proximal to the GC-CG box disrupted the plasma membrane association of AtROP10. A green fluorescent protein fusion protein containing the C-terminal 25 residues of AtROP10, including its polybasic domain and GC-CG box, was primarily associated with the plasma membrane but a similar fusion protein lacking the polybasic domain was exclusively localized in the soluble fraction. These data provide evidence for the minimal sequence required for plasma membrane association of type-II ROPs in Arabidopsis and other plant species.
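
    The GC-CG box as described here (two cysteines, each flanked by a glycine on the outer side and separated by five to six residues) translates directly into a simple sequence pattern. The sketch below encodes only that description; it does not check that the intervening residues are mostly non-polar, and the example tail sequence is invented rather than taken from AtROP10 or any other real ROP.

```python
import re

# G-C, then 5 or 6 arbitrary residues, then C-G, as described in the abstract.
GC_CG_BOX = re.compile(r"GC[A-Z]{5,6}CG")


def find_gc_cg_boxes(protein_seq):
    """Return (start, matched_text) for each candidate GC-CG box."""
    return [(m.start(), m.group(0)) for m in GC_CG_BOX.finditer(protein_seq.upper())]


# Hypothetical C-terminal tail; not a real protein sequence.
candidates = find_gc_cg_boxes("SKKKGCAFLLVICGRRL")
```

    A match is only a candidate: the abstract indicates that membrane association also depends on the nearby polybasic domain.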

  6. Use of short tandem repeat sequences to study Mycobacterium leprae in leprosy patients in Malawi and India.

    Directory of Open Access Journals (Sweden)

    Saroj K Young

    2008-04-01

    Full Text Available Inadequate understanding of the transmission of Mycobacterium leprae makes it difficult to predict the impact of leprosy control interventions. Genotypic tests that allow tracking of individual bacterial strains would strengthen epidemiological studies and contribute to our understanding of the disease. Genotyping assays based on variation in the copy number of short tandem repeat sequences were applied to biopsies collected in population-based epidemiological studies of leprosy in northern Malawi, and from members of multi-case households in Hyderabad, India. In the Malawi series, considerable genotypic variability was observed between patients, and also within patients, when isolates were collected at different times or from different tissues. Less within-patient variability was observed when isolates were collected from similar tissues at the same time. Less genotypic variability was noted amongst the closely related Indian patients than in the Malawi series. Lineages of M. leprae undergo changes in their pattern of short tandem repeat sequences over time. Genetic divergence is particularly likely between bacilli inhabiting different (e.g., skin and nerve) tissues. Such variability makes short tandem repeat sequences unsuitable as a general tool for population-based strain typing of M. leprae, or for distinguishing relapse from reinfection. Careful use of these markers may provide insights into the development of disease within individuals and for tracking of short transmission chains.

  7. Investigation of the population structure of Legionella pneumophila by analysis of tandem repeat copy number and internal sequence variation.

    Science.gov (United States)

    Visca, Paolo; D'Arezzo, Silvia; Ramisse, Françoise; Gelfand, Yevgeniy; Benson, Gary; Vergnaud, Gilles; Fry, Norman K; Pourcel, Christine

    2011-09-01

    The population structure of the species Legionella pneumophila was investigated by multilocus variable number of tandem repeats (VNTR) analysis (MLVA) and sequencing of three VNTRs (Lpms01, Lpms04 and Lpms13) in selected strains. Of 150 isolates of diverse origins, 136 (86 %) were distributed into eight large MLVA clonal complexes (VACCs) and the rest were either unique or formed small clusters of up to two MLVA genotypes. In spite of the lower degree of genome-wide linkage disequilibrium of the MLVA loci compared with sequence-based typing, the clustering achieved by the two methods was highly congruent. The detailed analysis of VNTR Lpms04 alleles showed a very complex organization, with five different repeat unit lengths and a high level of internal variation. Within each MLVA-defined VACC, Lpms04 was endowed with a common recognizable pattern with some interesting exceptions. Evidence of recombination events was suggested by analysis of internal repeat variations at the two additional VNTR loci, Lpms01 and Lpms13. Sequence analysis of L. pneumophila VNTR locus Lpms04 alone provides a first-line assay for allocation of a new isolate within the L. pneumophila population structure and for epidemiological studies.

  8. Linkage of congenital isolated adrenocorticotropic hormone deficiency to the corticotropin releasing hormone locus using simple sequence repeat polymorphisms

    Energy Technology Data Exchange (ETDEWEB)

    Kyllo, J.H.; Collins, M.M.; Vetter, K.L. [Univ. of Iowa College of Medicine, Iowa City, IA (United States)] [and others]

    1996-03-29

    Genetic screening techniques using simple sequence repeat polymorphisms were applied to investigate the molecular nature of congenital isolated adrenocorticotropic hormone (ACTH) deficiency. We hypothesize that this rare cause of hypocortisolism shared by a brother and sister with two unaffected sibs and unaffected parents is inherited as an autosomal recessive single gene mutation. Genes involved in the hypothalamic-pituitary axis controlling cortisol sufficiency were investigated for a causal role in this disorder. Southern blotting showed no detectable mutations of the gene encoding pro-opiomelanocortin (POMC), the ACTH precursor. Other candidate genes subsequently considered were those encoding neuroendocrine convertase-1, and neuroendocrine convertase-2 (NEC-1, NEC-2), and corticotropin releasing hormone (CRH). Tests for linkage were performed using polymorphic di- and tetranucleotide simple sequence repeat markers flanking the reported map locations for POMC, NEC-1, NEC-2, and CRH. The chromosomal haplotypes determined by the markers flanking the loci for POMC, NEC-1, and NEC-2 were not compatible with linkage. However, 22 individual markers defining the chromosomal haplotypes flanking CRH were compatible with linkage of the disorder to the immediate area of this gene of chromosome 8. Based on these data, we hypothesize that the ACTH deficiency in this family is due to an abnormality of CRH gene structure or expression. These results illustrate the useful application of high density genetic maps constructed with simple sequence repeat markers for inclusion/exclusion studies of candidate genes in even very small nuclear families segregating for unusual phenotypes. 25 refs., 5 figs., 2 tabs.

  9. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  10. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    Science.gov (United States)

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is an RRM protein that functions in the properly timed transition to meiosis. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, in a sequence-dependent manner and in proportion to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency for the MEL2-binding consensus to appear in the 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes were spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes with a presumed function in meiosis, such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
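
    The consensus quoted in this record, UUAGUU[U/A][U/G][A/U/G]U, maps directly onto a regular expression. The following is a minimal sketch of how a 3'-UTR could be scanned for it; the function name and the T-to-U conversion for DNA input are illustrative assumptions, not part of the study's pipeline.

```python
import re

# Consensus from the abstract: UUAGUU[U/A][U/G][A/U/G]U.
# A lookahead is used so that overlapping occurrences are all reported.
MEL2_CONSENSUS = re.compile(r"(?=UUAGUU[UA][UG][AUG]U)")

def find_mel2_consensus(sequence):
    """Return 0-based start positions of the consensus in an RNA or DNA string.

    Hypothetical helper: DNA input is converted to RNA (T -> U) first.
    """
    rna = sequence.upper().replace("T", "U")
    return [match.start() for match in MEL2_CONSENSUS.finditer(rna)]

if __name__ == "__main__":
    print(find_mel2_consensus("GGUUAGUUUUAUCC"))
```

    Positions are reported relative to the start of the string passed in, so a caller would need to supply the 3'-UTR itself rather than a whole transcript.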

  11. Tracking of intercalary DNA sequences integrated into tandem repeat arrays in rye Secale vavilovii

    Directory of Open Access Journals (Sweden)

    Magdalena Achrem

    2017-06-01

    Full Text Available The structure of repetitive sequences of the JNK block present in the pericentromeric region of the 2RL chromosome was studied in Secale vavilovii. Amplification of sequences present between the JNK sequences led to the identification of seven abnormal DNA fragments. Two of these fragments showed high similarity to the glutamate 5-kinase gene and putative alcohol dehydrogenase gene of trypanosomatids from the genus Leishmania, whose presence can be explained by horizontal gene transfer (HGT). Other fragments were similar to the mitochondrial gene for ribosomal protein S4 in plants and to the glycoprotein (G) gene of the IHNV virus. Presumably, they are pseudogenes inserted into the JNK heterochromatin region. Within this region, fragments similar to the rye repetitive sequence and chromosome 3B in wheat were also found. There is no known mechanism that would explain how foreign sequences were inserted into the block region of tandem repetitive sequences of the JNK family.

  12. The SIDER2 elements, interspersed repeated sequences that populate the Leishmania genomes, constitute subfamilies showing chromosomal proximity relationship

    Directory of Open Access Journals (Sweden)

    Thomas M Carmen

    2008-06-01

    Full Text Available Abstract Background Protozoan parasites of the genus Leishmania are causative agents of a diverse spectrum of human diseases collectively known as leishmaniasis. These eukaryotic pathogens that diverged early from the main eukaryotic lineage possess a number of unusual genomic, molecular and biochemical features. The completion of the genome projects for three Leishmania species has generated invaluable information enabling a direct analysis of genome structure and organization. Results By using DNA macroarrays, made with Leishmania infantum genomic clones and hybridized with total DNA from the parasite, we identified a clone containing a repeated sequence. An analysis of the recently completed genome sequence of L. infantum, using this repeated sequence as bait, led to the identification of a new class of repeated elements that are interspersed along the different L. infantum chromosomes. These elements turned out to be homologues of SIDER2 sequences, which were recently identified in the Leishmania major genome; thus, we adopted this nomenclature for the Leishmania elements described herein. Since SIDER2 elements are very heterogeneous in sequence, their precise identification is rather laborious. We have characterized 54 LiSIDER2 elements in chromosome 32 and 27 ones in chromosome 20. The mean size for these elements is 550 bp and their sequence is G+C rich (mean value of 66.5%). On the basis of sequence similarity, these elements can be grouped in subfamilies that show a remarkable relationship of proximity, i.e. SIDER2s of a given subfamily locate close in a chromosomal region without intercalating elements. For comparative purposes, we have identified the SIDER2 elements existing in L. major and Leishmania braziliensis chromosomes 32. While SIDER2 elements are highly conserved both in number and location between L. infantum and L. major, no such conservation exists when comparing with SIDER2s in L. braziliensis chromosome 32. Conclusion

  13. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Full Text Available Abstract Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with a special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly adapted for small counts. For large counts, we advise to use the likelihood ratio test which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
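
    The exact binomial test mentioned in this record can be illustrated with a standard construction: if motif counts in the two sequences are independent Poisson variables, then conditional on their total the first count is binomial. The sketch below assumes that the expected counts e1 and e2 (from whatever background model is used) are given, and it ignores the overlap correction the authors describe, so it is an approximation of the idea rather than their exact procedure. All names are hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def compare_exceptionality(n1, n2, e1, e2):
    """One-sided p-value that a motif is more exceptional in sequence 1.

    n1, n2: observed motif counts in sequences 1 and 2.
    e1, e2: expected counts under each sequence's background model (assumed given).
    Under the null hypothesis of equal exceptionality (n1/e1 and n2/e2 share one
    underlying ratio), n1 given n1 + n2 is Binomial(n1 + n2, e1 / (e1 + e2)).
    """
    total = n1 + n2
    p_null = e1 / (e1 + e2)
    return binomial_upper_tail(n1, total, p_null)

if __name__ == "__main__":
    # Hypothetical counts: 9 vs 1 occurrences with equal expectations.
    print(compare_exceptionality(9, 1, 1.0, 1.0))
```

    Because the tail is summed term by term, this form is practical only for small counts, which is the regime the abstract recommends the exact test for.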

  14. Determination of 5 '-leader sequences from radically disparate strains of porcine reproductive and respiratory syndrome virus reveals the presence of highly conserved sequence motifs

    DEFF Research Database (Denmark)

    Oleksiewicz, M.B.; Bøtner, Anette; Nielsen, Jens;

    1999-01-01

    We determined the untranslated 5'-leader sequence for three different isolates of porcine reproductive and respiratory syndrome virus (PRRSV): pathogenic European- and American-types, as well as an American-type vaccine strain. 5'-leader from European- and American-type PRRSV differed in length...... a priori knowledge for mutational identification of virulence determinants in the 5' nontranslated part of the PRRSV genome....

  15. Cloning, Sequence Analysis and Expression Patterns during Seed Germination of a Rapeseed (Brassica napus L.) G-x-S-x-G-motif Lipase Gene

    Directory of Open Access Journals (Sweden)

    Imen GLAIED GHRAM

    2016-12-01

    Full Text Available Lipases catalyze the hydrolysis of ester bonds in triacylglycerides, generating glycerol and free fatty acids. These enzymes are encoded by extremely complex gene families, and appear to fulfil many different biological functions. Although they are present in all types of organisms, available information on plant lipases is still very limited, as compared to their bacterial and animal counterparts. A full-length clone, BnLIP, encoding a putative lipase, has been isolated by PCR amplification of Brassica napus genomic DNA, with oligonucleotide primers derived from the sequence of an Arabidopsis thaliana homologue. The clone included an open reading frame of 1581 bp encoding a polypeptide of 526 amino acids, with a calculated molecular mass of 59.5 kDa. Analysis of the deduced protein sequence, sequence alignment with homologous proteins from related plant species, and a phylogenetic analysis revealed that the BnLIP protein belongs to the ‘classical’ GxSxG-motif lipase family. RT-PCR assays indicated that the BnLIP gene is expressed specifically, but only transiently, during seed germination: the lipase mRNA was not present at detectable levels in ungerminated seeds, was detected only three days after seed imbibition, but its levels decreased rapidly afterwards. No expression was observed in roots, stems or leaves of adult plants. This expression pattern suggests that BnLIP is one of the lipases involved in the hydrolysis of triacylglycerides stored in rapeseed seeds, ultimately providing nutrients and energy to sustain seedling growth until photosynthesis is activated.

  16. Discovery of Highly Divergent Repeat Landscapes in Snake Genomes Using High-Throughput Sequencing

    Science.gov (United States)

    Castoe, Todd A.; Hall, Kathryn T.; Guibotsy Mboulas, Marcel L.; Gu, Wanjun; de Koning, A.P. Jason; Fox, Samuel E.; Poole, Alexander W.; Vemulapalli, Vijetha; Daza, Juan M.; Mockler, Todd; Smith, Eric N.; Feschotte, Cédric; Pollock, David D.

    2011-01-01

    We conducted a comprehensive assessment of genomic repeat content in two snake genomes, the venomous copperhead (Agkistrodon contortrix) and the Burmese python (Python molurus bivittatus). These two genomes are both relatively small (∼1.4 Gb) but have surprisingly extensive differences in the abundance and expansion histories of their repeat elements. In the python, the readily identifiable repeat element content is low (21%), similar to bird genomes, whereas that of the copperhead is higher (45%), similar to mammalian genomes. The copperhead's greater repeat content arises from the recent expansion of many different microsatellites and transposable element (TE) families, and the copperhead had 23-fold greater levels of TE-related transcripts than the python. This suggests the possibility that greater TE activity in the copperhead is ongoing. Expansion of CR1 LINEs in the copperhead genome has resulted in TE-mediated microsatellite expansion (“microsatellite seeding”) at a scale several orders of magnitude greater than previously observed in vertebrates. Snakes also appear to be prone to horizontal transfer of TEs, particularly in the copperhead lineage. The reason that the copperhead has such a small genome in the face of so much recent expansion of repeat elements remains an open question, although selective pressure related to extreme metabolic performance is an obvious candidate. TE activity can affect gene regulation as well as rates of recombination and gene duplication, and it is therefore possible that TE activity played a role in the evolution of major adaptations in snakes; some evidence suggests this may include the evolution of venom repertoires. PMID:21572095

  17. Mycobacterial PE_PGRS Proteins Contain Calcium-Binding Motifs with Parallel β-roll Folds

    Institute of Scientific and Technical Information of China (English)

    Nandita Bachhawat; Balvinder Singh

    2007-01-01

    The PE_PGRS family of proteins unique to mycobacteria is demonstrated to contain multiple calcium-binding and glycine-rich sequence motifs GGXGXD/NXUX. This sequence repeat constitutes a calcium-binding parallel β-roll or parallel β-helix structure and is found in RTX toxins secreted by many Gram-negative bacteria. It is predicted that the highly homologous PE_PGRS proteins containing multiple copies of the nona-peptide motif could fold into similar calcium-binding structures. The implication of the predicted calcium-binding property of PE_PGRS proteins in the light of macrophage-pathogen interaction and pathogenesis is presented.

  18. The HIV-1 repeated sequence R as a robust hot-spot for copy-choice recombination

    Science.gov (United States)

    Moumen, Abdeladim; Polomack, Lucette; Roques, Bernard; Buc, Henri; Negroni, Matteo

    2001-01-01

    Template switching during reverse transcription is crucial for retroviral replication. While strand transfer on the terminal repeated sequence R is essential to achieve reverse transcription, template switching from internal regions of the genome (copy choice) leads to genetic recombination. We have developed an experimental system to study copy-choice recombination in vitro along the HIV-1 genome. We identify here several genomic regions, including the R sequence, where copy choice occurred at high rates. The frequency of copy choice occurring in a given region of template was strongly influenced by the surrounding sequences, an observation that suggests a pivotal role of the folding of template RNA in the process. The sequence R, instead, constituted an exception to this rule since it was a strong hot-spot for copy choice in the different sequence contexts tested. We suggest therefore that the structure of this region has been optimised during viral evolution to ensure efficient template switching independently from the sequences that might surround it. PMID:11557813

  19. Development of simple sequence repeats (SSR) markers of ramie and comparison of SSR and inter-SSR marker systems

    Institute of Scientific and Technical Information of China (English)

    ZHOU Jianlin; JIE Yucheng; JIANG Yanbo; ZHONG Yingli; LIU Yunhai; ZHANG Jian

    2005-01-01

    Ramie (Boehmeria nivea L.) is an important bast fiber crop. To study the genetic background of this species, we isolated and characterized microsatellite markers of ramie. A genomic library containing inserts of random amplified polymorphic DNA (RAPD) fragments was constructed, and screened by PCR amplification using anchored simple sequence repeats as primers. A total of 26 clones were identified as positives, and 13 microsatellite loci were found after sequencing. The polymorphism of these 13 microsatellite loci was examined and the utility of simple sequence repeats (SSR) and inter-SSR (ISSR) marker systems for genetic characterization compared using 19 selected ramie cultivars. Both approaches successfully discriminated the 19 cultivars, but differed in the amount of polymorphism detected. The level of polymorphism detected by SSR was 95.0%, higher than that by ISSR (72.3%), but the average polymorphism information content (PIC) of ISSR (0.651) was higher than that of SSR (0.441). The higher PIC value of ISSR suggests that ISSR is more efficient for fingerprinting ramie cultivars than SSR markers. However, because the SSR loci are codominant, they are more suitable for determining the homozygosity levels of ramie, constructing linkage maps, quantitative trait loci study of complex traits and marker-assisted selection.
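
    The abstract compares markers by polymorphism information content (PIC) without stating which formula was used. As an illustration only, the sketch below computes the widely used Botstein-style PIC for a codominant locus from its allele frequencies; whether the authors used this definition, particularly for the dominant ISSR markers, is an assumption.

```python
from itertools import combinations

def polymorphism_information_content(allele_freqs):
    """Botstein-style PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.

    allele_freqs: allele frequencies at one locus; they must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(2 * p * p * q * q for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction

if __name__ == "__main__":
    # Hypothetical locus with two equally frequent alleles.
    print(polymorphism_information_content([0.5, 0.5]))
```

    A monomorphic locus gives a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed.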

  20. Germline mutations of STR-alleles include multi-step mutations as defined by sequencing of repeat and flanking regions.

    Science.gov (United States)

    Dauber, Eva-Maria; Kratzer, Adelgunde; Neuhuber, Franz; Parson, Walther; Klintschar, Michael; Bär, Walter; Mayr, Wolfgang R

    2012-05-01

    Well defined estimates of mutation rates are a prerequisite for the use of short tandem repeat (STR-) loci in relationship testing. We investigated 65 isolated genetic inconsistencies, which were observed within 50,796 allelic transfers at 23 STR-loci (ACTBP2 (SE33), CD4, CSF1PO, F13A1, F13B, FES, FGA, vWA, TH01, TPOX, D2S1338, D3S1358, D5S818, D7S820, D8S1132, D8S1179, D12S391, D13S317, D16S539, D17S976, D18S51, D19S433, D21S11) in Caucasoid families residing in Austria and Switzerland. Sequencing data of repeat and flanking regions and the median of all theoretically possible mutational steps showed valuable information to characterise the mutational events with regard to parental origin, change of repeat number (mutational step size) and direction of mutation (losses and gains of repeats). Apart from predominant single-step mutations including one case with a double genetic inconsistency, two double-step and two apparent four-step mutations could be identified. More losses than gains of repeats and more mutations originating from the paternal than the maternal lineage were observed (31 losses, 22 gains, 12 losses or gains and 47 paternal, 11 maternal mutations and 7 unclear of parental origin). The mutation rate in the paternal germline was 3.3 times higher than in the maternal germline. The results of our study show that, apart from the vast majority of single-step mutations, rare multi-step mutations can be observed. Therefore, the interpretation of mutational events should not rigidly be restricted to the shortest possible mutational step, because rare but true multi-step mutations can easily be overlooked if haplotype analysis is not possible.

  1. DNA polymorphism among Fusarium oxysporum f.sp. elaeidis populations from oil palm, using a repeated and dispersed sequence "Palm".

    Science.gov (United States)

    Mouyna, I; Renard, J L; Brygoo, Y

    1996-07-31

    A worldwide collection of 76 F. oxysporum f.sp. elaeidis isolates (Foe) and of 21 F. oxysporum isolates from the soil of several palm groves was analysed by RFLP. As a probe, we used a random DNA fragment (probe 46) from a genomic library of a Foe isolate. This probe contains two different types of sequence, one being repeated and dispersed in the genome ("Palm"), the other being a single-copy sequence. All F. oxysporum isolates from the palm-grove soils were non-pathogenic to oil palm. They all had a simple restriction pattern with one band homologous to the single-copy sequence of probe 46. All Foe isolates were pathogenic to oil palm and they all had complex patterns due to hybridization with "Palm". This repetitive sequence reveals that Foe isolates are distinct from the other F. oxysporum isolates from palm-grove soils. The sequence can reliably discriminate pathogenic from non-pathogenic oil palm isolates. Based on DNA fingerprint similarities, Foe populations were divided into ten groups consisting of isolates with the same geographic origin. Isolates from Brazil and Ecuador were an exception to that rule as they had the same restriction pattern as a few isolates from the Ivory Coast, suggesting they may have originated from Africa.

  2. Rapid functional and sequence differentiation of a tandemly repeated species-specific multigene family in Drosophila

    DEFF Research Database (Denmark)

    Clifton, Bryan D.; Sanz, Pablo Librado; Yeh, Shu-Dan

    2017-01-01

    Gene clusters of recently duplicated genes are hotbeds for evolutionary change. However, our understanding of how mutational mechanisms and evolutionary forces shape the structural and functional evolution of these clusters is hindered by the high sequence identity among the copies, which typical...

  3. Expressed Sequence Tags Analysis and Design of Simple Sequence Repeats Markers from a Full-Length cDNA Library in Perilla frutescens (L.

    Directory of Open Access Journals (Sweden)

    Eun Soo Seong

    2015-01-01

    Full Text Available Perilla frutescens is valuable as a medicinal plant as well as a natural medicine and functional food. However, comparative genomics analyses of P. frutescens are limited due to a lack of gene annotations and characterization. A full-length cDNA library from P. frutescens leaves was constructed to identify functional gene clusters and probable EST-SSR markers via analysis of 1,056 expressed sequence tags. Unigene assembly was performed using basic local alignment search tool (BLAST) homology searches and annotated Gene Ontology (GO). A total of 18 simple sequence repeats (SSRs) were designed as primer pairs. This study is the first to report comparative genomics and EST-SSR markers from P. frutescens, which will help gene discovery and provide an important source for functional genomics and molecular genetic research in this interesting medicinal plant.

  4. Race: A scalable and elastic parallel system for discovering repeats in very long sequences

    KAUST Repository

    Mansour, Essam

    2013-08-26

    A wide range of applications, including bioinformatics, time series, and log analysis, depend on the identification of repetitions in very long sequences. The problem of finding maximal pairs subsumes most important types of repetition-finding tasks. Existing solutions require both the input sequence and its index (typically an order of magnitude larger than the input) to fit in memory. Moreover, they are serial algorithms with long execution time. Therefore, they are limited to small datasets, despite the fact that modern applications demand orders of magnitude longer sequences. In this paper we present RACE, a parallel system for finding maximal pairs in very long sequences. RACE supports parallel execution on stand-alone multicore systems, in addition to scaling to thousands of nodes on clusters or supercomputers. RACE does not require the input or the index to fit in memory; therefore, it supports very long sequences with limited memory. Moreover, it uses a novel array representation that allows for cache-efficient implementation. RACE is particularly suitable for the cloud (e.g., Amazon EC2) because, based on availability, it can scale elastically to more or fewer machines during its execution. Since scaling out introduces overheads, mainly due to load imbalance, we propose a cost model to estimate the expected speedup, based on statistics gathered through sampling. The model allows the user to select the appropriate combination of cloud resources based on the provider's prices and the required deadline. We conducted extensive experimental evaluation with large real datasets and large computing infrastructures. In contrast to existing methods, RACE can handle the entire human genome on a typical desktop computer with 16GB RAM. Moreover, for a problem that takes 10 hours of serial execution, RACE finishes in 28 seconds using 2,048 nodes on an IBM BlueGene/P supercomputer.
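
    For readers unfamiliar with the term, a maximal pair is a pair of positions at which the same substring occurs and cannot be extended to the left or right without breaking the match. The brute-force sketch below only illustrates that definition on a short string; it is quadratic in the sequence length and has nothing in common with RACE's parallel, index-based design. The function name and the min_len parameter are hypothetical.

```python
def maximal_pairs(sequence, min_len=2):
    """Return (i, j, length) triples describing maximal pairs in a string.

    A triple means sequence[i:i+length] == sequence[j:j+length] with i < j,
    and the match cannot be extended by one character on either side.
    Occurrences are allowed to overlap. Brute force, for illustration only.
    """
    n = len(sequence)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Left-maximality: skip starts that could be extended to the left.
            if i > 0 and sequence[i - 1] == sequence[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of the string,
            # which makes the match right-maximal by construction.
            length = 0
            while j + length < n and sequence[i + length] == sequence[j + length]:
                length += 1
            if length >= min_len:
                pairs.append((i, j, length))
    return pairs

if __name__ == "__main__":
    print(maximal_pairs("xabcyabcz"))
```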

  5. Assessment of Genetic Diversities of Selected Laminaria (Laminariales, Phaeophyta) Gametophytes by Inter-Simple Sequence Repeat Analysis

    Institute of Scientific and Technical Information of China (English)

    Xiu-Liang WANG; Chen-Lin LIU; Xiao-Jie LI; Yi-Zhou CONG; De-Lin DUAN

    2005-01-01

    Inter-simple sequence repeat (ISSR) analysis was used to assess genetic diversity among 10 pairs of male and female Laminaria gametophytes. A total of 58 amplification loci was obtained from 10 selected ISSR primers, of which 34 revealed polymorphism among the gametophytes. Genetic distances were calculated with the Dice coefficient ranging from 0.006 to 0.223. A dendrogram based on the unweighted pair-group method arithmetic (UPGMA) average showed that most male and female gametophytes of the same species were clustered together and that 10 pairs of gametophytes were divided into four groups. This was generally consistent with the taxonomic categories. The main group consisted of six pairs of gametophytes, which were selected from Laminaria japonica Aresch. by intensive inbreeding through artificial hybridization. One specific marker was cloned, but was not converted successfully into a sequence characterized amplified region (SCAR) marker. Our results demonstrate the feasibility of applying ISSR markers to evaluate Laminaria germplasm diversities.
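
    Dice-based genetic distances such as those reported here are typically derived from presence/absence band profiles. The sketch below shows one common form, distance = 1 - 2a / (n_a + n_b), where a is the number of bands shared by two samples and n_a, n_b are their band totals. The band profiles are invented for illustration, and the exact scoring used in the study is not given in the abstract.

```python
def dice_distance(profile_a, profile_b):
    """Return 1 - Dice similarity for two presence/absence band profiles.

    Each profile is a sequence of 0/1 values, one per scored ISSR locus.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same loci")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    total_bands = sum(1 for a in profile_a if a) + sum(1 for b in profile_b if b)
    if total_bands == 0:
        return 0.0  # no bands in either sample: treat as identical
    return 1.0 - 2.0 * shared / total_bands

if __name__ == "__main__":
    # Hypothetical profiles for a male and a female gametophyte.
    male = [1, 1, 0, 1]
    female = [1, 0, 0, 1]
    print(dice_distance(male, female))
```

    A matrix of such pairwise distances is the usual input to UPGMA clustering of the kind the abstract describes.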

  6. Consistent levels of A-to-I RNA editing across individuals in coding sequences and non-conserved Alu repeats

    Directory of Open Access Journals (Sweden)

    Osenberg Sivan

    2010-10-01

    Full Text Available Abstract Background Adenosine to inosine (A-to-I) RNA-editing is an essential post-transcriptional mechanism that occurs in numerous sites in the human transcriptome, mainly within Alu repeats. It has been shown to have consistent levels of editing across individuals in a few targets in the human brain and altered in several human pathologies. However, the variability across human individuals of editing levels in other tissues has not been studied so far. Results Here, we analyzed 32 skin samples, looking at A-to-I editing level in three genes within coding sequences and in the Alu repeats of six different genes. We observed highly consistent editing levels across different individuals as well as across tissues, not only in coding targets but, surprisingly, also in the evolutionarily non-conserved Alu repeats. Conclusions Our findings suggest that A-to-I RNA-editing of Alu elements is a tightly regulated process and, as such, might have been recruited in the course of primate evolution for post-transcriptional regulatory mechanisms.

  7. The NS1 polypeptide of the murine parvovirus minute virus of mice binds to DNA sequences containing the motif [ACCA]2-3.

    Science.gov (United States)

    Cotmore, S F; Christensen, J; Nüesch, J P; Tattersall, P

    1995-03-01

    A DNA fragment containing the minute virus of mice 3' replication origin was specifically coprecipitated in immune complexes containing the virally coded NS1, but not the NS2, polypeptide. Antibodies directed against the amino- or carboxy-terminal regions of NS1 precipitated the NS1-origin complexes, but antibodies directed against NS1 amino acids 284 to 459 blocked complex formation. Using affinity-purified histidine-tagged NS1 preparations, we have shown that the specific protein-DNA interaction is of moderate affinity, being stable in 0.1 M salt but rapidly lost at higher salt concentrations. In contrast, generalized (or nonspecific) DNA binding by NS1 could be demonstrated only in low salt. Addition of ATP or gamma S-ATP enhanced specific DNA binding by wild-type NS1 severalfold, but binding was lost under conditions which favored ATP hydrolysis. NS1 molecules with mutations in a critical lysine residue (amino acid 405) in the consensus ATP-binding site bound to the origin, but this binding could not be enhanced by ATP addition. DNase I protection assays carried out with wild-type NS1 in the presence of gamma S-ATP gave footprints which extended over 43 nucleotides on both DNA strands, from the middle of the origin bubble sequence to a position some 14 bp beyond the nick site. The DNA-binding site for NS1 was mapped to a 22-bp fragment from the middle of the 3' replication origin which contains the sequence ACCAACCA. This conforms to a reiterated motif (ACCA)2-3, which occurs, in more or less degenerate form, at many sites throughout the minute virus of mice genome (J. W. Bodner, Virus Genes 2:167-182, 1989). Insertion of a single copy of the sequence (ACCA)3 was shown to be sufficient to confer NS1 binding on an otherwise unrecognized plasmid fragment. The functions of NS1 in the viral life cycle are reevaluated in the light of this result.

  8. Unsupervised statistical discovery of spaced motifs in prokaryotic genomes.

    Science.gov (United States)

    Tong, Hao; Schliekelman, Paul; Mrázek, Jan

    2017-01-05

    DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by unusually high number of copies of the motif in the analyzed sequences. We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than due to selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed. We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. 
We conclude
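
    The distance-preference idea in this abstract can be illustrated with a minimal sketch. It only tallies spacer lengths for ordered pairs of pentamers and reports the most frequent one; the statistical significance test and the spacer-similarity filter described by the authors are not reproduced. The function names and the `max_spacer` parameter are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def spacer_length_counts(seq, k=5, max_spacer=20):
    """Tally spacer lengths for every ordered pair of k-mers in seq.

    Returns {(left_kmer, right_kmer): Counter({spacer_length: count})}.
    This is an illustrative tally only, not the published method.
    """
    seq = seq.upper()
    counts = defaultdict(Counter)
    n = len(seq)
    for i in range(n - k + 1):
        left = seq[i:i + k]
        # The right-hand k-mer starts after the left k-mer plus the spacer.
        for spacer in range(max_spacer + 1):
            j = i + k + spacer
            if j + k > n:
                break
            counts[(left, seq[j:j + k])][spacer] += 1
    return counts


def preferred_spacer(counter):
    """Return (spacer_length, fraction of pair occurrences at that length)."""
    length, hits = counter.most_common(1)[0]
    return length, hits / sum(counter.values())
```

    A pair whose occurrences concentrate at one spacer length, while the spacer sequences themselves differ, is the kind of candidate the abstract describes; a real analysis would still need a background model to judge significance.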

  9. Comparison of different sequencing and assembly strategies for a repeat-rich fungal genome, Ophiocordyceps sinensis.

    Science.gov (United States)

    Li, Yi; Hsiang, Tom; Yang, Rui-Heng; Hu, Xiao-Di; Wang, Ke; Wang, Wen-Jing; Wang, Xiao-Liang; Jiao, Lei; Yao, Yi-Jian

    2016-09-01

    Ophiocordyceps sinensis is one of the most expensive medicinal fungi world-wide, and has been used as a traditional Chinese medicine for centuries. In a recent report, the genome of this fungus was found to be expanded by extensive repetitive elements after assembly of Roche 454 (223Mb) and Illumina HiSeq (10.6Gb) sequencing data, producing a genome of 87.7Mb with an N50 scaffold length of 12kb and 6972 predicted genes. To test whether the assembly could be improved by deeper sequencing and to assess the amount of data needed for optimal assembly, genomic sequencing was run several times on genomic DNA extractions of a single ascospore isolate (strain 1229) on an Illumina HiSeq platform (25Gb total data). Assemblies were produced using different data types (raw vs. trimmed) and data amounts, and using three freely available assembly programs (ABySS, SOAP and Velvet). In nearly all cases, trimming the data for low quality base calls did not provide assemblies with higher N50 values compared to the non-trimmed data, and increasing the amount of input data (i.e. sequence reads) did not always lead to higher N50 values. Depending on the assembly program and data type, the maximal N50 was reached with between 50% and 90% of the total read data, equivalent to 100× to 200× coverage. The draft genome assembly was improved over the previously published version, resulting in a 114Mb assembly, a scaffold N50 of 70kb and 9610 predicted genes. Among the predicted genes, 9213 were validated by RNA-Seq analysis in this study, of which 8896 were found to be singletons. Evidence from genome and transcriptome analyses indicated that species assemblies could be improved with defined input material (e.g. haploid mono-ascospore isolate) without the requirement of multiple sequencing technologies, multiple library sizes or data trimming for low quality base calls, and with genome coverages between 100× and 200×.
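
    For readers unfamiliar with the two metrics used in this abstract, the sketch below shows the usual definitions of N50 and sequencing coverage. It is not taken from the paper; the function names are hypothetical, and the coverage helper simply divides sequenced bases by assembly size.

```python
def n50(lengths):
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def coverage(sequenced_bases, genome_size):
    """Return average depth of coverage (sequenced bases per genome base)."""
    return sequenced_bases / genome_size
```

    Under these definitions, half of the 25Gb data set over a 114Mb assembly gives `coverage(12.5e9, 114e6)`, roughly 110×, which is in line with the 100× to 200× range quoted in the abstract.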

  10. Target motifs affecting natural immunity by a constitutive CRISPR-Cas system in Escherichia coli.

    Directory of Open Access Journals (Sweden)

    Cristóbal Almendros

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (cas) genes make up the CRISPR-Cas systems of various bacteria and archaea and bring about degradation of invading nucleic acids containing sequences (protospacers) that are complementary to repeat-intervening spacers. It has been demonstrated that the base sequence identity of a protospacer with the cognate spacer and the presence of a protospacer adjacent motif (PAM) influence CRISPR-mediated interference efficiency. Using an original transformation assay with plasmids targeted by a resident spacer, we show here that natural CRISPR-mediated immunity against invading DNA occurs in wild-type Escherichia coli. Unexpectedly, the strongest activity is observed with protospacer-adjoining nucleotides (interference motifs) that differ from the PAM both in sequence and location. Hence, our results document for the first time native CRISPR activity in E. coli and demonstrate that positions next to the PAM in invading DNA influence their recognition and degradation by these prokaryotic immune systems.

  11. Transcriptome analysis reveals ginsenosides biosynthetic genes, microRNAs and simple sequence repeats in Panax ginseng C. A. Meyer

    Science.gov (United States)

    2013-01-01

    Background: Panax ginseng C. A. Meyer is one of the most widely used medicinal plants. Complete genome information for this species remains unavailable due to its large genome size. At present, analysis of expressed sequence tags is still the most powerful tool for large-scale gene discovery. The global expressed sequence tags from P. ginseng tissues, especially those isolated from stems, leaves and flowers, are still limited, hindering in-depth study of P. ginseng. Results: Two 454 pyrosequencing runs generated a total of 2,423,076 reads from P. ginseng roots, stems, leaves and flowers. The high-quality reads from each of the tissues were independently assembled into separate and shared contigs. In the separately assembled database, 45,849, 6,172, 4,041 and 3,273 unigenes were only found in the roots, stems, leaves and flowers database, respectively. In the jointly assembled database, 178,145 unigenes were observed, including 86,609 contigs and 91,536 singletons. Among the 178,145 unigenes, 105,522 were identified for the first time, of which 65.6% were identified in the stem, leaf or flower cDNA libraries of P. ginseng. After annotation, we discovered 223 unigenes involved in ginsenoside backbone biosynthesis. Additionally, a total of 326 potential cytochrome P450 and 129 potential UDP-glycosyltransferase sequences were predicted based on the annotation results, some of which may encode enzymes responsible for ginsenoside backbone modification. A BLAST search of the obtained high-quality reads identified 14 potential microRNAs in P. ginseng, which were estimated to target 100 protein-coding genes, including transcription factors, transporters and DNA binding proteins, among others. In addition, a total of 13,044 simple sequence repeats were identified from the 178,145 unigenes. Conclusions: This study provides global expressed sequence tags for P. ginseng, which will contribute significantly to further genome-wide research and analyses in this species. The novel

  12. Several tetratricopeptide repeat (TPR) motifs of FANCG are required for assembly of the BRCA2/D1-D2-G-X3 complex, FANCD2 monoubiquitylation and phleomycin resistance.

    Science.gov (United States)

    Wilson, James B; Blom, Eric; Cunningham, Ryan; Xiao, Yuxuan; Kupfer, Gary M; Jones, Nigel J

    2010-07-01

    The Fanconi anaemia (FA) FANCG protein is an integral component of the FA nuclear core complex that is required for monoubiquitylation of FANCD2. FANCG is also part of another protein complex termed D1-D2-G-X3 that contains FANCD2 and the homologous recombination repair proteins BRCA2 (FANCD1) and XRCC3. Formation of the D1-D2-G-X3 complex is mediated by serine-7 phosphorylation of FANCG and occurs independently of the FA core complex and FANCD2 monoubiquitylation. FANCG contains seven tetratricopeptide repeat (TPR) motifs that mediate protein-protein interactions and here we show that mutation of several of the TPR motifs at a conserved consensus residue ablates the in vivo binding activity of FANCG. Expression of mutated TPR1, TPR2, TPR5 and TPR6 in Chinese hamster fancg mutant NM3 fails to functionally complement its hypersensitivities to mitomycin C (MMC) and phleomycin and fails to restore FANCD2 monoubiquitylation. Using co-immunoprecipitation analysis, we demonstrate that these TPR-mutated FANCG proteins fail to interact with BRCA2, XRCC3, FANCA or FANCF. The interactions of other proteins in the D1-D2-G-X3 complex are also absent, including the interaction of BRCA2 with both the monoubiquitylated (FANCD2-L) and non-ubiquitylated (FANCD2-S) isoforms of FANCD2. Interestingly, a mutation of TPR7 (R563E), that complements the MMC and phleomycin hypersensitivity of human FA-G EUFA316 cells, fails to complement NM3, despite the mutated FANCG protein co-precipitating with FANCA, BRCA2 and XRCC3. Whilst interaction of TPR7-mutated FANCG with FANCF does appear to be reduced in NM3, FANCD2 is monoubiquitylated suggesting that sub-optimal interactions of FANCG in the core complex and the D1-D2-G-X3 complex are responsible for the observed MMC- and phleomycin-hypersensitivity, rather than a defect in FANCD2 monoubiquitylation. 
Our data demonstrate that FANCG functions as a mediator of protein-protein interactions and is vital for the assembly of multi-protein complexes

  13. Several tetratricopeptide repeat (TPR) motifs of FANCG are required for assembly of the BRCA2/D1-D2-G-X3 complex, FANCD2 monoubiquitylation and phleomycin resistance

    Energy Technology Data Exchange (ETDEWEB)

    Wilson, James B. [Molecular Oncology and Stem Cell Research Group, School of Biological Sciences, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB (United Kingdom); Blom, Eric [Department of Clinical Genetics and Human Genetics, VU University Medical Center, Van der Boechorststraat 7, NL-1081 BT Amsterdam (Netherlands); Cunningham, Ryan; Xiao, Yuxuan [Molecular Oncology and Stem Cell Research Group, School of Biological Sciences, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB (United Kingdom); Kupfer, Gary M. [Departments of Pediatrics and Pathology, Yale University School of Medicine, Section of Hematology/Oncology, 333 Cedar Street, New Haven, CT 0652 (United States); Jones, Nigel J., E-mail: njjones@liv.ac.uk [Molecular Oncology and Stem Cell Research Group, School of Biological Sciences, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB (United Kingdom)

    2010-07-07


  14. A highly conserved repeated chromosomal sequence in the radioresistant bacterium Deinococcus radiodurans SARK.

    Science.gov (United States)

    Lennon, E; Gutman, P D; Yao, H L; Minton, K W

    1991-03-01

    A DNA fragment containing a portion of a DNA damage-inducible gene from Deinococcus radiodurans SARK hybridized to numerous fragments of SARK genomic DNA because of a highly conserved repetitive chromosomal element. The element is of variable length, ranging from 150 to 192 bp, depending on the absence or presence of one or two 21-bp sequences located internally. A putative translational start site of the damage-inducible gene is within the reiterated element. The element contains dyad symmetries that suggest modes of transcriptional and/or translational control.

  15. A Nonpolynomial Optimal Algorithm for Sequencing Inspectors in a Repeat Inspection System with Rework

    Directory of Open Access Journals (Sweden)

    Moon Hee Yang

    2015-01-01

    Assuming that two types of inspection errors are nonidentical and that only the items rejected by an inspector are reworked and sent to the next inspection cycle, we formulate a combinatorial optimization problem for simultaneously determining both the minimum frequency of inspection-rework cycles and the optimal sequence of inspectors selected from a set of available inspectors, in order to meet the constraints of the outgoing quality level. Based on the inherent properties of our mathematical model, we provide a nonpolynomial optimal algorithm with a time complexity of O(2^m).

  16. MSDmotif: exploring protein sites and motifs

    Directory of Open Access Journals (Sweden)

    Henrick Kim

    2008-07-01

    Background: Protein structures have conserved features, or motifs, which have a significant influence on protein function. These motifs can be found in sequence as well as in 3D space. Understanding these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options for integrating protein sequence with its 3D structure. Results: We describe here a web application for querying the PDB for ligands, binding sites, and small 3D structural and sequence motifs, together with the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif association searches are incorporated. The interface provides functionality for visualization, search criteria creation, and sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino acid within a motif, correlation of amino-acid side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

  17. Efficient development of highly polymorphic microsatellite markers based on polymorphic repeats in transcriptome sequences of multiple individuals.

    Science.gov (United States)

    Vukosavljev, M; Esselink, G D; van 't Westende, W P C; Cox, P; Visser, R G F; Arens, P; Smulders, M J M

    2015-01-01

    The first hurdle in developing microsatellite markers, cloning, has been overcome by next-generation sequencing. The second hurdle is testing to differentiate polymorphic from nonpolymorphic loci. The third hurdle, somewhat hidden, is that only polymorphic markers with a large effective number of alleles are sufficiently informative to be deployed in multiple studies. Both of these steps are laborious and still performed manually. We have developed a strategy in which we first screen reads from multiple genotypes for repeats that show the most length variants, and only these are subsequently developed into markers. We validated our strategy in tetraploid garden rose using Illumina paired-end transcriptome sequences of 11 roses. Of 48 markers tested, two failed to amplify, but all others were polymorphic. Ten markers amplified more than one locus, indicating duplicated genes or gene families. Completely avoiding duplicated loci will be difficult because the ranges of predicted allele numbers for highly polymorphic single-locus and multilocus markers largely overlapped. Of the remainder, half were replicate markers (i.e. multiple primer pairs for one locus), indicating the difficulty of correctly filtering short reads containing repeat sequences. We subsequently refined the approach to eliminate multiple primer sets for the same loci. The remaining 18 markers were all highly polymorphic, amplifying on average 11.7 alleles per marker (range = 6-20) in 11 tetraploid roses, exceeding the 8.2 alleles per marker of the 24 most polymorphic markers genotyped previously. This strategy therefore represents a major step forward in the development of highly polymorphic microsatellite markers.
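
    The screening step described above (looking for repeats that show the most length variants across reads) can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: it finds di- and trinucleotide repeats with a regular expression and groups them by a short left flank as a stand-in for proper locus assignment. The function names, the `flank` parameter and the minimum repeat count are assumptions.

```python
import re
from collections import defaultdict


def find_ssrs(seq, min_repeats=4):
    """Return (start, unit, repeat_count) for di-/trinucleotide repeats."""
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


def length_variants(reads, flank=5):
    """Collect the distinct repeat counts seen per (left flank, unit).

    Grouping by a short left flank is a crude, hypothetical proxy for
    assigning reads to the same locus.
    """
    variants = defaultdict(set)
    for read in reads:
        for start, unit, count in find_ssrs(read):
            if start >= flank:
                key = (read[start - flank:start].upper(), unit)
                variants[key].add(count)
    return variants
```

    Candidate loci could then be ranked by `len(variants[key])`, so that only repeats with many observed length variants are carried forward to primer design.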

  18. Regulation of the nucleosome repeat length in vivo by the DNA sequence, protein concentrations and long-range interactions.

    Directory of Open Access Journals (Sweden)

    Daria A Beshnova

    2014-07-01

    The nucleosome repeat length (NRL) is an integral chromatin property important for its biological functions. Recent experiments revealed several conflicting trends of the NRL dependence on the concentrations of histones and other architectural chromatin proteins, both in vitro and in vivo, but a systematic theoretical description of NRL as a function of DNA sequence and epigenetic determinants is currently lacking. To address this problem, we have performed an integrative biophysical and bioinformatics analysis in species ranging from yeast to frog to mouse where NRL was studied as a function of various parameters. We show that in simple eukaryotes such as yeast, a lower limit for the NRL value exists, determined by internucleosome interactions and remodeler action. For higher eukaryotes, an upper limit also exists, since NRL is an increasing but saturating function of the linker histone concentration. Counterintuitively, smaller H1 variants or non-histone architectural proteins can initiate larger effects on the NRL due to entropic reasons. Furthermore, we demonstrate that different regimes of the NRL dependence on histone concentrations exist depending on whether DNA sequence-specific effects dominate over boundary effects or vice versa. We consider several classes of genomic regions with apparently different regimes of the NRL variation. As one extreme, our analysis reveals that the period of oscillations of the nucleosome density around bound RNA polymerase coincides with the period of oscillations of positioning sites of the corresponding DNA sequence. At another extreme, we show that although mouse major satellite repeats intrinsically encode well-defined nucleosome preferences, they have no unique nucleosome arrangement and can undergo a switch between two distinct types of nucleosome positioning.

  19. Tandem repeat sequence variation and length heteroplasmy in the mitochondrial DNA D-loop of the threatened Gulf of Mexico sturgeon, Acipenser oxyrhynchus desotoi.

    Science.gov (United States)

    Miracle, A L; Campton, D E

    1995-01-01

    Genetic variability within the Suwannee River, Florida, population of Gulf of Mexico sturgeon, Acipenser oxyrhynchus desotoi, was assessed by examining sequence and length variation within the control region, or D-loop, of the mitochondrial genome. Although once abundant throughout the Gulf of Mexico, Gulf sturgeon are now listed as a threatened species by the U.S. Fish and Wildlife Service. Mitochondrial DNA was analyzed for length variation from 168 individual Gulf sturgeon by PCR amplification and visualization of PCR products using ethidium bromide-stained agarose gels. Of the 168 individual Gulf sturgeon, 31 (18.5%) were heteroplasmic for one to four copies of an 81-base pair, tandemly repeated sequence in the D-loop region. However, no individuals homoplasmic for multiple copies of the repeat sequence were observed. The existence and nature of these tandem repeats in heteroplasmic individuals was confirmed by direct sequencing of the PCR products for a subset of 22 individuals. The results are consistent with the apparent nature and mechanism of heteroplasmy observed in a congeneric species, A. transmontanus. In addition, sequences for 187 base pairs outside of the tandem repeats were identical among all 16 individuals assayed for this region. Lack of variable sequences is concordant with earlier studies involving mtDNA restriction fragment length profiles of Gulf sturgeon found in the Suwannee River. The absence of sequence variation exclusive of the tandem repeats is consistent with the hypothesis that the subspecies has undergone a population or evolutionary bottleneck.

  20. The TIS11 primary response gene is a member of a gene family that encodes proteins with a highly conserved sequence containing an unusual Cys-His repeat.

    OpenAIRE

    Varnum, B C; Ma, Q F; T. H. Chi; Fletcher, B.; Herschman, H.R.

    1991-01-01

    The TIS11 primary response gene is rapidly and transiently induced by both 12-O-tetradecanoylphorbol-13-acetate and growth factors. The predicted TIS11 protein contains a 6-amino-acid repeat, YKTELC. We cloned two additional cDNAs, TIS11b and TIS11d, that contain the YKTELC sequence. TIS11, TIS11b, and TIS11d proteins share a 67-amino-acid region of sequence similarity that includes the YKTELC repeat and two cysteine-histidine containing repeats. TIS11 gene family members are not coordinately...

  1. C-terminal sequences of hsp70 and hsp90 as non-specific anchors for tetratricopeptide repeat (TPR) proteins.

    Science.gov (United States)

    Ramsey, Andrew J; Russell, Lance C; Chinkers, Michael

    2009-10-12

    Steroid-hormone-receptor maturation is a multi-step process that involves several TPR (tetratricopeptide repeat) proteins that bind to the maturation complex via the C-termini of hsp70 (heat-shock protein 70) and hsp90 (heat-shock protein 90). We produced a random T7 peptide library to investigate the roles played by the C-termini of the two heat-shock proteins in the TPR-hsp interactions. Surprisingly, phages with the MEEVD sequence, found at the C-terminus of hsp90, were not recovered from our biopanning experiments. However, two groups of phages were isolated that bound relatively tightly to HsPP5 (Homo sapiens protein phosphatase 5) TPR. Multiple copies of phages with a C-terminal sequence of LFG were isolated. These phages bound specifically to the TPR domain of HsPP5, although mutation studies produced no evidence that they bound to the domain's hsp90-binding groove. However, the most abundant family obtained in the initial screen had an aspartate residue at the C-terminus. Two members of this family with a C-terminal sequence of VD appeared to bind with approximately the same affinity as the hsp90 C-12 control. A second generation pseudo-random phage library produced a large number of phages with an LD C-terminus. These sequences acted as hsp70 analogues and had relatively low affinities for hsp90-specific TPR domains. Unfortunately, we failed to identify residues near hsp90's C-terminus that impart binding specificity to individual hsp90-TPR interactions. The results suggest that the C-terminal sequences of hsp70 and hsp90 act primarily as non-specific anchors for TPR proteins.

  2. Repeat Finding Techniques, Data Structures and Algorithms in DNA sequences: A Survey

    Directory of Open Access Journals (Sweden)

    Freeson Kaniwa

    2015-09-01

    DNA sequencing technologies keep getting faster and cheaper, leading to massive availability of entire human genomes. This availability calls for better analysis tools with the potential to realize a shift from reactive to predictive medicine. The challenge remains, since analysing entire human genomes needs more space and processing power than a standard desktop PC can offer. This paper gives a background of key concepts in the area of DNA analysis and reviews selected prominent algorithms used in this area. Its aim is to survey these concepts so as to provide a deep-rooted understanding and knowledge transfer regarding existing approaches to DNA analysis using the Burrows-Wheeler transform and the wavelet tree, along with their respective strengths and weaknesses. Following this survey, the paper attempts to provide some directions for future research.
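
    As a small illustration of the Burrows-Wheeler transform named in this survey, the sketch below builds the transform by sorting all rotations of the text and inverts it by repeated column sorting. This naive construction is for exposition only; practical tools derive the transform from a suffix array, and nothing here is taken from the surveyed paper. The function names and the `$` sentinel are assumptions.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Recover the original text from its Burrows-Wheeler transform."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column to every row, then re-sort the rows.
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    for row in table:
        if row.endswith(sentinel):
            return row[:-1]
    raise ValueError("sentinel not found in transform")
```

    The transform tends to group identical characters together, which is what makes it useful for compression and for the compact indexes used in read alignment.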

  3. MEME SUITE: tools for motif discovery and searching.

    Science.gov (United States)

    Bailey, Timothy L; Boden, Mikael; Buske, Fabian A; Frith, Martin; Grant, Charles E; Clementi, Luca; Ren, Jingyuan; Li, Wilfred W; Noble, William S

    2009-07-01

    The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms--MAST, FIMO and GLAM2SCAN--allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm TOMTOM. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and TOMTOM), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.

  4. Construction of an integrated high density simple sequence repeat linkage map in cultivated strawberry (Fragaria × ananassa) and its applicability.

    Science.gov (United States)

    Isobe, Sachiko N; Hirakawa, Hideki; Sato, Shusei; Maeda, Fumi; Ishikawa, Masami; Mori, Toshiki; Yamamoto, Yuko; Shirasawa, Kenta; Kimura, Mitsuhiro; Fukami, Masanobu; Hashizume, Fujio; Tsuji, Tomoko; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Tsuruoka, Hisano; Minami, Chiharu; Takahashi, Chika; Wada, Tsuyuko; Ono, Akiko; Kawashima, Kumiko; Nakazaki, Naomi; Kishida, Yoshie; Kohara, Mitsuyo; Nakayama, Shinobu; Yamada, Manabu; Fujishiro, Tsunakazu; Watanabe, Akiko; Tabata, Satoshi

    2013-02-01

    The cultivated strawberry (Fragaria × ananassa) is an octoploid (2n = 8x = 56) of the Rosaceae family whose genomic architecture is still controversial. Several recent studies support the AAA'A'BBB'B' model, but its complexity has hindered genetic and genomic analysis of this important crop. To overcome this difficulty and to assist genome-wide analysis of F. × ananassa, we constructed an integrated linkage map by organizing a total of 4474 simple sequence repeat (SSR) markers collected from published Fragaria sequences, including 3746 SSR markers [Fragaria vesca expressed sequence tag (EST)-derived SSR markers] derived from F. vesca ESTs, 603 markers (F. × ananassa EST-derived SSR markers) from F. × ananassa ESTs, and 125 markers (F. × ananassa transcriptome-derived SSR markers) from F. × ananassa transcripts. Along with the previously published SSR markers, these markers were mapped onto five parent-specific linkage maps derived from three mapping populations, which were then assembled into an integrated linkage map. The constructed map consists of 1856 loci in 28 linkage groups (LGs) that total 2364.1 cM in length. Macrosynteny at the chromosome level was observed between the LGs of F. × ananassa and the genome of F. vesca. Variety distinction on 129 F. × ananassa lines was demonstrated using 45 selected SSR markers.

  5. Mutation of the aspartic acid residues of the GDD sequence motif of poliovirus RNA-dependent RNA polymerase results in enzymes with altered metal ion requirements for activity.

    Science.gov (United States)

    Jablonski, S A; Morrow, C D

    1995-01-01

    The poliovirus RNA-dependent RNA polymerase, 3Dpol, is known to share a region of sequence homology with all RNA polymerases centered at the GDD amino acid motif. The two aspartic acids have been postulated to be involved in the catalytic activity and metal ion coordination of the enzyme. To test this hypothesis, we have utilized oligonucleotide site-directed mutagenesis to generate defined mutations in the aspartic acids of the GDD motif of the 3Dpol gene. The codon for the first aspartate (3D-D-328 [D refers to the single amino acid change, and the number refers to its position in the polymerase]) was changed to that for glutamic acid, histidine, asparagine, or glutamine; the codons for both aspartic acids were simultaneously changed to those for glutamic acids; and the codon for the second aspartic acid (3D-D-329) was changed to that for glutamic acid or asparagine. The mutant enzymes were expressed in Escherichia coli, and the in vitro poly(U) polymerase activity was characterized. All of the mutant 3Dpol enzymes were enzymatically inactive in vitro when tested over a range of Mg2+ concentrations. However, when Mn2+ was substituted for Mg2+ in the in vitro assays, the mutant that substituted the second aspartic acid for asparagine (3D-N-329) was active. To further substantiate this finding, a series of different transition metal ions were substituted for Mg2+ in the poly(U) polymerase assay. The wild-type enzyme was active with all metals except Ca2+, while the 3D-N-329 mutant was active only when FeC6H7O5 was used in the reaction. To determine the effects of the mutations on poliovirus replication, the mutant 3Dpol genes were subcloned into an infectious cDNA of poliovirus. The cDNAs containing the mutant 3Dpol genes did not produce infectious virus when transfected into tissue culture cells under standard conditions. Because of the activity of the 3D-N-329 mutant in the presence of Fe2+ and Mn2+, transfections were also performed in the presence of the

  6. Identification of a Simple Sequence Repeat molecular-marker set for large-scale analyses of pear germplasm

    Directory of Open Access Journals (Sweden)

    Gabriel Dequigiovanni

    2012-01-01

    Simple Sequence Repeats (SSR) are molecular markers suitable for assessing the genetic variation of germplasm resources; however, large-scale SSR use requires protocol optimization. The present work aimed to identify SSR markers, developed for pear and other fruit species, that are effective in characterizing pear germplasm collections, and to demonstrate their use in supporting genetic breeding programs. From a total of 62 SSR markers investigated, 23 yielding reproducible and polymorphic patterns were used to genotype a sample of 42 pear accessions of the Brazilian Pear Germplasm Bank (PGB). A subset of eleven markers, selected based on He, PIC and PId, distinguished individual accessions and supported cluster analysis with efficacy similar to that of the full set of 23 SSR markers. Genetic diversity analysis clustered the European, Japanese and Chinese accessions in distinct groups. This marker subset constitutes a valuable tool for several applications related to pear genetic resources management and breeding.
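
    Two of the selection criteria named above, expected heterozygosity (He) and polymorphism information content (PIC), are commonly computed from allele frequencies as He = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The sketch below assumes these standard definitions; the abstract does not state which formulas the authors used, PId is not covered, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Return the relative frequency of each distinct allele observed."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross
```

    For a marker with two equally frequent alleles these give He = 0.5 and PIC = 0.375, which shows why markers with many alleles at balanced frequencies score highest on both measures.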

  7. Selection and development of representative simple sequence repeat primers and multiplex SSR sets for high throughput automated genotyping in maize

    Institute of Scientific and Technical Information of China (English)

    WANG FengGe; ZHAO JiuRan; DAI JingRui; YI HongMei; KUANG Meng; SUN YanMei; YU XinYan; GUO JingLun; WANG Lu

    2007-01-01

    In the current study, 1900 maize simple sequence repeat (SSR) primers published in MaizeGDB were screened utilizing reference literature, 15 representative Chinese maize inbred lines and 15 Chinese maize hybrids from national regional testing. In total, 500 highly polymorphic primers were identified and used to construct a genetic map. 100 evenly distributed primers, 10 primers per chromosome, were further selected as a set of universal SSR core primers, recommended as preferred primers for general studies. These core primers were then redesigned and used to construct a high throughput multiplex PCR system based on a five-color fluorescence capillary detection system. We report here that two sets of ten-plex PCR combinations have been constructed, each consisting of 10 primers, with one primer per chromosome.

  8. Common interruptions in the repeating tripeptide sequence of non-fibrillar collagens: sequence analysis and structural studies on triple-helix peptide models.

    Science.gov (United States)

    Thiagarajan, Geetha; Li, Yingjie; Mohs, Angela; Strafaci, Christopher; Popiel, Magdalena; Baum, Jean; Brodsky, Barbara

    2008-02-22

    Interruptions in the repeating (Gly-X1-X2)(n) amino acid sequence pattern are found in the triple-helix domains of all non-fibrillar collagens, and perturbations to the triple-helix at such sites are likely to play a role in collagen higher-order structure and function. This study defines the sequence features and structural consequences of the most common interruption, where one residue is missing from the tripeptide pattern, Gly-X1-X2-Gly-AA(1)-Gly-X1-X2, designated G1G interruptions. Residues found within G1G interruptions are predominantly hydrophobic (70%), followed by a significant amount of charged residues (16%), and the Gly-X1-X2 triplets flanking the interruption are atypical. Studies on peptide models indicate the degree of destabilization is much greater when Pro is in the interruption, GP, than when hydrophobic residues (GF, GY) are present, and a rigid Gly-Pro-Hyp tripeptide adjacent to the interruption leads to greater destabilization than a flexible Gly-Ala-Ala sequence. Modeling based on NMR data indicates the Phe residue within a GF interruption is located on the outside of the triple helix. The G1G interruptions resemble a previously studied collagen interruption GPOGAAVMGPO, designated G4G-type, in that both are destabilizing, but allow continuation of rod-like triple helices and maintenance of the single residue stagger throughout the imperfection, with a loss of axial register of the superhelix on both sides. Both kinds of interruptions result in a highly localized perturbation in hydrogen bonding and dihedral angles, but the hydrophobic residue of a G4G interruption packs near the central axis of the superhelix, while the hydrophobic residue of a G1G interruption is located on the triple-helix surface. The different structural consequences of G1G and G4G interruptions in the repeating tripeptide sequence pattern suggest a physical basis for their differential susceptibility to matrix metalloproteinases in type X collagen.
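
    The G1G and G4G labels in the record above count the residues between two consecutive glycines where the regular pattern would have exactly two. A minimal sketch of that bookkeeping follows; it assumes one-letter codes with O for hydroxyproline (as in GPOGAAVMGPO) and no glycine in the X1 or X2 positions, and the function name is hypothetical.

```python
def glycine_interruptions(seq):
    """List (index_of_left_Gly, label, residues_between) for every pair of
    consecutive glycines that is NOT separated by exactly two residues."""
    seq = seq.upper()
    gly_positions = [i for i, aa in enumerate(seq) if aa == "G"]
    found = []
    for left, right in zip(gly_positions, gly_positions[1:]):
        n_between = right - left - 1
        if n_between != 2:  # a regular Gly-X1-X2 triplet has two residues
            found.append((left, f"G{n_between}G", seq[left + 1:right]))
    return found


# The G4G sequence quoted in the abstract, and a hypothetical G1G model peptide.
print(glycine_interruptions("GPOGAAVMGPO"))
print(glycine_interruptions("GPOGPOGFGPOGPO"))
```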

  9. Molecular cloning and long terminal repeat sequences of human endogenous retrovirus genes related to types A and B retrovirus genes

    Energy Technology Data Exchange (ETDEWEB)

    Ono, M.

    1986-06-01

    By using a DNA fragment primarily encoding the reverse transcriptase (pol) region of the Syrian hamster intracisternal A particle (IAP; type A retrovirus) gene as a probe, human endogenous retrovirus genes, tentatively termed HERV-K genes, were cloned from a fetal human liver gene library. Typical HERV-K genes were 9.1 or 9.4 kilobases in length, having long terminal repeats (LTRs) of ca. 970 base pairs. Many structural features commonly observed on the retrovirus LTRs, such as the TATAA box, polyadenylation signal, and terminal inverted repeats, were present on each LTR, and a lysine (K) tRNA having a CUU anticodon was identified as a presumed primer tRNA. The HERV-K LTR, however, had little sequence homology to either the IAP LTR or other typical oncovirus LTRs. By filter hybridization, the number of HERV-K genes was estimated to be ca. 50 copies per haploid human genome. The cloned mouse mammary tumor virus (type B) gene was found to hybridize with both the HERV-K and IAP genes to essentially the same extent.

  10. Empirical Comparison of Simple Sequence Repeats and Single Nucleotide Polymorphisms in Assessment of Maize Diversity and Relatedness

    Science.gov (United States)

    Hamblin, Martha T.; Warburton, Marilyn L.; Buckler, Edward S.

    2007-01-01

    While Simple Sequence Repeats (SSRs) are extremely useful genetic markers, recent advances in technology have produced a shift toward use of single nucleotide polymorphisms (SNPs). The different mutational properties of these two classes of markers result in differences in heterozygosities and allele frequencies that may have implications for their use in assessing relatedness and evaluation of genetic diversity. We compared analyses based on 89 SSRs (primarily dinucleotide repeats) to analyses based on 847 SNPs in individuals from the same 259 inbred maize lines, which had been chosen to represent the diversity available among current and historic lines used in breeding. The SSRs performed better at clustering germplasm into populations than did a set of 847 SNPs or 554 SNP haplotypes, and SSRs provided more resolution in measuring genetic distance based on allele-sharing. Except for closely related pairs of individuals, measures of distance based on SSRs were only weakly correlated with measures of distance based on SNPs. Our results suggest that 1) large numbers of SNP loci will be required to replace highly polymorphic SSRs in studies of diversity and relatedness and 2) relatedness among highly-diverged maize lines is difficult to measure accurately regardless of the marker system. PMID:18159250

  11. Sequence diversities of serine-aspartate repeat genes among Staphylococcus aureus isolates from different hosts presumably by horizontal gene transfer.

    Directory of Open Access Journals (Sweden)

    Huping Xue

    Full Text Available BACKGROUND: Horizontal gene transfer (HGT) is recognized as one of the major forces for bacterial genome evolution. Many clinically important bacteria may acquire virulence factors and antibiotic resistance through HGT. Comparative genomic analysis has become an important tool for identifying HGT in emerging pathogens. In this study, the Serine-Aspartate Repeat (Sdr) family has been compared among different sources of Staphylococcus aureus (S. aureus) to discover sequence diversities within their genomes. METHODOLOGY/PRINCIPAL FINDINGS: Four sdr genes were analyzed for 21 different S. aureus strains and 218 mastitis-associated S. aureus isolates from Canada. Comparative genomic analyses revealed that S. aureus strains from bovine mastitis (RF122 and mastitis isolates in this study), ovine mastitis (ED133), pig (ST398), chicken (ED98), and human methicillin-resistant S. aureus (MRSA) (TCH130, MRSA252, Mu3, Mu50, N315, 04-02981, JH1 and JH9) were highly associated with one another, presumably due to HGT. In addition, several types of insertion and deletion were found in sdr genes of many isolates. A new insertion sequence was found in mastitis isolates, which was presumably responsible for the HGT of the sdrC gene among different strains. Moreover, the sdr genes could be used to type S. aureus. Regional differences in the distribution of sdr genes were also indicated among the tested S. aureus isolates. Finally, certain associations were found between sdr genes and subclinical or clinical mastitis isolates. CONCLUSIONS: Certain sdr gene sequences were shared in S. aureus strains and isolates from different species, presumably due to HGT. Our results also suggest that the distributional assay of virulence factors should detect the full sequences or full functional regions of these factors. The traditional assay using short conserved regions may not be accurate or credible. These findings have important implications with regard to animal husbandry practices that may

  12. Transciptome analysis reveals flavonoid biosynthesis regulation and simple sequence repeats in yam (Dioscorea alata L.) tubers.

    Science.gov (United States)

    Wu, Zhi-Gang; Jiang, Wu; Mantri, Nitin; Bao, Xiao-Qing; Chen, Song-Lin; Tao, Zheng-Ming

    2015-04-30

    Yam (Dioscorea alata L.) is an important tuber crop and purple pigmented elite cultivar has recently become popular because of associated health benefits. Identifying candidate genes responsible for flavonoid biosynthesis pathway (FBP) will facilitate understanding the molecular mechanism of controlling pigment formation in yam tubers. Here, we used Illumina sequencing to characterize the transcriptome of tubers from elite purple-flesh cultivar (DP) and conventional white-flesh cultivar (DW) of yam. In this process, we also designed high quality molecular markers to assist molecular breeding for tuber trait improvement. A total of 125,123 unigenes were identified from the DP and DW cDNA libraries, of which about 49.5% (60,020 unigenes) were annotated by BLASTX analysis using the publicly available protein database. These unigenes were further annotated functionally and subject to biochemical pathway analysis. 511 genes were identified to be more than 2-fold (FDR yam cultivars, of which 288 genes were up-regulated and 223 genes were down-regulated in the DP tubers. Transcriptome analysis detected 61 unigenes encoding multiple well-known enzymes in the FBP. Furthermore, the unigenes encoding chalcone isomerase (CHS), flavanone 3-hydroxylase (F3H), flavonoid 3'-monooxygenase (F3'H), dihydroflavonol 4-reductase (DFR), leucoanthocyanidin dioxygenase (LDOX), and flavonol 3-O-glucosyltransferase (UF3GT) were found to be significantly up-regulated in the DP, implying that these genes were potentially associated with tuber color formation in this elite cultivar. The expression of these genes was further confirmed by qRT-PCR. Finally, 11,793 SSRs were successfully identified with these unigenes and 6,082 SSR markers were developed using Primer 3. This study provides the first comprehensive transcriptomic dataset for yam tubers, which will significantly contribute to genomic research of this and other related species. Some key genes associated with purple-flesh trait were

  13. Microsatellite primers resource developed from the mapped sequence scaffolds of Nisqually-1 genome. Submitted to New Phytologist

    Energy Technology Data Exchange (ETDEWEB)

    Yin, Tongming [ORNL; ZHANG, Dr. XINYE [Oak Ridge National Laboratory (ORNL); Gunter, Lee E [ORNL; Li, Shuxian [Nanjing Forestry University, China; Wullschleger, Stan D [ORNL; Huang, Prof. Minren [Nanjing Forestry University, China; Tuskan, Gerald A [ORNL

    2009-01-01

    In this study, 148 428 simple sequence repeat (SSR) primer pairs were designed from the unambiguously mapped sequence scaffolds of the Nisqually-1 genome. The physical positions of the priming sites were identified along each of the 19 Populus chromosomes, and it was specified whether the priming sequences belong to intronic, intergenic, exonic or UTR regions. A subset of 150 SSR loci was amplified, and a high amplification success rate (72%) was obtained in P. tremuloides, which belongs to a divergent subgenus of Populus relative to Nisqually-1. PCR reactions showed that the amplification success rate of exonic primer pairs was much higher than that of the intronic/intergenic primer pairs. Applying ANOVA and regression analyses to the flanking sequences of microsatellites, the repeat lengths, the GC contents of the repeats, the repeat motif numbers, the repeat motif length and the base composition of the repeat motif, it was determined that only the base composition of the repeat motif and the repeat motif length significantly affect the microsatellite variability in P. tremuloides samples. The SSR primer resource developed in this study provides a database for selecting highly transferable SSR markers with known physical position in the Populus genome and provides a comprehensive genetic tool to extend the genome sequence of Nisqually-1 to genetic studies in different Populus species.

  14. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High-throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results: We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis). Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  15. Reference: TCA1MOTIF [PLACE

    Lifescience Database Archive (English)

    Full Text Available TCA1MOTIF Goldsbrough AP, Albrecht H, Stratford R Salicylic acid-inducible binding ...of a tobacco nuclear protein to a 10 bp sequence which is highly conserved amongst stress-inducible genes. Plant J 3:563-571 (1993) PubMed: 8220463; ...

  16. Development of simple sequence repeat markers based on the pine wood nematode (Bursaphelenchus xylophilus) genome sequence

    Institute of Scientific and Technical Information of China (English)

    许峻荣; 吴小芹; 刘云; 叶建仁

    2014-01-01

    To study the population genetic relationships and spread routes of the pine wood nematode (Bursaphelenchus xylophilus) in China, and to obtain more stable molecular markers for the species, the MISA software was used to search 10 432 DNA fragments of the B. xylophilus whole-genome sequence, yielding 95 genomic SSR (gSSR) loci. Dinucleotide repeat motifs were the most abundant, accounting for 66.3% of all SSR loci. Thirty-six primer pairs were designed from the gSSR loci and tested by PCR on one pooled B. xylophilus DNA sample (containing DNA of 46 isolates from different geographic origins in China). Detection of the products with the QIAxcel automated gel electrophoresis system showed that 17 primer pairs were potentially polymorphic. The PCR products of these 17 primer pairs were then cloned and sequenced, and alignment of the sequences with BioEdit showed that nine pairs were indeed polymorphic, providing a foundation for population genetic studies of B. xylophilus.
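
    The SSR search step described above can be illustrated with a small regular-expression scan. This is only a sketch of the general idea of finding perfect tandem repeats, not MISA's actual algorithm or parameters; the function name, the minimum repeat counts per motif length and the test sequence are all assumptions.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    start is a 0-based index into seq.
    """
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for unit_len, min_copies in sorted(thresholds.items()):
        # One captured motif followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (for example "AA" or "ATAT"), so each repeat is reported once.
            if any(
                unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical fragment containing one dinucleotide and one trinucleotide SSR.
fragment = "GGC" + "AT" * 8 + "CCGTC" + "GAA" * 5 + "TTC"
print(find_ssrs(fragment))
```

    Counting the share of hits with a two-base motif over all hits would give the kind of dinucleotide proportion reported in the record.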

  17. Phylogenetic placement of Cynomorium in Rosales inferred from sequences of the inverted repeat region of the chloroplast genome

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hong ZHANG; Chun-Qi LI; Jian-hua LI

    2009-01-01

    Cynomorium is a herbaceous holoparasite that has been placed in Santalales, Saxifragales, Myrtales, or Sapindales. The inverted repeat (IR) region of the chloroplast genome region is slow evolving and, unlike mitochondrial genes, the chloroplast genome experiences few horizontal gene transfers between the host and parasite. Thus, in the present study, we used sequences of the IR region to test the phylogenetic placements of Cynomorium. Phylogenetic analyses of the chloroplast IR sequences generated largely congruent ordinal relationships with those from previous studies of angiosperm phylogeny based on single or multiple genes. Santalales was closely related to Caryophyllales and asterids. Saxifragales formed a clade where Peridiscus was sister to the remainder of the order, whereas Paeonia was sister to the woody clade of Saxifragales. Cynomorium is not closely related to Santalales, Saxifragales, Myrtales, or Sapindales; instead, it is included in Rosales and sister to Rosaceae. The various placements of the holoparasite on the basis of different regions of the mitochondrial genome may indicate the heterogeneous nature of the genome in the parasite. However, it is unlikely that the placement of Cynomorium in Rosales is the result of chloroplast gene transfer because Cynomorium does not parasitize on rosaceous plants and there is no chloroplast gene transfer between Cynomorium and Nitraria, a confirmed host of Cynomorium and a member of Sapindales.

  18. De novo characterization of the Dialeurodes citri transcriptome: mining genes involved in stress resistance and simple sequence repeats (SSRs) discovery.

    Science.gov (United States)

    Chen, E-H; Wei, D-D; Shen, G-M; Yuan, G-R; Bai, P-P; Wang, J-J

    2014-02-01

    The citrus whitefly, Dialeurodes citri (Ashmead), is one of the three economically important whitefly species that infest citrus plants around the world; however, limited genetic research has been focused on D. citri, partly because of lack of genomic resources. In this study, we performed de novo assembly of a transcriptome using Illumina paired-end sequencing technology (Illumina Inc., San Diego, CA, USA). In total, 36,766 unigenes with a mean length of 497 bp were identified. Of these unigenes, we identified 17,788 matched known proteins in the National Center for Biotechnology Information database, as determined by Blast search, with 5731, 4850 and 14,441 unigenes assigned to clusters of orthologous groups (COG), gene ontology (GO), and SwissProt, respectively. In total, 7507 unigenes were assigned to 308 known pathways. In-depth analysis of the data showed that 117 unigenes were identified as potentially involved in the detoxification of xenobiotics and 67 heat shock protein (Hsp) genes were associated with environmental stress. In addition, these enzymes were searched against the GO and COG database, and the results showed that the three major detoxification enzymes and Hsps were classified into 18 and 3, 6, and 8 annotations, respectively. In addition, 149 simple sequence repeats were detected. The results facilitate the investigation of molecular resistance mechanisms to insecticides and environmental stress, and contribute to molecular marker development. The findings greatly improve our genetic understanding of D. citri, and lay the foundation for future functional genomics studies on this species.

  19. Sequence analysis of the fragile X trinucleotide repeat: Correlations with stability and haplotype and implications for the origin of fragile X alleles

    Energy Technology Data Exchange (ETDEWEB)

    Snow, K.; Tester, D.J.; Kruckeberg, K.E.; Thibodeau, S.N. [Mayo Clinic, Rochester, MN (United States)

    1994-09-01

    Fragile X (FX) syndrome is associated with amplification of a CGG trinucleotide repeat in the 5′ untranslated region of the gene FMR-1. To address the mechanism of instability and concern related to overlap between sizes of normal stable alleles and FX unstable alleles, we have sequenced 165 alleles to analyze patterns of AGG interruptions within the CGG repeat, and have typed the (CA)n at DXS548 for 204 chromosomes. Overall, our data are consistent with the idea that the length of uninterrupted CGG repeats determines instability. For 17 stably transmitted alleles with total repeat lengths between 33 and 51, the longest stretch of uninterrupted CGGs was 41. In contrast, for 13 premutation alleles, the shortest stretch of uninterrupted CGGs was 48, suggesting a threshold for expansion between 41 and 48 pure CGGs. For expansion from a premutation to a full mutation, the threshold appears to be ≥70 uninterrupted repeats. Interestingly, an AGG was detected in some carriers of a full mutation. Comparison of the number of 'shadow bands' in PCR products from similar size alleles with different AGG interruption patterns supports replication slippage as a potential mechanism, i.e. replication slippage occurs more readily as the length of pure repeat increases. Alleles with high total repeat lengths but up to 3 AGGs may be relatively protected against expansion, whereas smaller alleles with pure CGG sequence could be at higher risk for instability. Comparison of sequence data and DXS548 (CA)n data revealed specific sequence trends for each of the DXS548 alleles, explaining the previously reported haplotype association with FX. Incorporating these observations into models for the origin of FX alleles, we consider replication slippage, unequal crossover within the CGG repeat region, recombination between FMR-1 and DXS548, and loss of AGGs by A to C transversion.
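
    The quantity the record above relies on, the longest stretch of uninterrupted CGGs within a repeat tract, is easy to compute once the tract is read triplet by triplet. The sketch below assumes the tract is supplied in frame and contains only whole triplets; the function name and the example allele are hypothetical, and no instability thresholds are encoded.

```python
def longest_pure_cgg_run(repeat_tract):
    """Return the longest run of consecutive CGG triplets in an in-frame tract."""
    seq = repeat_tract.upper()
    if len(seq) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    longest = current = 0
    for i in range(0, len(seq), 3):
        if seq[i:i + 3] == "CGG":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # an interruption such as AGG resets the run
    return longest


# Hypothetical allele: 29 triplets in total with two AGG interruptions.
allele = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 9
print(len(allele) // 3, longest_pure_cgg_run(allele))
```

    Two alleles with the same total length can therefore differ widely in this value, which is the distinction the abstract draws between interrupted and pure repeats.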

  20. Retroviral sequence located in border region of short unique region and short terminal repeat of Md5 strain of Marek's disease virus type 1.

    Science.gov (United States)

    Endoh, D; Ito, M; Cho, K O; Kon, Y; Morimura, T; Hayashi, M; Kuwabara, M

    1998-02-01

    A 246-base pair (bp) retroviral sequence, which was homologous to a long terminal repeat of avian erythroblastosis virus (AEV), was detected and cloned from Md5 strain (Md5) of Marek's disease virus type 1 (MDV1) by representational difference analysis (RDA). The retroviral sequence was thought to be located in the border region of the short unique region (U(s)) and short terminal repeat (TRs), but did not exist in the border region of U(s) and the inverted short repeat (IRs) of the Md5 genome. A cloned fragment of the U(s)/TRs border region of the Md5 genome showed a construction of U-E'-R-U'-E-TRs with the regions designated as follows: E, expanded TRs reported by Jones et al. [Proc. Natl. Acad. Sci. U.S.A. 90, 3855, 1993]; E', a partial copy of the expanded TRs; R, the retroviral sequence detected in Md5 genome; U, TRs-end sequence of U(s); U', a partial copy of TRs-end sequence of U(s). The sequence unit indicated as E'-R-U' was thought to be heterogeneously repeated in the Md5 genome. Since this retroviral sequence reportedly did not exist in the original stock of Md5, the retroviral sequence is thought to be inserted in the Md5 genome without experimental co-infection of avian cells with retrovirus and MDV1. These results suggest that RDA could be useful for the detection of retroviral sequences in the herpesvirus genome.

  1. Helix-packing motifs in membrane proteins.

    Science.gov (United States)

    Walters, R F S; DeGrado, W F

    2006-09-12

    The fold of a helical membrane protein is largely determined by interactions between membrane-imbedded helices. To elucidate recurring helix-helix interaction motifs, we dissected the crystallographic structures of membrane proteins into a library of interacting helical pairs. The pairs were clustered according to their three-dimensional similarity (rmsd universe of common transmembrane helix-pairing motifs is relatively simple. The largest cluster, which comprises 29% of the library members, consists of an antiparallel motif with left-handed packing angles, and it is frequently stabilized by packing of small side chains occurring every seven residues in the sequence. Right-handed parallel and antiparallel structures show a similar tendency to segregate small residues to the helix-helix interface but spaced at four-residue intervals. Position-specific sequence propensities were derived for the most populated motifs. These structural and sequential motifs should be quite useful for the design and structural prediction of membrane proteins.

  2. Frequent mutations of the CA simple sequence repeat in intron 1 of EGFR in mismatch repair-deficient colorectal cancers

    Institute of Scientific and Technical Information of China (English)

    Marie-Pierre Buisine; Thècla Lesuffleur; Agnès Wacrenier; Christophe Mariette; Emmanuelle Leteurtre; Fabienne Escande; Sana Aissi; Amandine Ketele; Annette Leclercq; Nicole Porchet

    2008-01-01

    AIM: To investigate the polymorphic simple sequence repeat in intron 1 of the epidermal growth factor receptor gene (EGFR) (CA-SSR I), which is known to affect the efficiency of gene transcription, as a putative target of the mismatch repair (MMR) machinery in colorectal tumors. METHODS: The CA-SSR I genotype was analyzed in a total of 86 primary colorectal tumors, selected upon their microsatellite instability (MSI) status [42 with high frequency MSI (MSI-H) and 44 microsatellite stable (MSS)], and their respective normal tissue. The effect of the CA-SSR I genotype on the expression of the EGFR gene was evaluated in 18 specimens using quantitative real-time reverse transcription PCR and immunohistochemistry. RESULTS: Mutations in CA-SSR I were detected in 86% (36 of 42) of MSI-H colorectal tumors and 0% (0 of 44) of MSS tumors, indicating the EGFR gene as a novel putative specific target of the defective MMR system (P<0.001). Impaired expression of EGFR was detected in most of the colorectal tumors analyzed [6/12 (50%) at the mRNA level and 15/18 (83%) at the peptide level]. However, no association was apparent between EGFR expression and CA-SSR I status in tumors or normal tissues. CONCLUSION: Our results suggest that the CA-SSR I sequence does not contribute to the regulation of EGFR transcription in colon, and should thus not be considered as a promising predictive marker for response to EGFR inhibitors in patients with colorectal cancer.
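
    The record above reports 36 of 42 MSI-H tumors versus 0 of 44 MSS tumors with P<0.001 but does not name the statistical test. As an illustration only, the sketch below assumes a two-sided Fisher exact test on the 2x2 table and implements it from the hypergeometric distribution with the standard library; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Rows: MSI-H, MSS; columns: CA-SSR I mutated, not mutated (counts from the abstract).
p = fisher_exact_two_sided(36, 6, 0, 44)
print(p < 0.001)
```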

  3. Genome-wide analysis of tandem repeats in plants and green algae

    Science.gov (United States)

    Zhixin Zhao; Cheng Guo; Sreeskandarajan Sutharzan; Pei Li; Craig Echt; Jie Zhang; Chun Liang

    2014-01-01

    Tandem repeats (TRs) extensively exist in the genomes of prokaryotes and eukaryotes. Based on the sequenced genomes and gene annotations of 31 plant and algal species in Phytozome version 8.0 (http://www.phytozome.net/), we examined TRs in a genome-wide scale, characterized their distributions and motif features, and explored their putative biological functions. Among...

  4. Diversity and genetic stability in banana genotypes in a breeding program using inter simple sequence repeats (ISSR) markers.

    Science.gov (United States)

    Silva, A V C; Nascimento, A L S; Vitória, M F; Rabbani, A R C; Soares, A N R; Lédo, A S

    2017-02-23

    Banana (Musa spp.) is a fruit species frequently cultivated and consumed worldwide. Molecular markers are important for estimating genetic diversity in germplasm and between genotypes in breeding programs. The objective of this study was to analyze the genetic diversity of 21 banana genotypes (FHIA 23, PA42-44, Maçã, Pacovan Ken, Bucaneiro, YB42-47, Grand Naine, Tropical, FHIA 18, PA94-01, YB42-17, Enxerto, Japira, Pacovã, Prata-Anã, Maravilha, PV79-34, Caipira, Princesa, Garantida, and Thap Maeo), by using inter-simple sequence repeat (ISSR) markers. Material was generated from the banana breeding program of Embrapa Cassava & Fruits and evaluated at Embrapa Coastal Tablelands. The 12 primers used in this study generated 97.5% polymorphism. Four clusters were identified among the different genotypes studied, and the sum of the first two principal components was 48.91%. From the Unweighted Pair Group Method using Arithmetic averages (UPGMA) dendrogram, it was possible to identify two main clusters and subclusters. Two genotypes (Garantida and Thap Maeo) remained isolated from the others, both in the UPGMA clustering and in the principal coordinate analysis (PCoA). Using ISSR markers, we could analyze the genetic diversity of the studied material and state that these markers were efficient at detecting sufficient polymorphism to estimate the genetic variability in banana genotypes.

  5. Molecular diversity and relationships among Cymbidium goeringii cultivars based on inter-simple sequence repeat (ISSR) markers.

    Science.gov (United States)

    Wang, Hui-Zhong; Wu, Zhen-Xing; Lu, Jiang-Jie; Shi, Nong-Nong; Zhao, Yan; Zhang, Zhi-Tao; Liu, Jun-Jun

    2009-07-01

    Spring orchid (Cymbidium goeringii) is a popular flowering plant species. There have been few molecular studies of the genetic diversity and conservation genetics of this species. An assessment of the level of genetic diversity in cultivated spring orchid would facilitate future germplasm conservation for cultivar improvement. In the present study, inter-simple sequence repeat (ISSR) DNA markers were identified and the ISSR fingerprinting technique was used to evaluate genetic diversity in C. goeringii cultivars. Twenty-five ISSR primers were selected to produce a total of 224 ISSR loci for evaluation of the genetic diversity. A wide genetic variation was found in the 50 tested cultivars with Nei's gene diversity (H = 0.2241) and 93.75% of polymorphic loci. Fifty cultivars were unequivocally distinguished based on ISSR fingerprinting. Cultivar-specific ISSR markers were identified in seven of 50 tested cultivars. Unweighted pair-group mean analysis (UPGMA) and principal coordinates analysis (PCA) grouped them into two clusters: one composed of the cultivars mainly from Japan, and the other containing three major subclusters mainly from China. Two Chinese subclusters were generally consistent with horticultural classification, and the third Chinese subcluster contained cultivars from various horticultural groups. Our results suggest that the ISSR technique provides a powerful tool for cultivar identification and establishment of genetic relationships of cultivars in C. goeringii.

  6. Inter-Simple Sequence Repeat (ISSR) Markers to Study Genetic Diversity Among Cotton Cultivars in Associated with Salt Tolerance

    Directory of Open Access Journals (Sweden)

    Ali Akbar ABDI

    2012-11-01

    Full Text Available Developing salt-tolerant crops is very important as a significant proportion of cultivated land is salt-affected. Screening and selection of salt-tolerant genotypes of cotton using DNA molecular markers not only introduces tolerant cultivars useful for hybridization and breeding programs but also detects DNA regions involved in the mechanism of salinity tolerance. To study this, 28 cotton cultivars, including 8 Iranian cotton varieties, were grown in pots under greenhouse conditions, and three salt treatments were imposed with salt solutions (0, 70 and 140 mM NaCl). Eight agronomic traits including root length, root fresh weight, root dry weight, chlorophyll and fluorescence index, K+ and Na+ contents in shoot (above ground biomass), and K+/Na+ ratio were measured. Cluster analysis of cultivars based on measured agronomic traits showed 'Cindose' and 'Ciacra' as the most tolerant cultivars, and 'B-557' and '43347' as the cultivars most sensitive to salt damage. A total of 65 polymorphic DNA fragments were generated at 14 inter-simple sequence repeat (ISSR) loci. Plants of the 28 cotton cultivars grouped into three clusters based on ISSR markers. Regression analysis of markers in relation to trait data showed that 23, 33 and 30 markers were associated with the measured traits in the three salt treatments, respectively. These markers might help breeders in any marker-assisted selection program aimed at improving cotton cultivars against salt stress.

  7. Regeneration and assessment of genetic fidelity of the endangered tree Moringa peregrina (Forsk.) Fiori using Inter Simple Sequence Repeat (ISSR).

    Science.gov (United States)

    Al Khateeb, Wesam; Bahar, Eman; Lahham, Jamil; Schroeder, Dana; Hussein, Emad

    2013-01-01

    Moringa peregrina is an endangered species of Moringaceae. M. peregrina is a multipurpose tree with a wide variety of potential uses including its medicinal activity. In our study, a rapid and efficient micropropagation protocol for M. peregrina has been established. In vitro germinated seedlings were cultured on Murashige and Skoog (MS) medium supplemented with different levels of either 6-benzyladenine (BA) or kinetin (Kin). The maximum shoot proliferation of 6.5 shoots per explant with 100% shoot proliferation rate was observed on MS medium supplemented with 1.0 mg/l BA. On the other hand, MS medium supplemented with 1 mg/l indole-3-butyric acid (IBA) resulted in the maximum number of roots. Micropropagated plants were successfully acclimatized. Genetic stability of micropropagated plants was assessed using Inter-Simple Sequence Repeat (ISSR). The amplification products were monomorphic in all in vitro grown plants. No polymorphism was detected indicating the genetic integrity of in vitro propagated plants. This micropropagation protocol could be useful for raising genetically uniform plants for plant propagation and commercial cultivation.

  8. Transferability of simple sequence repeat (SSR) markers developed in guava (Psidium guajava L.) to four Myrtaceae species.

    Science.gov (United States)

    Rai, Manoj K; Phulwaria, Mahendra; Shekhawat, N S

    2013-08-01

    The present study demonstrated the cross-genera transferability of 23 simple sequence repeat (SSR) primer pairs developed for guava (Psidium guajava L.) to four new targets, two species of eucalypts (Eucalyptus citriodora, Eucalyptus camaldulensis), bottlebrush (Callistemon lanceolatus) and clove (Syzygium aromaticum), belonging to the family Myrtaceae and subfamily Myrtoideae. Of the 23 SSR loci assayed, 18 (78.2%) gave cross-amplification in E. citriodora, 14 (60.8%) in E. camaldulensis and 17 each (73.9%) in C. lanceolatus and S. aromaticum. Eight primer pairs were found to be transferable to all four species. The number of alleles detected at each locus ranged from one to nine, with an average of 4.8, 2.6, 4.5 and 4.6 alleles in E. citriodora, E. camaldulensis, C. lanceolatus and S. aromaticum, respectively. The high levels of cross-genera transferability of guava SSRs may be applicable for the analysis of intra- and inter-specific genetic diversity of target species, especially in E. citriodora, C. lanceolatus and S. aromaticum, for which to date no information about EST-derived or genomic SSRs is available.

  9. Genetic Diversity among Parents of Hybrid Rice Based on Cluster Analysis of Morphological Traits and Simple Sequence Repeat Markers

    Institute of Scientific and Technical Information of China (English)

    WANG Sheng-jun; LU Zuo-mei; WAN Jian-min

    2006-01-01

    The genetic diversity of 41 parental lines popularized in commercial hybrid rice production in China was studied by using cluster analysis of morphological traits and simple sequence repeat (SSR) markers. Forty-one entries were assigned into two clusters (i.e. early or medium-maturing cluster; medium or late-maturing cluster) and further assigned into six sub-clusters based on morphological trait cluster analysis. The early or medium-maturing cluster was composed of 15 maintainer lines, four early-maturing restorer lines and two thermo-sensitive genic male sterile lines, and the medium or late-maturing cluster included 16 restorer lines and 4 medium or late-maturing maintainer lines. Moreover, the SSR cluster analysis classified 41 entries into two clusters (i.e. maintainer line cluster and restorer line cluster) and seven sub-clusters. The maintainer line cluster consisted of all 19 maintainer lines and two thermo-sensitive genic male sterile lines, while the restorer line cluster was composed of all 20 restorer lines. The SSR analysis fitted better with the pedigree information. From the viewpoint of hybrid rice breeding, the results suggested that SSR analysis might be a better method to study the diversity of parental lines in indica hybrid rice.

  10. Genetic diversity of wild Cymbidium goeringii (Orchidaceae) populations from Hubei based on Inter-simple sequence repeats analysis

    Institute of Scientific and Technical Information of China (English)

    YAO Xiaohong; GAO Li; YANG Bo

    2007-01-01

    Cymbidium goeringii is a diploid, nonrewarding, bumblebee-pollinated species, which is distributed in China, Japan and the Korean Peninsula. This species is now highly endangered due to mass collection and forest clearance in China. In the present study, we investigated the distribution of genetic variation within and between eleven populations of Cymbidium goeringii in central China by using inter-simple sequence repeat (ISSR) markers. Eleven primers produced a total of 127 clear and reproducible bands, of which 112 were polymorphic. High genetic diversity was detected in Cymbidium goeringii at both the population level (P = 63.1%; He = 0.1945) and the species level (P = 88.2%; He = 0.2628). A higher level of genetic differentiation was detected among populations (GST = 0.2440, FST = 0.2207) with Nei's GST analysis and analysis of molecular variance (AMOVA), and no correlation was found between geographical and genetic distance. Genetic drift rather than gene flow played an important role in forming the present population structure of Cymbidium goeringii. Limited gene flow among populations and genetic drift increase the extinction risk of local populations. Some conservation concerns are therefore discussed together with possible strategies for implementing in situ and ex situ conservation.

  11. Genetic Diversity of Chinese and Swedish Rapeseed (Brassica napus L.) Analyzed by Inter-Simple Sequence Repeats (ISSRs)

    Institute of Scientific and Technical Information of China (English)

    MA Chao-zhi; FU Ting-dong; Stine Tuevesson; Bo Gertsson

    2003-01-01

    We have compared genetic diversity of 24 Chinese weak-winter, Swedish winter and spring B. napus accessions by inter-simple sequence repeats (ISSRs). By cluster analysis (UPGMA) based on 125 polymorphic bands amplified with 20 primers, the 24 accessions were divided into three groups. Six Swedish winter lines and eight Chinese weak-winter lines were in group Ⅰ, and group Ⅱ comprised two Chinese weak-winter lines, Xiangyou15 and Bao81. The third group contained eight Swedish spring lines. Principal co-ordinates analysis (PCO) showed similar groupings to cluster analysis. Results from cluster analysis and PCO analysis showed very clearly that Chinese weak-winter, Swedish spring and winter accessions were distinguished from each other and that Chinese weak-winter accessions in this study were genetically closer to Swedish winter accessions than to Swedish spring accessions. The Chinese weak-winter accessions had larger diversity than Swedish spring or winter accessions did. This study indicated that ISSR is a suitable and effective tool to evaluate genetic diversity among rapeseed germplasm.

  12. Evaluation of Population Structure, Genetic Diversity and Origin of Northeast Asia Weedy Rice Based on Simple Sequence Repeat Markers

    Directory of Open Access Journals (Sweden)

    Li Mao-bai

    2015-07-01

    Full Text Available Weedy rice exerts a severe impact on rice production by competing for sunlight, water and nutrients. This study assayed the population structure, genetic diversity and origin of Northeast Asia weedy rice by using 48 simple sequence repeat markers. The results showed that weedy rice in Northeast Asia had a high genetic diversity, with Shannon's diversity index (I) of 0.748 and the heterozygosity (He) of 0.434. In each regional population, I value varied widely. The widest range of I (0.228–0.489) was observed in the weedy rice of Eastern China, which was larger than that of Northeast China and Korea (0.168–0.270). The F-statistics of regional populations (Fis, Fit and Fst) also showed higher values in the weedy rice of Eastern China than those of Northeast China and Korea. All weedy rice accessions were grouped into two clusters in the unweighted pair group method with arithmetic mean cluster analysis dendrogram, namely Eastern China branch and Northeastern China plus Korea branch. There was significant differentiation in genetic characteristics in weedy rice of northeastern and eastern Asia, especially in Eastern China.
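The two per-locus diversity measures named in this record, Shannon's index (I) and expected heterozygosity (He), can be sketched in Python from allele frequencies. This is a minimal illustration under stated assumptions: I is taken as the natural-log Shannon index and He as one minus the sum of squared allele frequencies. The record does not state which formulas or software the authors used, and the frequencies below are hypothetical.

```python
import math
from typing import Sequence


def shannon_index(freqs: Sequence[float]) -> float:
    """Shannon's diversity index I = -sum(p * ln p) over alleles with p > 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity He = 1 - sum(p^2) over alleles at one locus."""
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical locus with two equally frequent alleles.
two_alleles = [0.5, 0.5]
print(round(shannon_index(two_alleles), 4))   # 0.6931 (= ln 2)
print(expected_heterozygosity(two_alleles))   # 0.5
```

Study-level values such as those quoted above would typically be averages of these per-locus quantities across all markers.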

  13. The complete chloroplast genome sequence of Cephalotaxus oliveri (Cephalotaxaceae): evolutionary comparison of cephalotaxus chloroplast DNAs and insights into the loss of inverted repeat copies in gymnosperms.

    Science.gov (United States)

    Yi, Xuan; Gao, Lei; Wang, Bo; Su, Ying-Juan; Wang, Ting

    2013-01-01

    We have determined the complete chloroplast (cp) genome sequence of Cephalotaxus oliveri. The genome is 134,337 bp in length, encodes 113 genes, and lacks inverted repeat (IR) regions. Genome-wide mutational dynamics have been investigated through comparative analysis of the cp genomes of C. oliveri and C. wilsoniana. Gene order transformation analyses indicate that when distinct isomers are considered as alternative structures for the ancestral cp genome of cupressophyte and Pinaceae lineages, it is not possible to distinguish between hypotheses favoring retention of the same IR region in cupressophyte and Pinaceae cp genomes from a hypothesis proposing independent loss of IRA and IRB. Furthermore, in cupressophyte cp genomes, the highly reduced IRs are replaced by short repeats that have the potential to mediate homologous recombination, analogous to the situation in Pinaceae. The importance of repeats in the mutational dynamics of cupressophyte cp genomes is also illustrated by the accD reading frame, which has undergone extreme length expansion in cupressophytes. This has been caused by a large insertion comprising multiple repeat sequences. Overall, we find that the distribution of repeats, indels, and substitutions is significantly correlated in Cephalotaxus cp genomes, consistent with a hypothesis that repeats play a role in inducing substitutions and indels in conifer cp genomes.

  14. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancer genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures allow successful discrimination of a) TrEn from random controls, a proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  15. An integrated genetic linkage map of watermelon and genetic diversity based on single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers

    Science.gov (United States)

    Watermelon (Citrullus lanatus var. lanatus) is an important vegetable fruit throughout the world. A high number of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers should provide large coverage of the watermelon genome and high phylogenetic resolution of germplasm acces...

  16. Characterization of clonal relatedness among the natural population of Staphylococcus aureus strains by using spa sequence typing and the BURP (based upon repeat patterns) algorithm

    NARCIS (Netherlands)

    Mellmann, Alexander; Weniger, Thomas; Berssenbrügge, Christoph; Keckevoet, Ursula; Friedrich, Alexander W; Harmsen, Dag; Grundmann, Hajo

    2008-01-01

    We evaluated the BURP (based upon repeat patterns) algorithm, which relies on sequencing of the Staphylococcus aureus protein A gene (spa), for its ability to infer clonal relatedness within a population of 110 wild-type strains. BURP clustering of the resulting 66 spa types was highly concordant wi

  17. Cyclotriveratrylene (CTV) as a new chiral triacid scaffold capable of inducing triple helix formation of collagen peptides containing either a native sequence or Pro-Hyp-Gly repeats

    NARCIS (Netherlands)

    Rump, ET; Rijkers, DTS; Hilbers, HW; de Groot, PG; Liskamp, RMJ

    2002-01-01

    A new triacid scaffold is described based on the cone-shaped cyclotriveratrylene (CTV) molecule that facilitates the triple, helical folding of peptides containing either a unique blood platelet binding collagen sequence or collagen peptides composed of Pro-Hyp-Gly repeats. The latter were synthesiz

  18. Comparison of Multilocus Variable-Number Tandem-Repeat Analysis and Multilocus Sequence Typing for Differentiation of Hemolytic-Uremic Syndrome-Associated Escherichia coli (HUSEC) Collection Strains

    OpenAIRE

    2011-01-01

    Multilocus variable-number tandem-repeat analysis (MLVA) was compared to multilocus sequence typing (MLST) to differentiate hemolytic uremic syndrome-associated enterohemorrhagic Escherichia coli strains. Although MLVA—like MLST—was highly discriminatory (index of diversity, 0.988 versus 0.984), a low level of concordance demonstrated the limited ability of MLVA to reflect long-term evolutionary events.
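The "index of diversity" used to compare typing methods in this record is commonly computed as Simpson's index in the Hunter-Gaston form, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where n_j is the number of strains of type j and N is the total number of strains. The Python sketch below assumes that definition; the record does not spell out the formula, and the type labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def discriminatory_index(type_assignments: Iterable[Hashable]) -> float:
    """Simpson's index of diversity: probability that two distinct strains differ in type."""
    counts = Counter(type_assignments)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("at least two strains are needed")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical typing result: four strains, two of which share a type.
print(round(discriminatory_index(["t1", "t1", "t2", "t3"]), 4))  # 0.8333
```

A value close to 1, such as the 0.988 and 0.984 reported above, means that two strains drawn at random almost always receive different types.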

  19. Characterisation of an unusual telomere motif (TTTTTTAGGG)n in the plant Cestrum elegans (Solanaceae), a species with a large genome.

    Science.gov (United States)

    Peška, Vratislav; Fajkus, Petr; Fojtová, Miloslava; Dvořáčková, Martina; Hapala, Jan; Dvořáček, Vojtěch; Polanská, Pavla; Leitch, Andrew R; Sýkorová, Eva; Fajkus, Jiří

    2015-05-01

    The characterization of unusual telomere sequence sheds light on patterns of telomere evolution, maintenance and function. Plant species from the closely related genera Cestrum, Vestia and Sessea (family Solanaceae) lack known plant telomeric sequences. Here we characterize the telomere of Cestrum elegans, work that was a challenge because of its large genome size and few chromosomes (1C 9.76 pg; n = 8). We developed an approach that combines BAL31 digestion, which digests DNA from the ends and chromosome breaks, with next-generation sequencing (NGS), to generate data analysed in RepeatExplorer, designed for de novo repeat identification and quantification. We identify a unique repeat motif (TTTTTTAGGG)n in C. elegans, occurring in ca. 30 400 copies per haploid genome, averaging ca. 1900 copies per telomere. We demonstrate that the motif is synthesized by telomerase. The occurrence of an unusual eukaryote (TTTTTTAGGG)n telomeric motif in C. elegans represents a switch in motif from the 'typical' angiosperm telomere (TTTAGGG)n. That switch may have happened with the divergence of Cestrum, Sessea and Vestia. The shift in motif when it arose would have had profound effects on telomere activity. Thus our finding provides a unique handle to study how telomerase and telomeres responded to genetic change, studies that will shed more light on telomere function.
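The copy-number figures in this record follow from simple arithmetic: with n = 8 chromosomes there are 16 chromosome ends per haploid genome, so ca. 30 400 copies correspond to ca. 1900 copies per telomere. The Python sketch below reproduces that calculation and shows one simple way to count uninterrupted tandem copies of the motif in a sequence. The regular-expression approach and the toy sequence are illustrative assumptions, not the RepeatExplorer-based pipeline the authors describe.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found anywhere in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = 0
    for match in pattern.finditer(sequence):
        best = max(best, len(match.group(0)) // len(motif))
    return best


MOTIF = "TTTTTTAGGG"

# Hypothetical read: three tandem copies, a two-base interruption, then one more copy.
toy_read = "ACGT" + MOTIF * 3 + "CC" + MOTIF
print(longest_tandem_run(toy_read, MOTIF))  # 3

# Copies per telomere from the figures in the abstract (two ends per chromosome).
copies_per_haploid_genome = 30_400
chromosome_ends = 2 * 8
print(copies_per_haploid_genome / chromosome_ends)  # 1900.0
```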

  20. Sequence variations in C9orf72 downstream of the hexanucleotide repeat region and its effect on repeat-primed PCR interpretation

    DEFF Research Database (Denmark)

    Nordin, Angelica; Akimoto, Chizuru; Wuolikainen, Anna

    2017-01-01

    -PCR data. Our objective was to determine the properties of these sequence variations with regard to prevalence, the range of variation, and effect on disease prognosis. We screened a multi-national cohort (n = 6981) for the HREM and samples with deviant RP-PCR curves were identified. The deviant samples...

  1. Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif

    Science.gov (United States)

    2010-01-01

    Background Effector secretion is a common strategy of pathogens in mediating host-pathogen interaction. Eight EPIYA-motif containing effectors have recently been discovered in six pathogens. Once these effectors enter host cells through type III/IV secretion systems (T3SS/T4SS), tyrosine in the EPIYA motif is phosphorylated, which triggers effectors binding other proteins to manipulate host-cell functions. The objectives of this study are to evaluate the distribution pattern of the EPIYA motif in broad biological species, to predict potential effectors with the EPIYA motif, and to suggest roles and biological functions of potential effectors in host-pathogen interactions. Results A hidden Markov model (HMM) of five amino acids was built for the EPIYA motif based on the eight known effectors. Using this HMM to search the non-redundant protein database containing 9,216,047 sequences, we obtained 107,231 sequences with at least one EPIYA motif occurrence and 3115 sequences with multiple repeats of the EPIYA motif. Although the EPIYA motif exists among broad species, it is significantly over-represented in some particular groups of species. Most of the proteins containing at least four copies of the EPIYA motif are from intracellular bacteria, extracellular bacteria with T3SS or T4SS, or intracellular protozoan parasites. By combining the EPIYA motif and the adjacent SH2 binding motifs (KK, R4, Tarp and Tir), we built HMMs of nine amino acids and predicted many potential effectors in bacteria and protista by the HMMs. Some potential effectors for pathogens (such as Lawsonia intracellularis, Plasmodium falciparum and Leishmania major) are suggested. Conclusions Our study indicates that the EPIYA motif may be a ubiquitous functional site for effectors that play an important pathogenicity role in mediating host-pathogen interactions. We suggest that some intracellular protozoan parasites could secrete EPIYA-motif containing effectors through secretion systems similar to the

  2. Long CAG Repeat Sequence and Protein Expression of Androgen Receptor Considered as Prognostic Indicators in Male Breast Carcinoma

    OpenAIRE

    Yan-Ni Song; Jing-Shu Geng; Tong Liu; Zhen-Bin Zhong; Yang Liu; Bing-Shu Xia; Hong-Fei Ji; Xiao-Mei Li; Guo-Qiang Zhang; Yan-Lv Ren; Zhi-Gao Li; Da Pang

    2012-01-01

    BACKGROUND: The androgen receptor (AR) expression and the CAG repeat length within the AR gene appear to be involved in the carcinogenesis of male breast carcinoma (MBC). Although phenotypic differences have been observed between MBC and normal control group in AR gene, there is lack of correlation analysis between AR expression and CAG repeat length in MBC. The purpose of the study was to investigate the prognostic value of CAG repeat lengths and AR protein expression. METHODS: 81 tumor tiss...

  3. Short tandem repeat sequences in the Mycoplasma genitalium genome and their use in a multilocus genotyping system

    Directory of Open Access Journals (Sweden)

    Lillis Rebecca

    2008-07-01

    Full Text Available Abstract Background Several methods have been reported for strain typing of Mycoplasma genitalium. The value of these methods has never been comparatively assessed. The aims of this study were: (1) to identify new potential genetic markers based on an analysis of short tandem repeat (STR) sequences in the published M. genitalium genome sequence; (2) to apply previously and newly identified markers to a panel of clinical strains in order to determine the optimal combination for an efficient multi-locus genotyping system; (3) to further confirm sexual transmission of M. genitalium using the newly developed system. Results We performed a comprehensive analysis of STRs in the genome of the M. genitalium type strain G37 and identified 18 loci containing STRs. In addition to one previously studied locus, MG309, we chose two others, MG307 and MG338, for further study. Based on an analysis of 74 unrelated patient specimens from New Orleans and Scandinavia, the discriminatory indices (DIs) for these three markers were 0.9153, 0.7381 and 0.8730, respectively. Two other previously described markers, single nucleotide polymorphisms (SNPs) in the rRNA genes (rRNA-SNPs) and SNPs in the MG191 gene (MG191-SNPs), were found to have DIs of 0.5820 and 0.9392, respectively. A combination of MG309-STRs and MG191-SNPs yielded almost perfect discrimination (DI = 0.9894). An additional finding was that the rRNA-SNPs distribution pattern differed significantly between Scandinavia and New Orleans. Finally we applied multi-locus typing to further confirm sexual transmission using specimens from 74 unrelated patients and 31 concurrently infected couples. Analysis of multi-locus genotype profiles using the five variable loci described above revealed 27 of the couples had concordant genotype profiles compared to only four examples of concordance among the 74 unrelated randomly selected patients. Conclusion We propose that a combination of the MG309-STRs and MG191-SNPs is

  4. Organellar genome, nuclear ribosomal DNA repeat unit, and microsatellites isolated from a small-scale of 454 GS FLX sequencing on two mosses.

    Science.gov (United States)

    Liu, Yang; Forrest, Laura L; Bainard, Jillian D; Budke, Jessica M; Goffinet, Bernard

    2013-03-01

    Recent innovations in high-throughput DNA sequencing methodology (next generation sequencing technologies [NGS]) allow for the generation of large amounts of high quality data that may be particularly critical for resolving ambiguous relationships such as those resulting from rapid radiations. Application of NGS technology to bryology is limited to assembling entire nuclear or organellar genomes of selected exemplars of major lineages (e.g., classes). Here we outline how organellar genomes and the entire nuclear ribosomal DNA repeat can be obtained from minimal amounts of moss tissue via small-scale 454 GS FLX sequencing. We sampled two Funariaceae species, Funaria hygrometrica and Entosthodon obtusus, and assembled nearly complete organellar genomes and the whole nuclear ribosomal DNA repeat unit (18S-ITS1-5.8S-ITS2-26S-IGS1-5S-IGS2) for both taxa. Sequence data from these species were compared to sequences from another Funariaceae species, Physcomitrella patens, revealing low overall degrees of divergence of the organellar genomes and nrDNA genes with substitutions spread rather evenly across their length, and high divergence within the external spacers of the nrDNA repeat. Furthermore, we detected numerous microsatellites among the 454 assemblies. This study demonstrates that NGS methodology can be applied to mosses to target large genomic regions and identify microsatellites.

  5. Genetic variation in Rhodomyrtus tomentosa (Kemunting) populations from Malaysia as revealed by inter-simple sequence repeat markers.

    Science.gov (United States)

    Hue, T S; Abdullah, T L; Abdullah, N A P; Sinniah, U R

    2015-12-14

    Kemunting (Rhodomyrtus tomentosa), from the Myrtaceae family, is native to Malaysia. It is widely used in traditional medicine to treat various illnesses and possesses significant antibacterial properties. In addition, it has great potential as an ornamental in landscape design. Genetic variability studies are important for the rational management and conservation of genetic material. In the present study, inter-simple sequence repeat markers were used to assess the genetic diversity of 18 R. tomentosa populations collected from ten states of Peninsular Malaysia. The 11 primers selected generated 173 bands that ranged in size from 1.6 kb to 130 bp, which corresponded to an average of 15.73 bands per primer. Of these bands, 97.69% (169 in total) were polymorphic. High genetic diversity was documented at the species level (H(T) = 0.2705; I = 0.3973; PPB = 97.69%) but there was low diversity at the population level (H(S) = 0.0073; I = 0.1085; PPB = 20.14%). The high level of genetic differentiation revealed by G(ST) (73%) and analysis of molecular variance (63%), together with the limited gene flow among populations (N(m) = 0.1851), suggests that the populations examined are isolated. Results from an unweighted pair group method with arithmetic mean dendrogram and principal coordinate analysis clearly grouped the populations into two geographic groups. This clear grouping is also supported by the significant Mantel test (r = 0.581, P = 0.001). We recommend that all the R. tomentosa populations be preserved in conservation programs.
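The summary figures in this record can be cross-checked with a short Python sketch. The gene-flow estimate assumes the widely used relation Nm = 0.5 (1 - GST) / GST for dominant-marker data; the record does not state which estimator the authors applied, so this is an illustrative assumption rather than their documented method.

```python
def gene_flow_from_gst(gst: float) -> float:
    """Estimate Nm from GST with the assumed relation Nm = 0.5 * (1 - GST) / GST."""
    if not 0.0 < gst <= 1.0:
        raise ValueError("GST must lie in (0, 1]")
    return 0.5 * (1.0 - gst) / gst


# Figures quoted in the abstract.
total_bands, polymorphic_bands, primers = 173, 169, 11

print(round(total_bands / primers, 2))                  # 15.73 bands per primer
print(round(100 * polymorphic_bands / total_bands, 2))  # 97.69 percent polymorphic
print(round(gene_flow_from_gst(0.73), 4))               # 0.1849
```

With GST rounded to 73%, the assumed relation gives Nm of about 0.185, in line with the reported 0.1851; the small difference is what one would expect from rounding GST to two digits.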

  6. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats

    Directory of Open Access Journals (Sweden)

    Vergnaud Gilles

    2007-05-01

    Full Text Available Abstract Background In Archaea and Bacteria, the repeated elements called CRISPRs for "clustered regularly interspaced short palindromic repeats" are believed to participate in the defence against viruses. Short sequences called spacers are stored in-between repeated elements. In the current model, motifs comprising spacers and repeats may target an invading DNA and lead to its degradation through a proposed mechanism similar to RNA interference. Analysis of intra-species polymorphism shows that new motifs (one spacer and one repeated element) are added in a polarised fashion. Although their principal characteristics have been described, a lot remains to be discovered on the way CRISPRs are created and evolve. As new genome sequences become available it appears necessary to develop automated scanning tools to make available CRISPRs related information and to facilitate additional investigations. Description We have produced a program, CRISPRFinder, which identifies CRISPRs and extracts the repeated and unique sequences. Using this software, a database is constructed which is automatically updated monthly from newly released genome sequences. Additional tools were created to allow the alignment of flanking sequences in search for similarities between different loci and to build dictionaries of unique sequences. To date, almost six hundred CRISPRs have been identified in 475 published genomes. Two Archaea out of thirty-seven and about half of Bacteria do not possess a CRISPR. Fine analysis of repeated sequences strongly supports the current view that new motifs are added at one end of the CRISPR adjacent to the putative promoter. Conclusion It is hoped that availability of a public database, regularly updated and which can be queried on the web will help in further dissecting and understanding CRISPR structure and flanking sequences evolution. Subsequent analyses of the intra-species CRISPR polymorphism will be facilitated by CRISPRFinder and the

  7. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas.

    Science.gov (United States)

    Petrov, Anton I; Zirbel, Craig L; Leontis, Neocles B

    2013-10-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson-Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

  8. Detection of dispersed short tandem repeats using reversible jump Markov chain Monte Carlo.

    Science.gov (United States)

    Liang, Tong; Fan, Xiaodan; Li, Qiwei; Li, Shuo-Yen R

    2012-10-01

    Tandem repeats occur frequently in biological sequences. They are important for studying genome evolution and human disease. A number of methods have been designed to detect a single tandem repeat in a sliding window. In this article, we focus on the case that an unknown number of tandem repeat segments of the same pattern are dispersively distributed in a sequence. We construct a probabilistic generative model for the tandem repeats, where the sequence pattern is represented by a motif matrix. A Bayesian approach is adopted to compute this model. Markov chain Monte Carlo (MCMC) algorithms are used to explore the posterior distribution as an effort to infer both the motif matrix of tandem repeats and the location of repeat segments. Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms are used to address the transdimensional model selection problem raised by the variable number of repeat segments. Experiments on both synthetic data and real data show that this new approach is powerful in detecting dispersed short tandem repeats. As far as we know, it is the first work to adopt RJMCMC algorithms in the detection of tandem repeats.
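The "motif matrix" mentioned in this record can be pictured as a table of per-position base probabilities estimated from aligned repeat segments. The Python sketch below builds such a matrix from equal-length segments. It illustrates only the representation, not the Bayesian model or the RJMCMC sampler described by the authors; the function name, the pseudocount option and the toy segments are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def motif_matrix(
    segments: Sequence[str],
    alphabet: str = "ACGT",
    pseudocount: float = 0.0,
) -> List[Dict[str, float]]:
    """Per-position base probabilities estimated from equal-length repeat segments."""
    length = len(segments[0])
    if any(len(segment) != length for segment in segments):
        raise ValueError("all segments must have the same length")
    total = len(segments) + pseudocount * len(alphabet)
    matrix = []
    for position in range(length):
        counts = Counter(segment[position] for segment in segments)
        matrix.append(
            {base: (counts[base] + pseudocount) / total for base in alphabet}
        )
    return matrix


# Hypothetical repeat units, one of which carries a substitution at position 2.
toy_segments = ["ACG", "ACG", "ATG", "ACG"]
for column in motif_matrix(toy_segments):
    print(column)
```

In a probabilistic detector, a matrix of this kind would score candidate segments, with the segment locations and the matrix itself inferred jointly from the data.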

  9. Main: TCA1MOTIF [PLACE

    Lifescience Database Archive (English)

    Full Text Available TCA1MOTIF S000159 17-May-1998 (last modified) kehi TCA-1 (tobacco nuclear protein 1...) binding site; Related to salicylic acid-inducible expression of many genes; Found in barley beta-1,3-gluca...nase and over 30 different plant genes which are known to be induced by one or more forms of stress; A similar sequence (TCA... et al., 1997); SA; salicylic acid; stress; TCA-1; barley (Hordeum vulgare); tobacco (Nicotiana tabacum); TCATCTTCTT ...

  10. IS1630 of Mycoplasma fermentans, a Novel IS30-Type Insertion Element That Targets and Duplicates Inverted Repeats of Variable Length and Sequence during Insertion

    Science.gov (United States)

    Calcutt, Michael J.; Lavrrar, Jennifer L.; Wise, Kim S.

    1999-01-01

    A new insertion sequence (IS) of Mycoplasma fermentans is described. This element, designated IS1630, is 1,377 bp long and has 27-bp inverted repeats at the termini. A single open reading frame (ORF), predicted to encode a basic protein of either 366 or 387 amino acids (depending on the start codon utilized), occupies most of this compact element. The predicted translation product of this ORF has homology to transposases of the IS30 family of IS elements and is most closely related (27% identical amino acid residues) to the product of the prototype of the group, IS30. Multiple copies of IS1630 are present in the genomes of at least two M. fermentans strains. Characterization and comparison of nine copies of the element revealed that IS1630 exhibits unusual target site specificity and, upon insertion, duplicates target sequences in a manner unlike that of any other IS element. IS1630 was shown to have the striking ability to target and duplicate inverted repeats of variable length and sequence during transposition. IS30-type elements typically generate 2- or 3-bp target site duplications, whereas those created by IS1630 vary between 19 and 26 bp. With the exception of two recently reported IS4-type elements which have the ability to generate variable large duplications (B. B. Plikaytis, J. T. Crawford, and T. M. Shinnick, J. Bacteriol. 180:1037–1043, 1998; E. M. Vilei, J. Nicolet, and J. Frey, J. Bacteriol. 181:1319–1323, 1999), such large direct repeats had not been observed for other IS elements. Interestingly, the IS1630-generated duplications are all symmetrical inverted repeat sequences that are apparently derived from rho-independent transcription terminators of neighboring genes. Although the consensus target site for IS30 is almost palindromic, individual target sites possess considerably less inverted symmetry. In contrast, IS1630 appears to exhibit an increased stringency for inverted repeat recognition, since the majority of target sites had no

  11. Crystal structure of the G3BP2 NTF2-like domain in complex with a canonical FGDF motif peptide

    DEFF Research Database (Denmark)

    Kristensen, Ole

    2015-01-01

    The crystal structure of the NTF2-like domain of the human Ras GTPase SH3 Binding Protein (G3BP), isoform 2, was determined at a resolution of 2.75 Å in complex with a peptide containing an FGDF sequence motif. The overall structure of the protein is highly similar to the homodimeric N...... molecular modeling suggested that FGDF-motif containing peptides bind in an extended conformation into a hydrophobic groove on the surface of the G3BP NTF2-like domain in a manner similar to the known binding of FxFG nucleoporin repeats. The results in this paper provide evidence for a different binding...

  12. Assessment of composite motif discovery methods

    Directory of Open Access Journals (Sweden)

    Johansen Jostein

    2008-02-01

    Full Text Available Background: Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery, that is, discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so-called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. Results: We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Conclusion: Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. The variation in performance on individual

  13. Sequence determinants of human microsatellite variability

    Directory of Open Access Journals (Sweden)

    Jakobsson Mattias

    2009-12-01

    Full Text Available Background: Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability, we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results: Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions: These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.
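
    The calibration and heterozygosity calculations described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the reference fragment length, the reference repeat count and the allele lengths are all hypothetical, and expected heterozygosity is computed with the standard formula 1 - sum(p_i^2), which the abstract does not spell out.

```python
from collections import Counter


def infer_repeat_number(fragment_len, ref_fragment_len, ref_repeats, unit_size):
    """Infer repeat count from a PCR fragment length.

    Assumes (as the abstract does) that fragment length differences
    reflect only differences in the number of repeat units.
    """
    return ref_repeats + (fragment_len - ref_fragment_len) / unit_size


def expected_heterozygosity(alleles):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical tetranucleotide locus: the reference sequence gives a
# 204-bp fragment that contains 12 repeat units.
fragment_lengths = [200, 204, 204, 208, 212, 204]
repeats = [infer_repeat_number(f, 204, 12, 4) for f in fragment_lengths]

mean_repeats = sum(repeats) / len(repeats)
max_repeats = max(repeats)
het = expected_heterozygosity(fragment_lengths)

print(mean_repeats, max_repeats, het)
```

    With per-locus values of this kind in hand, the reported correlation between repeat number and heterozygosity could be examined across loci.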

  14. PEGylation enhances tumor targeting of plasmid DNA by an artificial cationized protein with repeated RGD sequences, Pronectin.

    Science.gov (United States)

    Hosseinkhani, Hossein; Tabata, Yasuhiko

    2004-05-31

    The objective of this study is to investigate the feasibility of a non-viral gene carrier with repeated RGD sequences (Pronectin F+) in tumor targeting for gene expression. The Pronectin F+ was cationized by introducing spermine (Sm) to the hydroxyl groups to allow polyionic complexation with plasmid DNA. The cationized Pronectin F+ prepared was additionally modified with poly(ethylene glycol) (PEG) molecules, which have active ester and methoxy groups at the termini, to form various PEG-introduced cationized Pronectin F+. The cationized Pronectin F+ with or without PEGylation at different extents was mixed with a plasmid DNA of LacZ to form respective cationized Pronectin F+-plasmid DNA complexes. The plasmid DNA was electrophoretically complexed with cationized Pronectin F+ and PEG-introduced cationized Pronectin F+, irrespective of the PEGylation extent, although a higher N/P ratio was needed for complexation with the latter. The molecular size and zeta potential measurements revealed that the plasmid DNA was reduced in size to about 250 nm and that its charge became positive on complexation with cationized Pronectin F+. For complexation with PEG-introduced cationized Pronectin F+, the charge of the complex approached neutral (almost 0 mV) with increasing PEGylation extent, while the molecular size was similar to that obtained with cationized Pronectin F+. When cationized Pronectin F+-plasmid DNA complexes with or without PEGylation were intravenously injected into mice carrying a subcutaneous Meth-AR-1 fibrosarcoma mass, the PEG-introduced cationized Pronectin F+-plasmid DNA complex specifically enhanced the level of gene expression in the tumor, to a significantly greater extent than the cationized Pronectin F+-plasmid DNA complexes and free plasmid DNA. The enhanced level of gene expression depended on the percentage of PEG introduced, the N/P ratio, and the plasmid DNA dose. A fluorescent microscopic study revealed that the

  15. Elongated polyproline motifs facilitate enamel evolution through matrix subunit compaction.

    Directory of Open Access Journals (Sweden)

    Tianquan Jin

    2009-12-01

    Full Text Available Vertebrate body designs rely on hydroxyapatite as the principal mineral component of relatively light-weight, articulated endoskeletons and sophisticated tooth-bearing jaws, facilitating rapid movement and efficient predation. Biological mineralization and skeletal growth are frequently accomplished through proteins containing polyproline repeat elements. Through their well-defined yet mobile and flexible structure polyproline-rich proteins control mineral shape and contribute many other biological functions including Alzheimer's amyloid aggregation and prolamine plant storage. In the present study we have hypothesized that polyproline repeat proteins exert their control over biological events such as mineral growth, plaque aggregation, or viscous adhesion by altering the length of their central repeat domain, resulting in dramatic changes in supramolecular assembly dimensions. In order to test our hypothesis, we have used the vertebrate mineralization protein amelogenin as an exemplar and determined the biological effect of the four-fold increased polyproline tandem repeat length in the amphibian/mammalian transition. To study the effect of polyproline repeat length on matrix assembly, protein structure, and apatite crystal growth, we have measured supramolecular assembly dimensions in various vertebrates using atomic force microscopy, tested the effect of protein assemblies on crystal growth by electron microscopy, generated a transgenic mouse model to examine the effect of an abbreviated polyproline sequence on crystal growth, and determined the structure of polyproline repeat elements using 3D NMR. Our study shows that an increase in PXX/PXQ tandem repeat motif length results in (i) a compaction of protein matrix subunit dimensions, (ii) reduced conformational variability, (iii) an increase in polyproline II helices, and (iv) promotion of apatite crystal length. Together, these findings establish a direct relationship between polyproline tandem

  16. VARUN: discovering extensible motifs under saturation constraints.

    Science.gov (United States)

    Apostolico, Alberto; Comin, Matteo; Parida, Laxmi

    2010-01-01

    The discovery of motifs in biosequences is frequently torn between the rigidity of the model on one hand and the abundance of candidates on the other hand. In particular, motifs that include wild cards or "don't cares" escalate exponentially with their number, and this gets only worse if a don't care is allowed to stretch up to some prescribed maximum length. In this paper, a notion of extensible motif in a sequence is introduced and studied, which tightly combines the structure of the motif pattern, as described by its syntactic specification, with the statistical measure of its occurrence count. It is shown that a combination of appropriate saturation conditions and the monotonicity of probabilistic scores over regions of constant frequency afford us significant parsimony in the generation and testing of candidate overrepresented motifs. A suite of software programs called Varun is described, implementing the discovery of extensible motifs of the type considered. The merits of the method are then documented by results obtained in a variety of experiments primarily targeting protein sequence families. Of equal importance seems the fact that the sets of all surprising motifs returned in each experiment are extracted faster and come in much more manageable sizes than would be obtained in the absence of saturation constraints.
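
    The idea of a motif whose "don't care" gaps may stretch up to a prescribed maximum length can be illustrated with a small sketch. This is not the Varun software or its input syntax: the list-based motif representation, the function names and the example sequence are assumptions made for illustration, and the motif is simply translated into a regular expression and counted by start position, with none of the saturation or scoring machinery the abstract describes.

```python
import re


def motif_to_regex(elements):
    """Translate a motif into a regular expression string.

    `elements` is a hypothetical representation: a string is a solid
    (fixed) block, and a (lo, hi) tuple is a gap of lo..hi don't-care
    characters.
    """
    parts = []
    for el in elements:
        if isinstance(el, str):
            parts.append(re.escape(el))
        else:
            lo, hi = el
            parts.append(".{%d,%d}" % (lo, hi))
    return "".join(parts)


def count_occurrences(elements, sequence):
    """Count start positions at which the motif occurs (overlaps allowed)."""
    # A lookahead matches at each start position without consuming text,
    # so overlapping occurrences are all counted.
    pattern = re.compile("(?=" + motif_to_regex(elements) + ")")
    return sum(1 for _ in pattern.finditer(sequence))


# Hypothetical motif: G, then one or two arbitrary residues, then K.
motif = ["G", (1, 2), "K"]
sequence = "GAKGAAKGK"

print(motif_to_regex(motif))
print(count_occurrences(motif, sequence))
```

    Enumerating every candidate of this form is what grows exponentially with the number of gaps, which is the cost the saturation conditions in the paper are meant to contain.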

  17. The DNA-binding domain of BenM reveals the structural basis for the recognition of a T-N11-A sequence motif by LysR-type transcriptional regulators.

    Science.gov (United States)

    Alanazi, Amer M; Neidle, Ellen L; Momany, Cory

    2013-10-01

    LysR-type transcriptional regulators (LTTRs) play critical roles in metabolism and constitute the largest family of bacterial regulators. To understand protein-DNA interactions, atomic structures of the DNA-binding domain and linker-helix regions of a prototypical LTTR, BenM, were determined by X-ray crystallography. BenM structures with and without bound DNA reveal a set of highly conserved amino acids that interact directly with DNA bases. At the N-terminal end of the recognition helix (α3) of a winged-helix-turn-helix DNA-binding motif, several residues create hydrophobic pockets (Pro30, Pro31 and Ser33). These pockets interact with the methyl groups of two thymines in the DNA-recognition motif and its complementary strand, T-N11-A. This motif usually includes some dyad symmetry, as exemplified by a sequence that binds two subunits of a BenM tetramer (ATAC-N7-GTAT). Gln29 forms hydrogen bonds to adenine in the first position of the recognition half-site (ATAC). Another hydrophobic pocket defined by Ala28, Pro30 and Pro31 interacts with the methyl group of thymine, complementary to the base at the third position of the half-site. Arg34 interacts with the complementary base of the 3' position. Arg53, in the wing, provides AT-tract recognition in the minor groove. For DNA recognition, LTTRs use highly conserved interactions between amino acids and nucleotide bases as well as numerous less-conserved secondary interactions.
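
    The relationship between the two motifs named in this abstract, ATAC-N7-GTAT and T-N11-A, can be made concrete with a short scanning sketch. The function names and the example DNA string are hypothetical, and the scan is a plain pattern match on one strand; it says nothing about actual binding affinity.

```python
import re

# ATAC, seven arbitrary bases, GTAT. The lookahead lets overlapping
# candidate sites be reported.
SITE = re.compile(r"(?=(ATAC[ACGT]{7}GTAT))")

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(dna):
    """Return (start, site) pairs for every ATAC-N7-GTAT match."""
    return [(m.start(), m.group(1)) for m in SITE.finditer(dna.upper())]


def is_t_n11_a(core):
    """True if `core` is 13 bases long, starts with T and ends with A."""
    return len(core) == 13 and core[0] == "T" and core[-1] == "A"


# Hypothetical sequence containing one site.
dna = "GGATACTCCATAGGTATCC"
sites = find_sites(dna)
start, site = sites[0]

# Dropping the first and last base of the 15-bp site leaves T-N11-A.
core = site[1:14]
print(start, site, core, is_t_n11_a(core))

# The two half-sites are reverse complements, which is the dyad symmetry.
print(reverse_complement(site[:4]) == site[-4:])
```

    The slice shows why the 15-bp ATAC-N7-GTAT site is a special case of the more general T-N11-A motif: the second and fourteenth bases are T and A with eleven bases between them.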

  18. Mononucleotide repeats are asymmetrically distributed in fungal genes

    Directory of Open Access Journals (Sweden)

    de Graaff Leo H

    2008-12-01

    Full Text Available Background: Systematic analyses of sequence features have resulted in a better characterisation of the organisation of the genome. A previous study in prokaryotes on the distribution of sequence repeats, which are notoriously variable and can disrupt the reading frame in genes, showed that these motifs are skewed towards gene termini, specifically the 5' end of genes. For eukaryotes no such intragenic analysis has been performed, though this could indicate the pervasiveness of this distribution bias, thereby helping to expose the selective pressures causing it. Results: In fungal gene repertoires we find a similar 5' bias of intragenic mononucleotide repeats, most notably for Candida spp., whereas e.g. Coccidioides spp. display no such bias. With increasing repeat length, ever larger discrepancies are observed in genome repertoire fractions containing such repeats, with up to an 80-fold difference in gene fractions at repeat lengths of 10 bp and longer. This species-specific difference in gene fractions containing large repeats could be attributed to variations in intragenic repeat tolerance. Furthermore, long transcripts experience an even more prominent bias towards the gene termini, with possibly a more adaptive role for repeat-containing short transcripts. Conclusion: Mononucleotide repeats are intragenically biased in numerous fungal genomes, similar to earlier studies on prokaryotes, indicative of a similar selective pressure in gene organization.
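
    The kind of intragenic measurement described in this abstract can be sketched in a few lines: locate mononucleotide runs in a gene and record where each falls along its length. The function name, the minimum run length of 6 and the toy gene are assumptions for illustration; the abstract does not give the thresholds or the position measure the authors used.

```python
import re


def mononucleotide_repeats(gene, min_len=6):
    """Return (base, run_length, relative_position) for each homopolymer run.

    relative_position is the run midpoint scaled to 0..1, where 0 is the
    5' end of the gene and 1 is the 3' end.
    """
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    span = max(len(gene) - 1, 1)  # avoid division by zero on tiny inputs
    hits = []
    for m in pattern.finditer(gene.upper()):
        midpoint = (m.start() + m.end() - 1) / 2
        hits.append((m.group(1), m.end() - m.start(), midpoint / span))
    return hits


# Hypothetical 30-bp gene with an A run near the 5' end and a T run
# near the 3' end.
gene = "ATG" + "A" * 8 + "CGTACGTACG" + "T" * 6 + "GCA"
hits = mononucleotide_repeats(gene)

for base, length, rel in hits:
    half = "5' half" if rel < 0.5 else "3' half"
    print(base, length, round(rel, 3), half)
```

    Applied to every gene in a genome, the distribution of the relative positions would show whether repeats cluster towards the 5' end, as reported for Candida spp.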

  19. Analysis of the Complete Mycoplasma hominis LBD-4 Genome Sequence Reveals Strain-Variable Prophage Insertion and Distinctive Repeat-Containing Surface Protein Arrangements

    OpenAIRE

    2015-01-01

    The complete genome sequence of Mycoplasma hominis LBD-4 has been determined and the gene content ascribed. The 715,165-bp chromosome contains 620 genes, including 14 carried by a strain-variable prophage genome related to Mycoplasma fermentans MFV-1 and Mycoplasma arthritidis MAV-1. Comparative analysis with the genome of M. hominis PG21(T) reveals distinctive arrangements of repeat-containing surface proteins.

  20. Analysis of the Complete Mycoplasma hominis LBD-4 Genome Sequence Reveals Strain-Variable Prophage Insertion and Distinctive Repeat-Containing Surface Protein Arrangements.

    Science.gov (United States)

    Calcutt, Michael J; Foecking, Mark F

    2015-02-26

    The complete genome sequence of Mycoplasma hominis LBD-4 has been determined and the gene content ascribed. The 715,165-bp chromosome contains 620 genes, including 14 carried by a strain-variable prophage genome related to Mycoplasma fermentans MFV-1 and Mycoplasma arthritidis MAV-1. Comparative analysis with the genome of M. hominis PG21(T) reveals distinctive arrangements of repeat-containing surface proteins.