WorldWideScience

Sample records for functional sequence categories of

  1. Conservation patterns in different functional sequence categories of divergent Drosophila species

    Energy Technology Data Exchange (ETDEWEB)

    Papatsenko, Dmitri; Kislyuk, Andrey; Levine, Michael; Dubchak, Inna

    2005-10-01

    We have explored the distributions of fully conserved ungapped blocks in genome-wide pairwise alignments of recently completed species of Drosophila: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs and non-functional sequences. At the same time, the blocks in the coding regions carried a 3N+2 signature characteristic of synonymous substitutions in the 3rd codon positions. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or 'ultraconserved' sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discussed how restrained conservation patterns may help in mapping functional sequence categories and improving genome annotation.
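
    The block statistic described in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: the function names, the gap character and the toy alignment are assumptions made for illustration only. The sketch collects the lengths of maximal runs of identical, gap-free columns in a pairwise alignment and tallies them modulo 3. Two mismatches at third codon positions that are 3(N+1) columns apart leave exactly 3N+2 conserved columns between them, which is why an excess of lengths with remainder 2 is expected in coding regions.

```python
from collections import Counter


def ungapped_conserved_blocks(seq_a, seq_b, gap="-"):
    """Return lengths of maximal runs of identical, gap-free aligned columns.

    seq_a and seq_b are the two rows of a pairwise alignment (hypothetical
    input format: equal-length strings with `gap` marking indels).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    lengths = []
    run = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b and a != gap:
            run += 1              # column is fully conserved: extend the block
        else:
            if run:
                lengths.append(run)
            run = 0               # a mismatch or a gap closes the block
    if run:
        lengths.append(run)
    return lengths


def length_mod3_profile(lengths):
    """Count block lengths by remainder modulo 3 (remainder 2 is the 3N+2 class)."""
    counts = Counter(n % 3 for n in lengths)
    return {r: counts.get(r, 0) for r in (0, 1, 2)}


if __name__ == "__main__":
    # Toy alignment, invented for this example.
    row_a = "ATGGCTAA-GTT"
    row_b = "ATGGCAAACGTT"
    blocks = ungapped_conserved_blocks(row_a, row_b)
    print(blocks)
    print(length_mod3_profile(blocks))
```

    On real genome-wide alignments the same tally would be computed separately for each annotated sequence category and the resulting length distributions compared.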

  2. Prediction Error During Functional and Non-Functional Action Sequences

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Sørensen, Jesper

    2013-01-01

    ... recurrent networks were made and the results are presented in this article. The simulations show that non-functional action sequences do indeed increase prediction error, but that context representations, such as abstract goal information, can modulate the error signal considerably. It is also shown that the networks are sensitive to boundaries between sequences in both functional and non-functional actions. ...

  3. Fibonacci difference sequence spaces for modulus functions

    Directory of Open Access Journals (Sweden)

    Kuldip Raj

    2015-05-01

    In the present paper we introduce Fibonacci difference sequence spaces l(F, Ƒ, p, u) and l_∞(F, Ƒ, p, u) by using a sequence of modulus functions and a new band matrix F. We also make an effort to study some inclusion relations, topological and geometric properties of these spaces. Furthermore, the alpha, beta, gamma duals and matrix transformations of the space l(F, Ƒ, p, u) are determined.

  4. Spontaneous processing of functional and non-functional action sequences

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Sørensen, Jesper

    2011-01-01

    ... as sub-categories of non-functional behavior (i.e., actions lacking causal coherence and a necessary integration between subparts). New insights in human action processing can help us explain how cognition might vary depending on the type of behavior processed. Using an event segmentation paradigm, we conducted two experiments eliciting differences in participants' response patterns to functional and non-functional actions. Participants consistently segmented non-functional action sequences into smaller units indicating either an attentional shift to the level of gesture analysis or a problem of representational integration. Experimental studies of non-functional behavior can strengthen explanations of recurrent features of human action processing, such as ritual and ritualized behavior, as well as indicate potential sources and effects of breakdown of the system. ...

  5. Function-Based Algorithms for Biological Sequences

    Science.gov (United States)

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  6. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  7. Determining and comparing protein function in Bacterial genome sequences

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla

    ... of this class have very little homology to other known genomes, making functional annotation based on sequence similarity very difficult. Inspired in part by this analysis, an approach for comparative functional annotation, CMGfunc, was created based on public sequenced genomes. Functionally related groups ... In November 2013, there were around 21,000 different prokaryotic genomes sequenced and publicly available, and the number is growing daily with another 20,000 or more genomes expected to be sequenced and deposited by the end of 2014. An important part of the analysis of this data is the functional annotation of genes – the descriptions assigned to genes that describe the likely function of the encoded proteins. This process is limited by several factors, including the definition of a function, which can be more or less specific, as well as how many genes can actually be assigned a function based...

  8. Filling the gap between sequence and function: a bioinformatics approach

    NARCIS (Netherlands)

    Bargsten, J.W.

    2014-01-01

    The research presented in this thesis focuses on deriving function from sequence information, with the emphasis on plant sequence data. Unravelling the impact of genomic elements, in most cases genes, on the phenotype of an organism is a major challenge in biological research and modern plant

  9. Zero sequences of holomorphic functions, representation of meromorphic functions. II. Entire functions

    International Nuclear Information System (INIS)

    Khabibullin, Bulat N

    2009-01-01

    Let Λ = {λ_k} be a sequence of points in the complex plane C and f a non-trivial entire function of finite order ρ and finite type σ such that f = 0 on Λ. Upper bounds for functions such as the Weierstrass-Hadamard canonical product of order ρ constructed from the sequence Λ are obtained. Similar bounds for meromorphic functions are also derived. These results are used to estimate the radius of completeness of a system of exponentials in C. Bibliography: 26 titles.

  10. Arithmetic convergent sequence space defined by modulus function

    Directory of Open Access Journals (Sweden)

    Taja Yaying

    2019-10-01

    The aim of this article is to introduce the sequence spaces $AC(f)$ and $AS(f)$ using arithmetic convergence and a modulus function, and to study algebraic and topological properties of these spaces, together with certain inclusion results.

  11. The convergence of the order sequence and the solution function sequence on fractional partial differential equation

    Science.gov (United States)

    Rusyaman, E.; Parmikanti, K.; Chaerani, D.; Asefan; Irianingsih, I.

    2018-03-01

    One application of fractional ordinary differential equations is related to viscoelasticity, i.e., a correlation between the viscosity of fluids and the elasticity of solids. If the solution function becomes a function of two or more variables, then the differential equation must be changed into a fractional partial differential equation. As a preliminary study for the two-variable viscoelasticity problem, this paper discusses the convergence analysis of the function sequence that solves the homogeneous fractional partial differential equation. The method used to solve the problem is the Homotopy Analysis Method. The results show that, given two real number sequences (αn) and (βn) which converge to α and β respectively, the solution function sequence of the fractional partial differential equation with order (αn, βn) also converges to the solution function of the fractional partial differential equation with order (α, β).
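
    The convergence result stated in this abstract can be written compactly. This is only a sketch: the symbol u for the solution function is an assumed notation that does not appear in the record, and the mode of convergence is left unspecified because the abstract does not state it.

```latex
\[
  \alpha_n \to \alpha, \quad \beta_n \to \beta
  \;\Longrightarrow\;
  u_{(\alpha_n,\beta_n)} \to u_{(\alpha,\beta)}
  \qquad (n \to \infty),
\]
% u_{(a,b)} denotes (by assumption) the solution of the homogeneous
% fractional partial differential equation of order (a, b).
```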

  12. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  13. Inaudible functional MRI using a truly mute gradient echo sequence

    International Nuclear Information System (INIS)

    Marcar, V.L.; Girard, F.; Rinkel, Y.; Schneider, J.F.; Martin, E.

    2002-01-01

    We performed functional MRI experiments using a mute version of a gradient echo sequence on adult volunteers using either a simple visual stimulus (flicker goggles: 4 subjects) or an auditory stimulus (music: 4 subjects). Because the mute sequence delivers fewer images per unit time than a fast echo planar imaging (EPI) sequence, we explored our data using a parametric ANOVA test and a non-parametric Wilcoxon-Mann-Whitney test in addition to performing a cross-correlation analysis. All three methods were in close agreement regarding the location of the BOLD contrast signal change. We demonstrated that, using appropriate statistical analysis, functional MRI using an MR sequence that is acoustically inaudible to the subject is feasible. Furthermore, our mGE protocol compares favourably with the "silent" event-related procedures involving an EPI protocol with respect to experiment time and the BOLD signal. (orig.)

  14. Inaudible functional MRI using a truly mute gradient echo sequence

    Energy Technology Data Exchange (ETDEWEB)

    Marcar, V.L. [University of Zurich, Department of Psychology, Neuropsychology, Treichlerstrasse 10, 8032 Zurich (Switzerland)]; Girard, F. [GE Medical Systems SA, 283, rue de la Miniere B.P. 34, 78533 Buc Cedex (France)]; Rinkel, Y.; Schneider, J.F.; Martin, E. [University Children's Hospital, Neuroradiology and Magnetic Resonance, Department of Diagnostic Imaging, Steinwiesstrasse 75, 8032 Zurich (Switzerland)]

    2002-11-01

    We performed functional MRI experiments using a mute version of a gradient echo sequence on adult volunteers using either a simple visual stimulus (flicker goggles: 4 subjects) or an auditory stimulus (music: 4 subjects). Because the mute sequence delivers fewer images per unit time than a fast echo planar imaging (EPI) sequence, we explored our data using a parametric ANOVA test and a non-parametric Wilcoxon-Mann-Whitney test in addition to performing a cross-correlation analysis. All three methods were in close agreement regarding the location of the BOLD contrast signal change. We demonstrated that, using appropriate statistical analysis, functional MRI using an MR sequence that is acoustically inaudible to the subject is feasible. Furthermore, our mGE protocol compares favourably with the "silent" event-related procedures involving an EPI protocol with respect to experiment time and the BOLD signal. (orig.)

  15. Functional annotation from the genome sequence of the giant panda

    OpenAIRE

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-01-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided in...

  16. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    Energy Technology Data Exchange (ETDEWEB)

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer-mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  17. Massively parallel interrogation of aptamer sequence, structure and function.

    Directory of Open Access Journals (Sweden)

    Nicholas O Fischer

    BACKGROUND: Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. METHODOLOGY/PRINCIPAL FINDINGS: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer-mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. CONCLUSION AND SIGNIFICANCE: The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  18. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  19. deFUME: Dynamic exploration of functional metagenomic sequencing data

    DEFF Research Database (Denmark)

    van der Helm, Eric; Geertz-Hansen, Henrik Marcus; Genee, Hans Jasper

    2015-01-01

    ... is time consuming and constitutes a major bottleneck for experimental researchers in the field. Here we present the deFUME web server, an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, tailored to meet the requirements of non-bioinformaticians. The web server integrates multiple analysis steps into one single workflow: read assembly, open reading frame prediction, and annotation with BLAST, InterPro and GO classifiers. Analysis results are visualized in an online dynamic web interface. The deFUME web server provides a fast track from raw sequence...

  20. Mining dynamic noteworthy functions in software execution sequences.

    Science.gov (United States)

    Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation of important entities, such as code-based static structure analysis, are on the destruction of the actual software running. In this paper, from the perspective of the software execution process, we propose an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, through software decompiling and tracking of stack changes, execution traces composed of a series of function addresses were acquired. These traces were then modeled as execution sequences and simplified to obtain simplified sequences (SFS), followed by the extraction of patterns from SFS through a pattern extraction (PE) algorithm. After that, the evaluation indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in the DNFM algorithm. Finally, these functions were sorted by their noteworthiness. The experimental results were compared with those of two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.

  1. Functional annotation from the genome sequence of the giant panda.

    Science.gov (United States)

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-08-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

  2. Asymptotically double lacunary equivalent sequences defined by Orlicz functions

    Directory of Open Access Journals (Sweden)

    Ayhan Esi

    2014-04-01

    This paper presents the following definition, which is a natural combination of the definitions of asymptotically equivalent sequences and of an Orlicz function. Two nonnegative double sequences x = (x_{k,l}) and y = (y_{k,l}) are said to be M-asymptotically double equivalent to multiple L provided that for every ε>0, P-lim_{k,l} M(|x_{k,l}/y_{k,l} - L|/ρ) = 0 for some ρ>0 (denoted by x∽y), and simply M-asymptotically double equivalent if L=1. We also give some new concepts related to this definition and some inclusion theorems.

  3. Feature Selection and the Class Imbalance Problem in Predicting Protein Function from Sequence

    NARCIS (Netherlands)

    Al-Shahib, A.; Breitling, R.; Gilbert, D.

    2005-01-01

    Abstract: When the standard approach to predict protein function by sequence homology fails, other alternative methods can be used that require only the amino acid sequence for predicting function. One such approach uses machine learning to predict protein function directly from amino acid sequence

  4. Universal sequence replication, reversible polymerization and early functional biopolymers: a model for the initiation of prebiotic sequence evolution.

    Directory of Open Access Journals (Sweden)

    Sara Imari Walker

    Many models for the origin of life have focused on understanding how evolution can drive the refinement of a preexisting enzyme, such as the evolution of efficient replicase activity. Here we present a model for what was, arguably, an even earlier stage of chemical evolution, when polymer sequence diversity was generated and sustained before, and during, the onset of functional selection. The model includes regular environmental cycles (e.g. hydration-dehydration cycles) that drive polymers between times of replication and functional activity, which coincide with times of different monomer and polymer diffusivity. Template-directed replication of informational polymers, which takes place during the dehydration stage of each cycle, is considered to be sequence-independent. New sequences are generated by spontaneous polymer formation, and all sequences compete for a finite monomer resource that is recycled via reversible polymerization. Kinetic Monte Carlo simulations demonstrate that this proposed prebiotic scenario provides a robust mechanism for the exploration of sequence space. Introduction of a polymer sequence with monomer synthetase activity illustrates that functional sequences can become established in a preexisting pool of otherwise non-functional sequences. Functional selection does not dominate system dynamics and sequence diversity remains high, permitting the emergence and spread of more than one functional sequence. It is also observed that polymers spontaneously form clusters in simulations where polymers diffuse more slowly than monomers, a feature that is reminiscent of a previous proposal that the earliest stages of life could have been defined by the collective evolution of a system-wide cooperation of polymer aggregates. Overall, the results presented demonstrate the merits of considering plausible prebiotic polymer chemistries and environments that would have allowed for the rapid turnover of monomer resources and for

  5. Evolution of sequence-defined highly functionalized nucleic acid polymers

    Science.gov (United States)

    Chen, Zhen; Lichtor, Phillip A.; Berliner, Adrian P.; Chen, Jonathan C.; Liu, David R.

    2018-03-01

    The evolution of sequence-defined synthetic polymers made of building blocks beyond those compatible with polymerase enzymes or the ribosome has the potential to generate new classes of receptors, catalysts and materials. Here we describe a ligase-mediated DNA-templated polymerization and in vitro selection system to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks that contain eight chemically diverse side chains on a DNA backbone. Through iterated cycles of polymer translation, selection and reverse translation, we discovered HFNAPs that bind proprotein convertase subtilisin/kexin type 9 (PCSK9) and interleukin-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited the binding between PCSK9 and the low-density lipoprotein receptor. Structure-activity relationship studies revealed that specific side chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

  6. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    Science.gov (United States)

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence

  7. SIFTS: Structure Integration with Function, Taxonomy and Sequences resource

    Science.gov (United States)

    Velankar, Sameer; Dana, José M.; Jacobsen, Julius; van Ginkel, Glen; Gane, Paul J.; Luo, Jie; Oldfield, Thomas J.; O’Donovan, Claire; Martin, Maria-Jesus; Kleywegt, Gerard J.

    2013-01-01

    The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS; http://pdbe.org/sifts) is a close collaboration between the Protein Data Bank in Europe (PDBe) and UniProt. The two teams have developed a semi-automated process for maintaining up-to-date cross-reference information to UniProt entries, for all protein chains in the PDB entries present in the UniProt database. This process is carried out for every weekly PDB release and the information is stored in the SIFTS database. The SIFTS process includes cross-references to other biological resources such as Pfam, SCOP, CATH, GO, InterPro and the NCBI taxonomy database. The information is exported in XML format, one file for each PDB entry, and is made available by FTP. Many bioinformatics resources use SIFTS data to obtain cross-references between the PDB and other biological databases so as to provide their users with up-to-date information. PMID:23203869

  8. Functional noncoding sequences derived from SINEs in the mammalian genome.

    Science.gov (United States)

    Nishihara, Hidenori; Smit, Arian F A; Okada, Norihiro

    2006-07-01

    Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the approximately 1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality.

  9. Some double sequence spaces of interval numbers defined by Orlicz function

    Directory of Open Access Journals (Sweden)

    Ayhan Esi

    2014-10-01

    In this paper we introduce some interval valued double sequence spaces defined by Orlicz function and study different properties of these spaces like inclusion relations, solidity, etc. We establish some inclusion relations among them. Also we introduce the concept of double statistical convergence for interval number sequences and give an inclusion relation between interval valued double sequence spaces.

  10. Functional analysis of bipartite begomovirus coat protein promoter sequences

    International Nuclear Information System (INIS)

    Lacatus, Gabriela; Sunter, Garry

    2008-01-01

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus, suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters.

  11. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    Science.gov (United States)

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-04

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues.

    Science.gov (United States)

    Garrido-Martín, Diego; Pazos, Florencio

    2018-02-27

    The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.

  13. Chaos game representation of functional protein sequences, and simulation and multifractal analysis of induced measures

    International Nuclear Information System (INIS)

    Zu-Guo, Yu; Qian-Jun, Xiao; Long, Shi; Jun-Wu, Yu; Anh, Vo

    2010-01-01

    Investigating the biological function of proteins is a key aspect of protein studies. Bioinformatic methods have become important for studying the biological function of proteins. In this paper, we first give the chaos game representation (CGR) of randomly-linked functional protein sequences, then propose the use of the recurrent iterated function systems (RIFS) in fractal theory to simulate the measure based on their chaos game representations. This method helps to extract some features of functional protein sequences, and furthermore the biological functions of these proteins. Then multifractal analysis of the measures based on the CGRs of randomly-linked functional protein sequences is performed. We find that the CGRs have clear fractal patterns. The numerical results show that the RIFS can simulate the measure based on the CGR very well. The relative standard error and the estimated probability matrix in the RIFS do not depend on the order in which the functional protein sequences are linked. The estimated probability matrices in the RIFS with different biological functions are evidently different. Hence the estimated probability matrices in the RIFS can be used to characterise the difference among linked functional protein sequences with different biological functions. From the values of the D_q curves, one sees that these functional protein sequences are not completely random. The D_q of all linked functional proteins studied are multifractal-like and sufficiently smooth for the C_q (analogous to specific heat) curves to be meaningful. Furthermore, the D_q curves of the measure μ based on their CGRs for different orders of linking the functional protein sequences are almost identical if q ≥ 0. Finally, the C_q curves of all linked functional proteins resemble a classical phase transition at a critical point. (cross-disciplinary physics and related areas of science and technology)
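
    As an illustration of the chaos game representation mentioned in this abstract, the following is a minimal sketch of the generic CGR iteration in the unit square: each symbol moves the current point halfway toward the corner assigned to that symbol. The abstract does not state how amino acids are mapped to corners, so the four-letter alphabet, the `CORNERS` table and the function name are hypothetical choices made only for this example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical assignment of a reduced four-symbol alphabet to the corners
# of the unit square (an assumption; the abstract gives no mapping).
CORNERS: Dict[str, Point] = {
    "a": (0.0, 0.0),
    "b": (0.0, 1.0),
    "c": (1.0, 1.0),
    "d": (1.0, 0.0),
}


def chaos_game_points(symbols: Sequence[str],
                      corners: Dict[str, Point],
                      start: Point = (0.5, 0.5)) -> List[Point]:
    """Return the CGR point visited after each symbol.

    Starting from `start`, every symbol moves the current point to the
    midpoint between itself and the corner assigned to that symbol.
    """
    x, y = start
    points: List[Point] = []
    for s in symbols:
        cx, cy = corners[s]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for p in chaos_game_points("acbd", CORNERS):
        print(p)
```

    Counting how many points fall in each cell of a regular grid over the square would give an empirical measure of the kind that multifractal analysis is applied to.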

  14. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    Science.gov (United States)

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  15. Advancing Functional Metagenomics using Synthetic Biology from Soil to Sequence

    DEFF Research Database (Denmark)

    van der Helm, Eric

    as ‘functional metagenomics’, the DNA of these bacteria can be recovered from the environment and used by host-bacteria which can be grown in a lab. This allows us to make use of the capabilities of the billions of bacteria that are present in the environment without actually growing them but by making use...

  16. Scoring protein relationships in functional interaction networks predicted from sequence data.

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins. Availability: Protein pair-wise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data and Python scripts to compute these scores are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes.

  17. Automatic discovery of cross-family sequence features associated with protein function

    Directory of Open Access Journals (Sweden)

    Krings Andrea

    2006-01-01

    Background: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. Results: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. Conclusion: We have developed a novel and useful approach for

  18. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation.

    Science.gov (United States)

    Zydziak, Nicolas; Konrad, Waldemar; Feist, Florian; Afonin, Sergii; Weidner, Steffen; Barner-Kowollik, Christopher

    2016-11-30

    Designing artificial macromolecules with absolute sequence order represents a considerable challenge. Here we report an advanced light-induced avenue to monodisperse sequence-defined functional linear macromolecules up to decamers via a unique photochemical approach. The versatility of the synthetic strategy-combining sequential and modular concepts-enables the synthesis of perfect macromolecules varying in chemical constitution and topology. Specific functions are placed at arbitrary positions along the chain via the successive addition of monomer units and blocks, leading to a library of functional homopolymers, alternating copolymers and block copolymers. The in-depth characterization of each sequence-defined chain confirms the precision nature of the macromolecules. Decoding of the functional information contained in the molecular structure is achieved via tandem mass spectrometry without recourse to their synthetic history, showing that the sequence information can be read. We submit that the presented photochemical strategy is a viable and advanced concept for coding individual monomer units along a macromolecular chain.

  19. Applications of high-throughput sequencing to chromatin structure and function in mammals

    OpenAIRE

    Dunham, Ian

    2009-01-01

    High-throughput DNA sequencing approaches have enabled direct interrogation of chromatin samples from mammalian cells. We are beginning to develop a genome-wide description of nuclear function during development, but further data collection, refinement, and integration are needed.

  20. On paranormed Zweier ideal convergent sequence spaces defined By Orlicz function

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2014-10-01

    In this article we introduce paranorm ideal convergent sequence spaces using Zweier transform and Orlicz function. We study some topological and algebraic properties. Further we prove some inclusion relations related to these new spaces.

  1. PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways

    OpenAIRE

    Mi, Huaiyu; Guo, Nan; Kejariwal, Anish; Thomas, Paul D.

    2006-01-01

    PANTHER is a freely available, comprehensive software system for relating protein sequence evolution to the evolution of specific protein functions and biological roles. Since 2005, there have been three main improvements to PANTHER. First, the sequences used to create evolutionary trees are carefully selected to provide coverage of phylogenetic as well as functional information. Second, PANTHER is now a member of the InterPro Consortium, and the PANTHER hidden Markov models (HMMs) are distri...

  2. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    Science.gov (United States)

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.

  3. Evolutionary rates at codon sites may be used to align sequences and infer protein domain function

    Directory of Open Access Journals (Sweden)

    Hazelhurst Scott

    2010-03-01

    Background: Sequence alignments form part of many investigations in molecular biology, including the determination of phylogenetic relationships, the prediction of protein structure and function, and the measurement of evolutionary rates. However, to obtain meaningful results, a significant degree of sequence similarity is required to ensure that the alignments are accurate and the inferences correct. Limitations arise when sequence similarity is low, which is particularly problematic when working with fast-evolving genes, evolutionarily distant taxa, genomes with nucleotide biases, and cases of convergent evolution. Results: A novel approach was conceptualized to address the "low sequence similarity" alignment problem. We developed an alignment algorithm termed FIRE (Functional Inference using the Rates of Evolution), which aligns sequences using the evolutionary rate at codon sites, as measured by the dN/dS ratio, rather than nucleotide or amino acid residues. FIRE was used to test the hypotheses that evolutionary rates can be used to align sequences and that the alignments may be used to infer protein domain function. Using a range of test data, we found that aligning domains based on evolutionary rates was possible even when sequence similarity was very low (for example, antibody variable regions). Furthermore, the alignment has the potential to infer protein domain function, indicating that domains with similar functions are subject to similar evolutionary constraints. These data suggest that an evolutionary rate-based approach to sequence analysis (particularly when combined with structural data) may be used to study cases of convergent evolution or when sequences have very low similarity. However, when aligning homologous gene sets with sequence similarity, FIRE did not perform as well as the best traditional alignment algorithms, indicating that the conventional approach of aligning residues as opposed to evolutionary rates remains the

  4. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-05-01

    The purpose of this dissertation is to present a methodology for modeling the global sequence alignment problem as a directed acyclic graph, which helps to extract all possible optimal alignments. Moreover, a mechanism to sequentially optimize the sequence alignment problem relative to different cost functions is suggested. Sequence alignment is particularly important in computational biology, where it is used to find evolutionary relationships between biological sequences. Many algorithms have been developed to solve this problem. The most famous are Needleman-Wunsch and Smith-Waterman, which are based on dynamic programming. In dynamic programming, a problem is divided into a set of overlapping subproblems, the solution of each subproblem is found, and the solutions to these subproblems are then combined into a final solution. In this thesis it has been proved that for two sequences of length m and n over a fixed alphabet, the suggested optimization procedure requires O(mn) arithmetic operations per cost function on a single-processor machine. The algorithm has been simulated using the C#.Net programming language, and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, at a rate that also depends on the cost function used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
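
    To make the O(mn) dynamic-programming idea in this abstract concrete, here is a minimal sketch of the classic Needleman-Wunsch recurrence for the optimal global alignment score. It is not the dissertation's graph-based procedure for enumerating and sequentially optimizing alignments; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Optimal global alignment score of `a` and `b` with a linear gap cost.

    Each of the (len(a)+1) * (len(b)+1) subproblems is solved once from
    three neighbouring subproblems, which gives O(mn) arithmetic operations.
    Only two rows of the table are kept in memory.
    """
    m, n = len(a), len(b)
    # prev[j]: best score for aligning the first i-1 letters of a with b[:j]
    prev = [j * gap for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * gap] + [0] * n
        for j in range(1, n + 1):
            # Align a[i-1] with b[j-1] (match or mismatch) ...
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # ... or align one of the two letters against a gap.
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[n]


if __name__ == "__main__":
    print(needleman_wunsch_score("AC", "AGC"))
```

    Changing `match`, `mismatch` and `gap` corresponds to choosing a different cost function; each choice needs its own pass over the table.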

  5. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences

    Directory of Open Access Journals (Sweden)

    Meinicke Peter

    2009-09-01

    Background: Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description: Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion: For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  6. Reconciling mass functions with the star-forming main sequence via mergers

    Science.gov (United States)

    Steinhardt, Charles L.; Yurk, Dominic; Capak, Peter

    2017-06-01

    We combine star formation along the 'main sequence', quiescence and clustering and merging to produce an empirical model for the evolution of individual galaxies. Main-sequence star formation alone would significantly steepen the stellar mass function towards low redshift, in sharp conflict with observation. However, a combination of star formation and merging produces a consistent result for correct choice of the merger rate function. As a result, we are motivated to propose a model in which hierarchical merging is disconnected from environmentally independent star formation. This model can be tested via correlation functions and would produce new constraints on clustering and merging.

  7. Motor sequence learning-induced neural efficiency in functional brain connectivity.

    Science.gov (United States)

    Karim, Helmet T; Huppert, Theodore J; Erickson, Kirk I; Wollam, Mariegold E; Sparto, Patrick J; Sejdić, Ervin; VanSwearingen, Jessie M

    2017-02-15

    Previous studies have shown the functional neural circuitry differences before and after an explicitly learned motor sequence task, but have not assessed these changes during the process of motor skill learning. Functional magnetic resonance imaging activity was measured while participants (n=13) were asked to tap their fingers to visually presented sequences in blocks that were either the same sequence repeated (learning block) or random sequences (control block). Motor learning was associated with a decrease in brain activity during learning compared to control. Lower brain activation was noted in the posterior parietal association area and bilateral thalamus during the later periods of learning (not during the control). Compared to the control condition, we found the task-related motor learning was associated with decreased connectivity between the putamen and left inferior frontal gyrus and left middle cingulate brain regions. Motor learning was associated with changes in network activity, spatial extent, and connectivity. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

    Science.gov (United States)

    Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke

    2009-02-15

    Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. PFP is available as a web service at http

  9. A method for partitioning the information contained in a protein sequence between its structure and function.

    Science.gov (United States)

    Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido

    2018-05-23

    Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.

  10. Likelihood functions for the analysis of single-molecule binned photon sequences

    Energy Technology Data Exchange (ETDEWEB)

    Gopich, Irina V., E-mail: irinag@niddk.nih.gov [Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892 (United States)

    2012-03-02

    Graphical abstract: Folding of a protein with attached fluorescent dyes, the underlying conformational trajectory of interest, and the observed binned photon trajectory. Highlights: ► A sequence of photon counts can be analyzed using a likelihood function. ► The exact likelihood function for a two-state kinetic model is provided. ► Several approximations are considered for an arbitrary kinetic model. ► Improved likelihood functions are obtained to treat sequences of FRET efficiencies. Abstract: We consider the analysis of a class of experiments in which the number of photons in consecutive time intervals is recorded. Sequences of photon counts or, alternatively, of FRET efficiencies can be studied using likelihood-based methods. For a kinetic model of the conformational dynamics and state-dependent Poisson photon statistics, the formalism to calculate the exact likelihood that this model describes such sequences of photons or FRET efficiencies is developed. Explicit analytic expressions for the likelihood function for a two-state kinetic model are provided. The important special case when conformational dynamics are so slow that at most a single transition occurs in a time bin is considered. By making a series of approximations, we eventually recover the likelihood function used in hidden Markov models. In this way, not only is insight gained into the range of validity of this procedure, but also an improved likelihood function can be obtained.
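
    The hidden-Markov-model likelihood that this abstract describes as the end point of its approximations can be sketched as a standard forward recursion with Poisson emissions. This is only the approximate HMM form, not the exact likelihood derived in the paper, and it assumes that the state stays fixed within a bin and changes only between bins. The function names, the per-bin transition matrix `trans` and all parameter values are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing k photons when the expected count is `mean` (> 0)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def binned_photon_log_likelihood(counts: Sequence[int],
                                 rates: Sequence[float],
                                 bin_time: float,
                                 trans: Sequence[Sequence[float]],
                                 init: Sequence[float]) -> float:
    """Log-likelihood of binned photon counts under a discrete-time HMM.

    counts   : photon counts in consecutive time bins
    rates    : photon count rate of each conformational state
    bin_time : bin duration, so the mean count in state s is rates[s] * bin_time
    trans    : trans[i][j] = probability of state j in the next bin given state i
    init     : state probabilities in the first bin
    """
    n_states = len(rates)
    means = [r * bin_time for r in rates]
    alpha: List[float] = list(init)
    log_lik = 0.0
    for t, k in enumerate(counts):
        if t > 0:
            # Propagate the state probabilities across the bin boundary.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n_states))
                     for j in range(n_states)]
        # Weight each state by the Poisson probability of the observed count.
        alpha = [alpha[j] * poisson_pmf(k, means[j]) for j in range(n_states)]
        # Rescale to avoid underflow and accumulate the log of the scale factor.
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]
    return log_lik


if __name__ == "__main__":
    ll = binned_photon_log_likelihood(
        counts=[3, 4, 1, 0, 5],
        rates=[4.0, 1.0],
        bin_time=1.0,
        trans=[[0.9, 0.1], [0.2, 0.8]],
        init=[0.5, 0.5],
    )
    print(ll)
```

    Maximizing this quantity over the rates and transition probabilities gives parameter estimates in the regime where the slow-dynamics assumption holds.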

  11. RECONCILING THE OBSERVED STAR-FORMING SEQUENCE WITH THE OBSERVED STELLAR MASS FUNCTION

    International Nuclear Information System (INIS)

    Leja, Joel; Van Dokkum, Pieter G.; Franx, Marijn; Whitaker, Katherine E.

    2015-01-01

    We examine the connection between the observed star-forming sequence (SFR ∝ M^α) and the observed evolution of the stellar mass function in the range 0.2 < z < 2.5. We find that the star-forming sequence cannot have a slope α ≲ 0.9 at all masses and redshifts because this would result in a much higher number density at 10 < log(M/M☉) < 11 by z = 1 than is observed. We show that a transition in the slope of the star-forming sequence, such that α = 1 at log(M/M☉) < 10.5 and α = 0.7 − 0.13z (Whitaker et al.) at log(M/M☉) > 10.5, greatly improves agreement with the evolution of the stellar mass function. We then derive a star-forming sequence that reproduces the evolution of the mass function by design. This star-forming sequence is also well described by a broken power law, with a shallow slope at high masses and a steep slope at low masses. At z = 2, it is offset by ∼0.3 dex from the observed star-forming sequence, consistent with the mild disagreement between the cosmic star formation rate (SFR) and recent observations of the growth of the stellar mass density. It is unclear whether this problem stems from errors in stellar mass estimates, errors in SFRs, or other effects. We show that a mass-dependent slope is also seen in other self-consistent models of galaxy evolution, including semianalytical, hydrodynamical, and abundance-matching models. As part of the analysis, we demonstrate that neither mergers nor hidden low-mass quiescent galaxies are likely to reconcile the evolution of the mass function and the star-forming sequence. These results are supported by observations from Whitaker et al
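
    The slope transition quoted in this abstract can be written out as a small function. The sketch below is only an illustration: the function names are hypothetical, the normalization of the sequence is not given in the abstract, and continuity of log SFR at the break mass is an assumption made here so that an offset relative to the break can be computed.

```python
# Break mass in log10(M / M_sun), as quoted in the abstract.
BREAK_LOG_MASS = 10.5


def sequence_slope(log_mass, z):
    """Slope alpha of SFR ∝ M^alpha for the broken power law in the abstract."""
    if log_mass < BREAK_LOG_MASS:
        return 1.0
    return 0.7 - 0.13 * z


def log_sfr_relative_to_break(log_mass, z):
    """log10 SFR offset from its value at the break mass.

    Assumes the two branches join continuously at the break (not stated
    in the abstract); the absolute normalization is left out.
    """
    return sequence_slope(log_mass, z) * (log_mass - BREAK_LOG_MASS)
```

    With these assumptions, a galaxy 0.5 dex above the break at z = 2 sits 0.22 dex above the break-mass SFR, while one 0.5 dex below the break sits 0.5 dex under it.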

  12. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

    KAUST Repository

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert

    2017-01-01

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often

  13. Hunting down frame shifts: Ecological analysis of diverse functional gene sequences

    Directory of Open Access Journals (Sweden)

    Michal Strejcek

    2015-11-01

    Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frame-shifts (FS). Genes encoding alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 43.1% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By nature of the FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of Maximum Expected Error (MEE) filtering and single linkage pre-clustering (SLP) proved the most efficient read processing. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study and the tool was implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/ and https://github.com/rdpstaff/Framebot.
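
    As a rough illustration of the Maximum Expected Error filtering mentioned in this abstract, the sketch below sums the per-base error probabilities implied by Phred quality scores (p = 10^(-Q/10)) and keeps reads whose sum stays under a threshold. The function names, the Phred+33 encoding and the threshold of 1.0 are assumptions for the example, not settings reported by the study.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities for one read.

    Each quality character encodes Q = ord(char) - phred_offset, and the
    error probability for that base is 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def mee_filter(reads, max_expected_errors=1.0):
    """Keep (sequence, quality) pairs whose expected error count is small.

    The default threshold is a placeholder, not a value from the study.
    """
    return [
        (seq, qual)
        for seq, qual in reads
        if expected_errors(qual) <= max_expected_errors
    ]


# Toy reads: the second one carries two Q0 bases and is discarded.
reads = [("ACGT", "IIII"), ("ACGT", "!!II")]
kept = mee_filter(reads)
```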

  14. Identification and functional characterization of a novel bipartite nuclear localization sequence in ARID1A

    Energy Technology Data Exchange (ETDEWEB)

    Bateman, Nicholas W. [Women's Health Integrated Research Center at Inova Health System, Gynecologic Cancer Center of Excellence, Annandale 22003, VA (United States); The John P. Murtha Cancer Center, Walter Reed National Military Medical Center, 8901 Wisconsin Avenue, Bethesda 20889, MD (United States); Shoji, Yutaka [Department of Obstetrics, Gynecology and Reproductive Biology, Michigan State University, Grand Rapids 49503, MI (United States); Conrads, Kelly A.; Stroop, Kevin D. [Women's Health Integrated Research Center at Inova Health System, Gynecologic Cancer Center of Excellence, Annandale 22003, VA (United States); Hamilton, Chad A. [Women's Health Integrated Research Center at Inova Health System, Gynecologic Cancer Center of Excellence, Annandale 22003, VA (United States); The John P. Murtha Cancer Center, Walter Reed National Military Medical Center, 8901 Wisconsin Avenue, Bethesda 20889, MD (United States); Gynecologic Oncology Service, Department of Obstetrics and Gynecology, Walter Reed National Military Medical Center, 8901 Wisconsin Ave, MD, Bethesda, 20889 (United States); Department of Obstetrics and Gynecology, Uniformed Services University of the Health Sciences, Bethesda 20814, MD (United States); Darcy, Kathleen M. [Women's Health Integrated Research Center at Inova Health System, Gynecologic Cancer Center of Excellence, Annandale 22003, VA (United States); The John P. Murtha Cancer Center, Walter Reed National Military Medical Center, 8901 Wisconsin Avenue, Bethesda 20889, MD (United States); Maxwell, George L. [Department of Obstetrics and Gynecology, Inova Fairfax Hospital, Falls Church, VA 22042 (United States); Risinger, John I. [Department of Obstetrics, Gynecology and Reproductive Biology, Michigan State University, Grand Rapids 49503, MI (United States); and others

    2016-01-01

    AT-rich interactive domain-containing protein 1A (ARID1A) is a recently identified nuclear tumor suppressor frequently altered in solid tumor malignancies. We have identified a bipartite-like nuclear localization sequence (NLS) that contributes to nuclear import of ARID1A not previously described. We functionally confirm activity using GFP constructs fused with wild-type or mutant NLS sequences. We further show that cyto-nuclear localized, bipartite NLS mutant ARID1A exhibits greater stability than nuclear-localized, wild-type ARID1A. Identification of this undescribed functional NLS within ARID1A contributes vital insights to rationalize the impact of ARID1A missense mutations observed in patient tumors. - Highlights: • We have identified a bipartite nuclear localization sequence (NLS) in ARID1A. • Confirmation of the NLS was performed using GFP constructs. • NLS mutant ARID1A exhibits greater stability than wild-type ARID1A.

  15. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    International Nuclear Information System (INIS)

    Arneodo, Alain; Vaillant, Cedric; Audit, Benjamin; Argoul, Francoise; D'Aubenton-Carafa, Yves; Thermes, Claude

    2011-01-01

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is actually a wealth of structural and dynamical information to learn in the primary DNA sequence. In this review, we show that when using concepts, methodologies, numerical and experimental techniques coming from statistical mechanics and nonlinear physics combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.

  16. On algorithmic equivalence of instruction sequences for computing bit string functions

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2015-01-01

    Every partial function from bit strings of a given length to bit strings of a possibly different given length can be computed by a finite instruction sequence that contains only instructions to set and get the content of Boolean registers, forward jump instructions, and a termination instruction. We

  17. On algorithmic equivalence of instruction sequences for computing bit string functions

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2014-01-01

    Every partial function from bit strings of a given length to bit strings of a possibly different given length can be computed by a finite instruction sequence that contains only instructions to set and get the content of Boolean registers, forward jump instructions, and a termination instruction. We

  18. Recurrence Relations and Generating Functions of the Sequence of Sums of Corresponding Factorials and Triangular Numbers

    Directory of Open Access Journals (Sweden)

    Romer C. Castillo

    2015-11-01

    This study established some recurrence relations and exponential generating functions of the sequence of factoriangular numbers. A factoriangular number is defined as the sum of the corresponding factorial and triangular number. The proofs utilize algebraic manipulations with some known results from calculus, particularly on power series and Maclaurin series. The recurrence relations were found by manipulating the formula defining a factoriangular number, while the ascertained exponential generating functions were in closed form.
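
    The definition in this abstract, Ft_n = n! + n(n+1)/2, can be checked numerically. The sketch below computes the numbers directly and also through one simple recurrence that follows from n! = n·(n−1)! and T_n = T_{n−1} + n; this recurrence is derived here for illustration and is not necessarily one of those established in the study. All function names are hypothetical.

```python
from math import factorial


def triangular(n):
    """n-th triangular number T_n = n(n+1)/2."""
    return n * (n + 1) // 2


def factoriangular(n):
    """n-th factoriangular number Ft_n = n! + T_n."""
    return factorial(n) + triangular(n)


def factoriangular_by_recurrence(count):
    """First `count` factoriangular numbers (count >= 1) via a recurrence.

    Since Ft_(n-1) - T_(n-1) = (n-1)!, it follows that
    Ft_n = n * (Ft_(n-1) - T_(n-1)) + T_(n-1) + n.
    """
    values = [2]  # Ft_1 = 1! + 1
    for n in range(2, count + 1):
        t_prev = triangular(n - 1)
        values.append(n * (values[-1] - t_prev) + t_prev + n)
    return values
```

    Both routes give 2, 5, 12, 34, 135 for n = 1 to 5.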

  19. Functional role of a highly repetitive DNA sequence in anchorage of the mouse genome.

    Science.gov (United States)

    Neuer-Nitsche, B; Lu, X N; Werner, D

    1988-09-12

    The major portion of the eukaryotic genome consists of various categories of repetitive DNA sequences which have been studied with respect to their base compositions, organizations, copy numbers, transcription and species specificities; their biological roles, however, are still unclear. A novel quality of a highly repetitive mouse DNA sequence is described which points to a functional role: All copies (approximately 50,000 per haploid genome) of this DNA sequence reside on genomic Alu I DNA fragments each associated with nuclear polypeptides that are not released from DNA by proteinase K, SDS and phenol extraction. By this quality the repetitive DNA sequence is classified as a member of the sub-set of DNA sequences involved in tight DNA-polypeptide complexes which have been previously shown to be components of the subnuclear structure termed 'nuclear matrix'. From these results it has to be concluded that the repetitive DNA sequence characterized in this report represents or comprises a signal for a large number of site specific attachment points of the mouse genome in the nuclear matrix.

  20. The Sequences of 1504 Mutants in the Model Rice Variety Kitaake Facilitate Rapid Functional Genomic Studies.

    Science.gov (United States)

    Li, Guotian; Jain, Rashmi; Chern, Mawsheng; Pham, Nikki T; Martin, Joel A; Wei, Tong; Schackwitz, Wendy S; Lipzen, Anna M; Duong, Phat Q; Jones, Kyle C; Jiang, Liangrong; Ruan, Deling; Bauer, Diane; Peng, Yi; Barry, Kerrie W; Schmutz, Jeremy; Ronald, Pamela C

    2017-06-01

    The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa ssp. japonica), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. This work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations. © 2017 American Society of Plant Biologists. All rights reserved.

  1. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    Science.gov (United States)

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background: Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results: A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion: The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  2. Refined repetitive sequence searches utilizing a fast hash function and cross species information retrievals

    Directory of Open Access Journals (Sweden)

    Reneker Jeff

    2005-05-01

    Background: Searching for small tandem/disperse repetitive DNA sequences streamlines many biomedical research processes. For instance, whole genomic array analysis in yeast has revealed 22 PHO-regulated genes. The promoter regions of all but one of them contain at least one of the two core Pho4p binding sites, CACGTG and CACGTT. In humans, microsatellites play a role in a number of rare neurodegenerative diseases such as spinocerebellar ataxia type 1 (SCA1). SCA1 is a hereditary neurodegenerative disease caused by an expanded CAG repeat in the coding sequence of the gene. In bacterial pathogens, microsatellites are proposed to regulate expression of some virulence factors. For example, bacteria commonly generate intra-strain diversity through phase variation which is strongly associated with virulence determinants. A recent analysis of the complete sequences of the Helicobacter pylori strains 26695 and J99 has identified 46 putative phase-variable genes among the two genomes through their association with homopolymeric tracts and dinucleotide repeats. Life scientists are increasingly interested in studying the function of small sequences of DNA. However, current search algorithms often generate thousands of matches, most of which are irrelevant to the researcher. Results: We present our hash function as well as our search algorithm to locate small sequences of DNA within multiple genomes. Our system applies information retrieval algorithms to discover knowledge of cross-species conservation of repeat sequences. We discuss our incorporation of the Gene Ontology (GO) database into these algorithms. We conduct an exhaustive time analysis of our system for various repetitive sequence lengths. For instance, a search for eight bases of sequence within 3.224 GBases on 49 different chromosomes takes 1.147 seconds on average. To illustrate the relevance of the search results, we conduct a search with and without added annotation terms for the
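
    The abstract does not specify the hash function used, so the following is only a generic sketch of how a short DNA word can be hashed and looked up. It assumes a 2-bit-per-base encoding and a dictionary from hash value to start positions; every name here is hypothetical, and the toy genome string was made up to contain the two Pho4p core sites mentioned above.

```python
from collections import defaultdict

# Two bits per base: any k-mer maps to a unique integer below 4**k.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hash(kmer):
    """Pack a DNA word into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_CODE[base]
    return value


def build_index(genome, k):
    """Map each k-mer hash to the list of 0-based start positions."""
    index = defaultdict(list)
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        # Skip windows containing ambiguous bases such as N.
        if all(base in BASE_CODE for base in window):
            index[kmer_hash(window)].append(start)
    return index


def find(index, query):
    """Return all start positions of `query` recorded in the index."""
    return index.get(kmer_hash(query), [])


genome = "TTCACGTGAACACGTTCACGTG"  # synthetic example sequence
index = build_index(genome, 6)
```

    Once the index is built, each lookup is a single dictionary access, which is the property that makes hashing attractive for repeated searches of short motifs.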

  3. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-01-01

    The algorithm has been simulated using the C# .NET programming language, and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that when the sequence length increases linearly, the number of optimal alignments increases exponentially, at a rate that also depends on the cost function used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
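
    The reduction in the number of optimal alignments after each optimization step can be illustrated with a small dynamic program. This is a sketch under assumptions, not the thesis algorithm: it treats each cost function as a (mismatch, gap) pair, compares cost tuples lexicographically so that later cost functions only break ties among alignments optimal for earlier ones, and counts the optimal alignments. The function name and the example costs are hypothetical.

```python
def count_optimal_alignments(a, b, cost_functions):
    """Return (optimal cost tuple, number of optimal global alignments).

    cost_functions is a list of (mismatch, gap) pairs applied in order;
    tuples are compared lexicographically, which mimics optimizing
    sequentially relative to each cost function.
    """
    zero = tuple(0 for _ in cost_functions)
    mismatch_step = tuple(m for m, _ in cost_functions)
    gap_step = tuple(g for _, g in cost_functions)

    def add(cost, step):
        return tuple(c + s for c, s in zip(cost, step))

    rows, cols = len(a) + 1, len(b) + 1
    cost = [[zero] * cols for _ in range(rows)]
    ways = [[1] * cols for _ in range(rows)]
    # First row and column: only gaps are possible, one way each.
    for i in range(1, rows):
        cost[i][0] = add(cost[i - 1][0], gap_step)
    for j in range(1, cols):
        cost[0][j] = add(cost[0][j - 1], gap_step)

    for i in range(1, rows):
        for j in range(1, cols):
            step = zero if a[i - 1] == b[j - 1] else mismatch_step
            candidates = [
                (add(cost[i - 1][j - 1], step), ways[i - 1][j - 1]),
                (add(cost[i - 1][j], gap_step), ways[i - 1][j]),
                (add(cost[i][j - 1], gap_step), ways[i][j - 1]),
            ]
            best = min(c for c, _ in candidates)
            cost[i][j] = best
            ways[i][j] = sum(w for c, w in candidates if c == best)
    return cost[-1][-1], ways[-1][-1]
```

    For the sequences "AB" and "BA", a single cost function with mismatch 1 and gap 1 leaves three optimal alignments; adding a second cost function with mismatch 1 and gap 2 narrows them to one.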

  4. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships.

    KAUST Repository

    Kosinski, Jan

    2013-02-08

    SUMMARY: MODexplorer is an integrated tool aimed at exploring the sequence, structural and functional diversity in protein families useful in homology modeling and in analyzing protein families in general. It takes as input either the sequence or the structure of a protein and provides alignments with its homologs along with a variety of structural and functional annotations through an interactive interface. The annotations include sequence conservation, similarity scores, ligand-, DNA- and RNA-binding sites, secondary structure, disorder, crystallographic structure resolution and quality scores of models implied by the alignments to the homologs of known structure. MODexplorer can be used to analyze sequence and structural conservation among the structures of similar proteins, to find structures of homologs solved in different conformational state or with different ligands and to transfer functional annotations. Furthermore, if the structure of the query is not known, MODexplorer can be used to select the modeling templates taking all this information into account and to build a comparative model. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modexplorer. Website implemented in HTML and JavaScript with all major browsers supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  5. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships.

    KAUST Repository

    Kosinski, Jan; Barbato, Alessandro; Tramontano, Anna

    2013-01-01

    SUMMARY: MODexplorer is an integrated tool aimed at exploring the sequence, structural and functional diversity in protein families useful in homology modeling and in analyzing protein families in general. It takes as input either the sequence or the structure of a protein and provides alignments with its homologs along with a variety of structural and functional annotations through an interactive interface. The annotations include sequence conservation, similarity scores, ligand-, DNA- and RNA-binding sites, secondary structure, disorder, crystallographic structure resolution and quality scores of models implied by the alignments to the homologs of known structure. MODexplorer can be used to analyze sequence and structural conservation among the structures of similar proteins, to find structures of homologs solved in different conformational state or with different ligands and to transfer functional annotations. Furthermore, if the structure of the query is not known, MODexplorer can be used to select the modeling templates taking all this information into account and to build a comparative model. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modexplorer. Website implemented in HTML and JavaScript with all major browsers supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  6. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    Science.gov (United States)

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for members of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  7. A functional U-statistic method for association analysis of sequencing data.

    Science.gov (United States)

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.

  8. From Sequence and Forces to Structure, Function and Evolution of Intrinsically Disordered Proteins

    Science.gov (United States)

    Forman-Kay, Julie D.; Mittag, Tanja

    2015-01-01

    Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales and compactness is shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of this journal, Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional and evolutionary properties. PMID:24010708

  9. fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets.

    Science.gov (United States)

    Madrigal, Pedro

    2017-03-01

    Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows both to evaluate reproducibility of biological or technical replicates, and to compare different datasets to identify their potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers. An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/. Contact: pmb59@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  10. Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference.

    Science.gov (United States)

    Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D

    2004-10-01

    Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and

  11. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

    Science.gov (United States)

    Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

    2015-02-10

    Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.

  12. Bidirectional gene sequences with similar homology to functional proteins of alkane degrading bacterium Pseudomonas frederiksbergensis DNA

    International Nuclear Information System (INIS)

    Megeed, A.A.

    2011-01-01

    The potential for two overlapping fragments of DNA from a clone of the newly isolated alkane-degrading bacterium Pseudomonas frederiksbergensis to encode sequences with homology to two parts of functional proteins is described. One strand contains a sequence with high homology to alkane monooxygenase (alkB), a member of the alkane hydroxylase family, and the other strand contains a sequence with some homology to the alcohol dehydrogenase gene (alkJ). Overlapping of genes on opposite strands has been reported in eukaryotic species, and is now reported in a bacterial species. The sequence comparisons and ORF results revealed that the regulation and organization of the genes involved in alkane oxidation, as represented in Pseudomonas frederiksbergensis, vary among the different known alkane-degrading bacteria. In the alk gene cluster, the homologues of the known alkane monooxygenase (alkB) and rubredoxin (alkG) are oriented in the same direction, whereas alcohol dehydrogenase (alkJ) is oriented in the opposite direction. Such genomes encode messages on both strands of the DNA, or in overlapping but different reading frames of the same strand. The creation of novel genes from pre-existing sequences, known as overprinting, is a widespread phenomenon in small viruses. Here, the origin and evolution of the gene overlap were investigated in relation to bacteriophages belonging to the family Microviridae. Such a phenomenon is most widely described in extremely small genomes such as those of viruses or small plasmids, so its occurrence here is unusual. (author)

  13. Multiple amino acid sequence alignment nitrogenase component 1: insights into phylogenetics and structure-function relationships.

    Directory of Open Access Journals (Sweden)

    James B Howard

    Amino acid residues critical for a protein's structure-function are retained by natural selection, and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as "core" for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification
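
    The idea of reading conservation off the columns of a multiple alignment can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name, the toy alignment, the treatment of gapped columns as variable, and the reading of "single variant" as "exactly two distinct residues in a column" are all assumptions made for illustration.

```python
from collections import Counter


def classify_columns(aligned_seqs):
    """Label each alignment column 'invariant', 'single_variant' or 'variable'.

    Assumptions (not taken from the abstract): '-' marks a gap, any column
    containing a gap counts as variable, and 'single_variant' means exactly
    two distinct residues occur in the column.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    labels = []
    for column in zip(*aligned_seqs):
        residues = Counter(r for r in column if r != "-")
        if "-" in column or not residues:
            labels.append("variable")
        elif len(residues) == 1:
            labels.append("invariant")
        elif len(residues) == 2:
            labels.append("single_variant")
        else:
            labels.append("variable")
    return labels


# Hypothetical four-sequence toy alignment, five columns wide.
toy_alignment = ["MKCAH", "MKCGH", "MRCSH", "MKC-H"]
print(classify_columns(toy_alignment))
```

    On a real alignment the labels would be computed per subunit and per group, so that residues invariant within one group can be separated from residues invariant across all groups.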

  14. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
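
    The ID-extraction step can be sketched in a few lines of Python. This is an illustration only, not the DBCLS pipeline: the regular expression is an assumption that covers the common accession prefixes (SRA/SRP/SRX/SRR/SRS/SRZ and their ERx/DRx counterparts), and the article texts and PMIDs below are made up.

```python
import re

# Assumed accession shape: S/E/D + R + type letter + at least six digits.
SRA_ID = re.compile(r"\b[SED]R[APRSXZ]\d{6,}\b")


def extract_sra_ids(text):
    """Return the distinct SRA-style accessions mentioned in a text, sorted."""
    return sorted(set(SRA_ID.findall(text)))


def sra_pmid_pairs(articles):
    """Build (accession, PMID) pairs from a {pmid: full_text} mapping."""
    pairs = []
    for pmid, text in articles.items():
        for accession in extract_sra_ids(text):
            pairs.append((accession, pmid))
    return pairs


# Hypothetical article snippets keyed by hypothetical PMIDs.
articles = {
    "10000001": "Reads were deposited under SRP000001 and run SRR000123 (see also SRP000001).",
    "10000002": "No sequencing data were generated in this study.",
}
print(sra_pmid_pairs(articles))
```

    Deduplicating per article keeps one pair for each accession and PMID combination, which matches the idea of counting SRA ID-PMID pairs rather than raw mentions.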

  15. Mapping genomic features to functional traits through microbial whole genome sequences.

    Science.gov (United States)

    Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott

    2014-01-01

    Recently, the utility of trait-based approaches for microbial communities has been identified. The increasing availability of whole genome sequences provides the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link genomic features with functional traits. Genes from bacterial genomes belonging to different functional traits were grouped into Clusters of Orthologous Groups (COGs), which were used as features. Then, the TF-IDF technique from the text mining domain was applied to transform the data to account for the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.
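
    The TF-IDF transform described above can be sketched in Python by treating each genome as a "document" and each COG as a "term". The abstract does not say which TF-IDF variant was used, so the plain form below (relative frequency times log(N/df)) is an assumption, and the genome names and counts are invented.

```python
import math


def tfidf(counts):
    """Weight COG counts per genome with a plain TF-IDF.

    counts: {genome: {cog_id: copy_number}}
    TF  = copy number / total COG copies in that genome
    IDF = log(number of genomes / number of genomes containing the COG)
    """
    n_genomes = len(counts)
    doc_freq = {}
    for cogs in counts.values():
        for cog, copies in cogs.items():
            if copies > 0:
                doc_freq[cog] = doc_freq.get(cog, 0) + 1

    weights = {}
    for genome, cogs in counts.items():
        total = sum(cogs.values())
        weights[genome] = {
            cog: (copies / total) * math.log(n_genomes / doc_freq[cog])
            for cog, copies in cogs.items()
            if copies > 0
        }
    return weights


# Hypothetical genomes and COG copy numbers.
toy_counts = {
    "genome_A": {"COG0001": 2, "COG0002": 2},
    "genome_B": {"COG0001": 1},
}
for genome, row in tfidf(toy_counts).items():
    print(genome, row)
```

    A COG present in every genome receives weight zero, which is the intended effect: ubiquitous housekeeping families carry no information about a trait, while COGs restricted to a subset of genomes are up-weighted before feature selection.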

  16. Positive Selection or Free to Vary? Assessing the Functional Significance of Sequence Change Using Molecular Dynamics.

    Directory of Open Access Journals (Sweden)

    Jane R Allison

    Evolutionary arms races between pathogens and their hosts may be manifested as selection for rapid evolutionary change of key genes, and are sometimes detectable through sequence-level analyses. In the case of protein-coding genes, such analyses frequently predict that specific codons are under positive selection. However, detecting positive selection can be non-trivial, and false positive predictions are a common concern in such analyses. It is therefore helpful to place such predictions within a structural and functional context. Here, we focus on the p19 protein from tombusviruses. P19 is a homodimer that sequesters siRNAs, thereby preventing the host RNAi machinery from shutting down viral infection. Sequence analysis of the p19 gene is complicated by the fact that it is constrained at the sequence level by overprinting of a viral movement protein gene. Using homology modeling, in silico mutation and molecular dynamics simulations, we assess how non-synonymous changes to two residues involved in forming the dimer interface (one invariant, and one predicted to be under positive selection) impact molecular function. Interestingly, we find that both observed variation and potential variation (where a non-synonymous change to p19 would be synonymous for the overprinted movement protein) do not significantly impact protein structure or RNA binding. Consequently, while several methods identify residues at the dimer interface as being under positive selection, MD results suggest they are functionally indistinguishable from a site that is free to vary. Our analyses serve as a caveat to using sequence-level analyses in isolation to detect and assess positive selection, and emphasize the importance of also accounting for how non-synonymous changes impact structure and function.

  17. Intercellular signalling in Vibrio harveyi: sequence and function of genes regulating expression of luminescence.

    Science.gov (United States)

    Bassler, B L; Wright, M; Showalter, R E; Silverman, M R

    1993-08-01

    Density-dependent expression of luminescence in Vibrio harveyi is regulated by the concentration of an extracellular signal molecule (autoinducer) in the culture medium. A recombinant clone that restored function to one class of spontaneous dim mutants was found to encode functions necessary for the synthesis of, and response to, a signal molecule. Sequence analysis of the region encoding these functions revealed three open reading frames, two (luxL and luxM) that are required for production of an autoinducer substance and a third (luxN) that is required for response to this signal substance. The LuxL and LuxM proteins are not similar in amino acid sequence to other proteins in the database, but the LuxN protein contains regions of sequence resembling both the histidine protein kinase and the response regulator domains of the family of two-component, signal transduction proteins. The phenotypes of mutants with luxL, luxM and luxN defects indicated that an additional signal-response system controlling density-dependent expression of luminescence remains to be identified.

  18. A Parvovirus B19 synthetic genome: sequence features and functional competence.

    Science.gov (United States)

    Manaresi, Elisabetta; Conti, Ilaria; Bua, Gloria; Bonvicini, Francesca; Gallinella, Giorgio

    2017-08-01

    Central to genetic studies for Parvovirus B19 (B19V) is the availability of genomic clones that may possess functional competence and ability to generate infectious virus. In our study, we established a new model genetic system for Parvovirus B19. A synthetic approach was followed, by design of a reference genome sequence, by generation of a corresponding artificial construct and its molecular cloning in a complete and functional form, and by setup of an efficient strategy to generate infectious virus, via transfection in UT7/EpoS1 cells and amplification in erythroid progenitor cells. The synthetic genome was able to generate virus with biological properties paralleling those of native virus, its infectious activity being dependent on the preservation of self-complementarity and sequence heterogeneity within the terminal regions. A virus of defined genome sequence, obtained from controlled cell culture conditions, can constitute a reference tool for investigation of the structural and functional characteristics of the virus. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases

    Directory of Open Access Journals (Sweden)

    Braun Werner

    2002-11-01

    Background: Total sequence decomposition, using the web-based MASIA tool, identifies areas of conservation in aligned protein sequences. By structurally annotating these motifs, the sequence can be parsed into individual building blocks, molecular legos ("molegos"), that can eventually be related to function. Here, the approach is applied to the apurinic/apyrimidinic endonuclease (APE) DNA repair proteins, essential enzymes that have been highly conserved throughout evolution. The APEs, DNase-1 and inositol 5'-polyphosphate phosphatases (IPP) form a superfamily that catalyzes metal ion based phosphorolysis, but recognize different substrates. Results: MASIA decomposition of APE yielded 12 sequence motifs, 10 of which are also structurally conserved within the family and are designated as molegos. The 12 motifs include all the residues known to be essential for DNA cleavage by APE. Five of these molegos are sequentially and structurally conserved in DNase-1 and the IPP family. Correcting the sequence alignment to match the residues at the ends of two of the molegos that are absolutely conserved in each of the three families greatly improved the local structural alignment of APEs, DNase-1 and synaptojanin. Comparing substrate/product binding of molegos common to DNase-1 showed that those distinctive for APEs are not directly involved in cleavage, but establish protein-DNA interactions 3' to the abasic site. These additional bonds enhance both specific binding to damaged DNA and the processivity of APE1. Conclusion: A modular approach can improve structurally predictive alignments of homologous proteins with low sequence identity and reveal residues peripheral to the traditional "active site" that control the specificity of enzymatic activity.

  20. Analysis of breast cancer metastasis candidate genes from next generation-sequencing via systematic functional genomics

    DEFF Research Database (Denmark)

    Blomstrøm, Monica Marie

    2016-01-01

    several growth modulators and invasion modulators were identified and independently validated. These candidates revealed a group of genes with metastasis-related functions in vitro that are involved in RNA-related processes, such as RNA-processing. Moreover, a general feature was that proliferation......) and non-CSCs. The main goal of this project was to functionally characterize a set of candidate genes recovered from next-generation sequencing analysis for their role in breast cancer metastasis formation. The starting gene set comprised 104 gene variants; i.e. 57 wildtype and 47 mutated variants. During...

  1. Identification of functional SNPs in the 5-prime flanking sequences of human genes

    Directory of Open Access Journals (Sweden)

    Lenhard Boris

    2005-02-01

    Background: Over 4 million single nucleotide polymorphisms (SNPs) are currently reported to exist within the human genome. Only a small fraction of these SNPs alter gene function or expression, and therefore might be associated with a cell phenotype. These functional SNPs are consequently important in understanding human health. Information related to functional SNPs in candidate disease genes is critical for cost effective genetic association studies, which attempt to understand the genetics of complex diseases like diabetes, Alzheimer's, etc. Robust methods for the identification of functional SNPs are therefore crucial. We report one such experimental approach. Results: Sequences conserved between mouse and human genomes, within 5 kilobases of the 5-prime end of 176 GPCR genes, were screened for SNPs. Sequences flanking these SNPs were scored for transcription factor binding sites. Allelic pairs resulting in a significant score difference were predicted to influence the binding of transcription factors (TFs). Ten such SNPs were selected for mobility shift assays (EMSA), and 7 of them exhibited a reproducible shift. The full-length promoter regions with 4 of the 7 SNPs were cloned in a Luciferase based plasmid reporter system. Two out of the 4 SNPs exhibited differential promoter activity in several human cell lines. Conclusions: We propose a method for effective selection of functional, regulatory SNPs that are located in evolutionary conserved 5-prime flanking regions (5'-FR regions) of human genes and influence the activity of the transcriptional regulatory region. Some SNPs behave differently in different cell types.
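
    The allele-scoring step (score the sequence around a SNP for a binding site, once per allele, and compare) can be sketched with a log-odds position weight matrix in Python. The abstract does not name the scoring tool or matrices, so everything here is an assumption for illustration: the uniform background, the pseudocount, the forward-strand-only scan, and the made-up TATA-like count matrix and flanks.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequencies


def log_odds_matrix(counts, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / BACKGROUND)
            for b in "ACGT"
        })
    return matrix


def best_score(seq, matrix):
    """Best window score on the forward strand (uppercase A/C/G/T only)."""
    width = len(matrix)
    if len(seq) < width:
        raise ValueError("sequence shorter than the matrix")
    return max(
        sum(matrix[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allele_score_difference(flank5, ref, alt, flank3, matrix):
    """Score of the reference allele context minus that of the alternate."""
    return best_score(flank5 + ref + flank3, matrix) - best_score(flank5 + alt + flank3, matrix)


# Hypothetical 4-bp motif counts favouring T-A-T-A, and hypothetical flanks.
toy_matrix = log_odds_matrix([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(allele_score_difference("GC", "T", "G", "ATAGC", toy_matrix))
```

    A positive difference means the reference allele fits the motif better than the alternate; in practice a threshold on this difference, plus a scan of the reverse strand, would decide which allelic pairs go forward to EMSA.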

  2. A structural study for the optimisation of functional motifs encoded in protein sequences

    Directory of Open Access Journals (Sweden)

    Helmer-Citterich Manuela

    2004-04-01

    Background: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one or more functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: 1. residues structurally conserved in different proteins that are true positives for a pattern are identified by means of a computational technique and by visual inspection; 2. the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; 3. the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of
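
    How a sequence pattern is evaluated for sensitivity and specificity can be shown with a small Python sketch that converts PROSITE pattern syntax to a regular expression and scores it on labelled sequences. This is not the authors' optimisation procedure. The converter handles only the core syntax (x, [..], {..}, repeat counts, and the < and > anchors), the definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP) are one common choice and may differ from the paper's, and the test sequences are invented. The pattern used is the familiar N-glycosylation motif N-{P}-[ST]-{P}.

```python
import re

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert core PROSITE pattern syntax to a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):   # N-terminal anchor
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):     # C-terminal anchor
        suffix, pattern = "$", pattern[:-1]

    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # excluded residues
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # repeat count or range
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix


def sensitivity_specificity(regex, positives, negatives):
    """Return (TP/(TP+FN), TN/(TN+FP)); both input lists must be non-empty."""
    compiled = re.compile(regex)
    tp = sum(1 for seq in positives if compiled.search(seq))
    fp = sum(1 for seq in negatives if compiled.search(seq))
    return tp / len(positives), (len(negatives) - fp) / len(negatives)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
# Hypothetical labelled sequences: one true positive is missed, one negative is hit.
print(regex, sensitivity_specificity(regex, ["AANGSAA", "NPSA"], ["AAAA", "MNKTL"]))
```

    Extending a pattern with extra positions, as the procedure above proposes, would show up here as fewer hits among the negatives at the cost of possibly losing some positives.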

  3. An artificial functional family filter in homolog searching in next-generation sequencing metagenomics.

    Directory of Open Access Journals (Sweden)

    Ruofei Du

    In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length associated with next-generation sequencing technologies poses a barrier to this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem, and we intended to fill this gap in this paper. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cutoff filtration. Evaluated on three experimental metagenomic datasets with different coverages, the proposed method is robust against read coverage and consistently outperforms the other E-value cutoff methods currently used in the literature.
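
    A score-cutoff filter of this kind can be sketched in Python over BLAST tabular output (the 12-column `-outfmt 6` layout, with the bit score in the last column). The sketch is an assumption-laden illustration rather than the published method: it keeps each read's best hit, uses the bit score as the similarity score, and the cutoff value, read names, and subject-to-COG mapping are all hypothetical.

```python
from collections import Counter


def functional_profile(blast_lines, subject_to_cog, min_bitscore):
    """Count reads per COG, keeping a read only if its best hit passes the cutoff.

    blast_lines: iterable of tab-separated BLAST lines in the 12-column
    tabular format (qseqid, sseqid, ..., evalue, bitscore).
    """
    best_hit = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        read, subject, bitscore = fields[0], fields[1], float(fields[11])
        if read not in best_hit or bitscore > best_hit[read][1]:
            best_hit[read] = (subject, bitscore)

    profile = Counter()
    for subject, bitscore in best_hit.values():
        cog = subject_to_cog.get(subject)
        if cog is not None and bitscore >= min_bitscore:
            profile[cog] += 1
    return profile


# Hypothetical hits: read r1 has two hits, read r2 has one weak hit.
toy_hits = [
    "r1\tp1\t90.0\t30\t3\t0\t1\t90\t10\t39\t1e-5\t55.0",
    "r1\tp2\t80.0\t30\t6\t0\t1\t90\t12\t41\t1e-3\t40.0",
    "r2\tp3\t70.0\t30\t9\t0\t1\t90\t5\t34\t0.5\t28.0",
]
toy_map = {"p1": "COG0001", "p2": "COG0002", "p3": "COG0002"}
print(functional_profile(toy_hits, toy_map, min_bitscore=35.0))
```

    With the cutoff at 35, the weak hit of r2 is dropped and COG0002 disappears from the profile, which is the behaviour a score filter is meant to have for families supported only by low-scoring short-read matches.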

  4. Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

    Science.gov (United States)

    Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

    2006-01-01

    Background Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344

  5. Functional brain activation differences in stuttering identified with a rapid fMRI sequence

    Science.gov (United States)

    Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.

    2011-01-01

    The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech motor and auditory brain activity in children who stutter closer to the age at which recovery from stuttering is documented. Rapid sequences may be preferred for individuals or populations who do not tolerate long scanning sessions. In this report, we document the application of a picture naming and phoneme monitoring task in three minute fMRI sequences with adults who stutter (AWS). If relevant brain differences are found in AWS with these approaches that conform to previous reports, then these approaches can be extended to younger populations. Pairwise contrasts of brain BOLD activity between AWS and normally fluent adults indicated the AWS showed higher BOLD activity in the right inferior frontal gyrus (IFG), right temporal lobe and sensorimotor cortices during picture naming and higher activity in the right IFG during phoneme monitoring. The right lateralized pattern of BOLD activity together with higher activity in sensorimotor cortices is consistent with previous reports, which indicates rapid fMRI sequences can be considered for investigating stuttering in younger participants. PMID:22133409

  6. A functional test of Neandertal and modern human mitochondrial targeting sequences

    Energy Technology Data Exchange (ETDEWEB)

    Gralle, Matthias, E-mail: gralle@bioqmed.ufrj.br [Instituto de Bioquimica Medica, Universidade Federal do Rio de Janeiro, CCS, Ilha do Fundao, 21941-590 Rio de Janeiro (Brazil); Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig (Germany); Schaefer, Ingo; Seibel, Peter [Department of Molecular Cell Therapy, Leipzig University, Deutscher Platz 5, 04103 Leipzig (Germany); Paeaebo, Svante [Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig (Germany)

    2010-11-26

    Research highlights: → Two mutations in mitochondrial targeting peptides occurred during human evolution, possibly after Neandertals split off from modern human lineage. → The ancestral and modern human versions of these two targeting peptides were tested functionally for their effects on localization and cleavage rate. → In spite of recent evolution, and to the contrary of other mutations in targeting peptides, these mutations had no visible effects. -- Abstract: Targeting of nuclear-encoded proteins to different organelles, such as mitochondria, is a process that can result in the redeployment of proteins to new intracellular destinations during evolution. With the sequencing of the Neandertal genome, it has become possible to identify amino acid substitutions that occurred on the modern human lineage since its separation from the Neandertal lineage. Here we analyze the function of two substitutions in mitochondrial targeting sequences that occurred and rose to high frequency during recent human evolution. The ancestral and modern versions of the two targeting sequences do not differ in the efficiency with which they direct a protein to the mitochondria, an observation compatible with the neutral theory of molecular evolution.

  7. A functional test of Neandertal and modern human mitochondrial targeting sequences

    International Nuclear Information System (INIS)

    Gralle, Matthias; Schaefer, Ingo; Seibel, Peter; Paeaebo, Svante

    2010-01-01

    Research highlights: → Two mutations in mitochondrial targeting peptides occurred during human evolution, possibly after Neandertals split off from modern human lineage. → The ancestral and modern human versions of these two targeting peptides were tested functionally for their effects on localization and cleavage rate. → In spite of recent evolution, and to the contrary of other mutations in targeting peptides, these mutations had no visible effects. -- Abstract: Targeting of nuclear-encoded proteins to different organelles, such as mitochondria, is a process that can result in the redeployment of proteins to new intracellular destinations during evolution. With the sequencing of the Neandertal genome, it has become possible to identify amino acid substitutions that occurred on the modern human lineage since its separation from the Neandertal lineage. Here we analyze the function of two substitutions in mitochondrial targeting sequences that occurred and rose to high frequency during recent human evolution. The ancestral and modern versions of the two targeting sequences do not differ in the efficiency with which they direct a protein to the mitochondria, an observation compatible with the neutral theory of molecular evolution.

  8. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    Science.gov (United States)

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanisms and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, there is insufficient transcriptomic or genomic information available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  9. SSFSE sequence functional MRI of the human cervical spinal cord with complex finger tapping

    International Nuclear Information System (INIS)

    Xie Chuhai; Kong Kangmei; Guan Jitian; Chen Yexi; He Jiankang; Qi Weili; Wang Xinjia; Shen Zhiwei; Wu Renhua

    2009-01-01

    Purpose: Functional MR imaging of the human cervical spinal cord was carried out on volunteers during alternated rest and a complex finger tapping task, in order to detect image intensity changes arising from neuronal activity. Methods: Functional MR imaging data using a single-shot fast spin-echo (SSFSE) sequence with echo time 42.4 ms on a 1.5 T GE Clinical System were acquired in eight subjects performing a complex finger tapping task. Cervical spinal cord activation was measured both in the sagittal and transverse imaging planes. Postprocessing was performed with the AFNI (Analysis of Functional Neuroimages) software system. Results: Intensity changes (5.5-7.6%) were correlated with the time course of stimulation and were consistently detected in both sagittal and transverse imaging planes of the cervical spinal cord. The activated regions localized to the ipsilateral side of the spinal cord in agreement with the neural anatomy. Conclusion: Functional MR imaging signals can be reliably detected with finger tapping activity in the human cervical spinal cord using an SSFSE sequence with 42.4 ms echo time. The anatomic location of neural activity correlates with the muscles used in the finger tapping task.

  10. Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

    Science.gov (United States)

    Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E

    2016-12-01

    Intrinsic disorder (ID) in proteins has been extensively described over the last decade, but a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues; such regions are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID region, with short ID evenly distributed along the sequence and long ID overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures. © 2016 The Protein Society.
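
    The per-protein statistics quoted above (an ID region of at least five residues, a long region of over 20 residues, the fraction of the sequence covered, fully disordered proteins) can be derived from a per-residue disorder prediction with a short Python sketch. The input encoding ('D' for disordered, 'S' for structured), the function names, and the toy mask are assumptions for illustration; this is not the MobiDB pipeline.

```python
def disorder_regions(mask, min_len=5):
    """Return half-open (start, end) intervals of 'D' runs of at least min_len."""
    regions = []
    start = None
    for i, state in enumerate(mask):
        if state == "D":
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(mask) - start >= min_len:
        regions.append((start, len(mask)))
    return regions


def summarize(mask, min_len=5, long_len=20):
    """Summarise the disorder content of one protein."""
    regions = disorder_regions(mask, min_len)
    covered = sum(end - start for start, end in regions)
    return {
        "n_regions": len(regions),
        "has_long": any(end - start > long_len for start, end in regions),
        "fraction": covered / len(mask) if mask else 0.0,
        "fully_disordered": bool(mask) and covered == len(mask),
    }


# Hypothetical 50-residue prediction: one 25-residue run and one 3-residue run.
toy_mask = "S" * 10 + "D" * 25 + "S" * 5 + "D" * 3 + "S" * 7
print(summarize(toy_mask))
```

    The three-residue run falls below the five-residue minimum and is ignored, so the toy protein counts as having a single, long ID region covering half of its sequence.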

  11. Generation, analysis and functional annotation of expressed sequence tags from the ectoparasitic mite Psoroptes ovis

    Directory of Open Access Journals (Sweden)

    Kenyon Fiona

    2011-07-01

    Background: Sheep scab is caused by Psoroptes ovis and is arguably the most important ectoparasitic disease affecting sheep in the UK. The disease is highly contagious, causes considerable pruritus and irritation, and is therefore a major welfare concern. Current methods of treatment are unsustainable, and in order to elucidate novel methods of disease control a more comprehensive understanding of the parasite is required. To date, no full genomic DNA sequence or large-scale transcript datasets are available, and prior to this study only 484 P. ovis expressed sequence tags (ESTs) were accessible in public databases. Results: In order to further expand the transcriptomic coverage of P. ovis, thus facilitating novel insights into the mite biology, we undertook a larger-scale EST approach, incorporating newly generated and previously described P. ovis transcript data and representing the largest collection of P. ovis ESTs to date. We sequenced 1,574 ESTs and assembled these along with 484 previously generated P. ovis ESTs, which resulted in the identification of 1,545 unique P. ovis sequences. BLASTX searches identified 961 ESTs with significant hits (E-value P. ovis ESTs. Gene Ontology (GO) analysis allowed the functional annotation of 880 ESTs and included predictions of signal peptide and transmembrane domains, allowing the identification of potential P. ovis excreted/secreted factors and mapping of metabolic pathways. Conclusions: This dataset currently represents the largest collection of P. ovis ESTs, all of which are publicly available in the GenBank EST database (dbEST) (accession numbers FR748230 - FR749648). Functional analysis of this dataset identified important homologues, including house dust mite allergens and tick salivary factors. These findings offer new insights into the underlying biology of P. ovis, facilitating further investigations into mite biology and the identification of novel methods of intervention.

  12. The HIVToolbox 2 web system integrates sequence, structure, function and mutation analysis.

    Directory of Open Access Journals (Sweden)

    David P Sargeant

    There is enormous interest in studying HIV pathogenesis for improving the treatment of patients with HIV infection. HIV infection has become one of the best-studied systems for understanding how a virus can hijack a cell. To help facilitate discovery, we previously built HIVToolbox, a web system for visual data mining. The original HIVToolbox integrated information for HIV protein sequence, structure, functional sites, and sequence conservation. This web system has been used for almost 40,000 searches. We report improvements to HIVToolbox including new functions and workflows, data updates, and updates for ease of use. HIVToolbox2 is an improvement over HIVToolbox, with new functionalities focused on HIV pathogenesis including drug-binding sites, drug-resistance mutations, and immune epitopes. The integrated, interactive view enables visual mining to generate hypotheses that are not readily revealed by other approaches. Most HIV proteins form multimers, and there are posttranslational modification and protein-protein interaction sites at many of these multimerization interfaces. Analysis of protease drug-binding sites reveals an anatomy of drug resistance with different types of drug-resistance mutations regionally localized on the surface of protease. Some of these drug-resistance mutations have a high prevalence in specific HIV-1 M subtypes. Finally, consolidation of Tat functional sites reveals a hotspot region where there appear to be 30 interactions or posttranslational modifications. A cursory analysis with HIVToolbox2 has helped to identify several global patterns for HIV proteins. An initial analysis with this tool identifies homomultimerization of almost all HIV proteins, functional sites that overlap with multimerization sites, a global drug resistance anatomy for HIV protease, and specific distributions of some drug-resistance mutations (DRMs) in specific HIV M subtypes. HIVToolbox2 is an open-access web application available at

  13. Cytochromes P450 for natural product biosynthesis in Streptomyces: sequence, structure, and function.

    Science.gov (United States)

    Rudolf, Jeffrey D; Chang, Chin-Yuan; Ma, Ming; Shen, Ben

    2017-08-30

    Covering: up to January 2017. Cytochrome P450 enzymes (P450s) are some of the most exquisite and versatile biocatalysts found in nature. In addition to their well-known roles in steroid biosynthesis and drug metabolism in humans, P450s are key players in natural product biosynthetic pathways. Natural products, the most chemically and structurally diverse small molecules known, require an extensive collection of P450s to accept and functionalize their unique scaffolds. In this review, we survey the current catalytic landscape of P450s within the Streptomyces genus, one of the most prolific producers of natural products, and comprehensively summarize the functionally characterized P450s from Streptomyces. A sequence similarity network of >8500 P450s revealed insights into the sequence-function relationships of these oxygen-dependent metalloenzymes. Although only ∼2.4% and structurally characterized, respectively, the study of streptomycete P450s involved in the biosynthesis of natural products has revealed their diverse roles in nature, expanded their catalytic repertoire, created structural and mechanistic paradigms, and exposed their potential for biomedical and biotechnological applications. Continued study of these remarkable enzymes will undoubtedly expose their true complement of chemical and biological capabilities.

  14. Versatile Gene-Specific Sequence Tags for Arabidopsis Functional Genomics: Transcript Profiling and Reverse Genetics Applications

    Science.gov (United States)

    Hilson, Pierre; Allemeersch, Joke; Altmann, Thomas; Aubourg, Sébastien; Avon, Alexandra; Beynon, Jim; Bhalerao, Rishikesh P.; Bitton, Frédérique; Caboche, Michel; Cannoot, Bernard; Chardakov, Vasil; Cognet-Holliger, Cécile; Colot, Vincent; Crowe, Mark; Darimont, Caroline; Durinck, Steffen; Eickhoff, Holger; de Longevialle, Andéol Falcon; Farmer, Edward E.; Grant, Murray; Kuiper, Martin T.R.; Lehrach, Hans; Léon, Céline; Leyva, Antonio; Lundeberg, Joakim; Lurin, Claire; Moreau, Yves; Nietfeld, Wilfried; Paz-Ares, Javier; Reymond, Philippe; Rouzé, Pierre; Sandberg, Goran; Segura, Maria Dolores; Serizet, Carine; Tabrett, Alexandra; Taconnat, Ludivine; Thareau, Vincent; Van Hummelen, Paul; Vercruysse, Steven; Vuylsteke, Marnik; Weingartner, Magdalena; Weisbeek, Peter J.; Wirta, Valtteri; Wittink, Floyd R.A.; Zabeau, Marc; Small, Ian

    2004-01-01

    Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, inciting us to create a collection of gene-specific sequence tags (GSTs) representing at least 21,500 Arabidopsis genes and which are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics. PMID:15489341

  15. Functional Interaction of the Adenovirus IVa2 Protein with Adenovirus Type 5 Packaging Sequences

    OpenAIRE

    Ostapchuk, Philomena; Yang, Jihong; Auffarth, Ece; Hearing, Patrick

    2005-01-01

    Adenovirus type 5 (Ad5) DNA packaging is initiated in a polar fashion from the left end of the genome. The packaging process is dependent on the cis-acting packaging domain located between nucleotides 230 and 380. Seven AT-rich repeats that direct packaging have been identified within this domain. A1, A2, A5, and A6 are the most important repeats functionally and share a bipartite sequence motif. Several lines of evidence suggest that there is a limiting trans-acting factor(s) that plays a ro...

  16. Hidden Markov event sequence models: toward unsupervised functional MRI brain mapping.

    Science.gov (United States)

    Faisan, Sylvain; Thoraval, Laurent; Armspach, Jean-Paul; Foucher, Jack R; Metz-Lutz, Marie-Noëlle; Heitz, Fabrice

    2005-01-01

    Most methods used in functional MRI (fMRI) brain mapping require restrictive assumptions about the shape and timing of the fMRI signal in activated voxels. Consequently, fMRI data may be partially and misleadingly characterized, leading to suboptimal or invalid inference. To limit these assumptions and to capture the broad range of possible activation patterns, a novel statistical fMRI brain mapping method is proposed. It relies on hidden semi-Markov event sequence models (HSMESMs), a special class of hidden Markov models (HMMs) dedicated to the modeling and analysis of event-based random processes. Activation detection is formulated in terms of time coupling between (1) the observed sequence of hemodynamic response onset (HRO) events detected in the voxel's fMRI signal and (2) the "hidden" sequence of task-induced neural activation onset (NAO) events underlying the HROs. Both event sequences are modeled within a single HSMESM. The resulting brain activation model is trained to automatically detect neural activity embedded in the input fMRI data set under analysis. The data sets considered in this article are threefold: synthetic epoch-related, real epoch-related (auditory lexical processing task), and real event-related (oddball detection task) fMRI data sets. Synthetic data: Activation detection results demonstrate the superiority of the HSMESM mapping method with respect to a standard implementation of the statistical parametric mapping (SPM) approach. They are also very close, sometimes equivalent, to those obtained with an "ideal" implementation of SPM in which the activation patterns synthesized are reused for analysis. The HSMESM method appears clearly insensitive to timing variations of the hemodynamic response and exhibits low sensitivity to fluctuations of its shape (unsustained activation during task). 
    Real epoch-related data: HSMESM activation detection results compete with those obtained with SPM, without requiring any prior definition of the expected

  17. Spectral Velocity Estimation using the Autocorrelation Function and Sparse data Sequences

    DEFF Research Database (Denmark)

    Jensen, Jørgen Arendt

    2005-01-01

    Ultrasound scanners can be used for displaying the distribution of velocities in blood vessels by finding the power spectrum of the received signal. It is desired to show a B-mode image for orientation and data for this has to be acquired interleaved with the flow data. Techniques for maintaining...... both the B-mode frame rate, and at the same time have the highest possible $f_{prf}$ only limited by the depth of investigation, are, thus, of great interest. The power spectrum can be calculated from the Fourier transform of the autocorrelation function $R_r(k)$. The lag $k$ corresponds...... of the sequence. The audio signal has also been synthesized from the autocorrelation data by passing white, Gaussian noise through a filter designed from the power spectrum of the autocorrelation function. The results show that both the full velocity range can be maintained at the same time as a B-mode image...
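    The relation stated in this abstract, that the power spectrum is the Fourier transform of the autocorrelation function $R_r(k)$, can be illustrated with a minimal sketch. The abstract is truncated, so this is only the generic textbook relation on a fully sampled sequence, not the author's sparse-sequence estimator; the function names, the biased estimator, and the normalized frequency axis are assumptions made for illustration.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate R(k), k = 0..max_lag, of a complex sequence."""
    n = len(x)
    return [sum(x[i + k] * x[i].conjugate() for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def power_spectrum(r, n_freq):
    """Fourier transform of R(k), using R(-k) = conj(R(k)) for the negative lags.

    Returns (normalized_frequency, power) pairs with frequency in [-0.5, 0.5),
    i.e. in units of the pulse repetition frequency.
    """
    spectrum = []
    for m in range(n_freq):
        f = m / n_freq - 0.5
        total = r[0].real
        for k in range(1, len(r)):
            # R(k)e^{-j2*pi*f*k} + R(-k)e^{+j2*pi*f*k} = 2*Re(R(k)e^{-j2*pi*f*k})
            total += 2.0 * (r[k] * cmath.exp(-2j * math.pi * f * k)).real
        spectrum.append((f, total))
    return spectrum

# Hypothetical test signal: a single complex tone at one quarter of the sampling rate.
signal = [cmath.exp(2j * math.pi * 0.25 * i) for i in range(64)]
spectrum = power_spectrum(autocorrelation(signal, 16), 32)
peak_frequency = max(spectrum, key=lambda pair: pair[1])[0]
```

    In this sketch the spectral peak falls at the tone's normalized frequency, which would map to a velocity once the pulse repetition frequency and transducer parameters are known.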

  18. The myoglobin of Emperor penguin (Aptenodytes forsteri): amino acid sequence and functional adaptation to extreme conditions.

    Science.gov (United States)

    Tamburrini, M; Romano, M; Giardina, B; di Prisco, G

    1999-02-01

    In the framework of a study on molecular adaptations of the oxygen-transport and storage systems to extreme conditions in Antarctic marine organisms, we have investigated the structure/function relationship in Emperor penguin (Aptenodytes forsteri) myoglobin, in search of correlation with the bird life style. In contrast with previous reports, the revised amino acid sequence contains one additional residue and 15 differences. The oxygen-binding parameters seem well adapted to the diving behaviour of the penguin and to the environmental conditions of the Antarctic habitat. Addition of lactate has no major effect on myoglobin oxygenation over a large temperature range. Therefore, metabolic acidosis does not impair myoglobin function under conditions of prolonged physical effort, such as diving.

  19. Designing sequence to control protein function in an EF-hand protein.

    Science.gov (United States)

    Bunick, Christopher G; Nelson, Melanie R; Mangahas, Sheryll; Hunter, Michael J; Sheehan, Jonathan H; Mizoue, Laura S; Bunick, Gerard J; Chazin, Walter J

    2004-05-19

    The extent of conformational change that calcium binding induces in EF-hand proteins is a key biochemical property specifying Ca(2+) sensor versus signal modulator function. To understand how differences in amino acid sequence lead to differences in the response to Ca(2+) binding, comparative analyses of sequence and structures, combined with model building, were used to develop hypotheses about which amino acid residues control Ca(2+)-induced conformational changes. These results were used to generate a first design of calbindomodulin (CBM-1), a calbindin D(9k) re-engineered with 15 mutations to respond to Ca(2+) binding with a conformational change similar to that of calmodulin. The gene for CBM-1 was synthesized, and the protein was expressed and purified. Remarkably, this protein did not exhibit any non-native-like molten globule properties despite the large number of mutations and the nonconservative nature of some of them. Ca(2+)-induced changes in CD intensity and in the binding of the hydrophobic probe, ANS, implied that CBM-1 does undergo Ca(2+) sensorlike conformational changes. The X-ray crystal structure of Ca(2+)-CBM-1 determined at 1.44 Å resolution reveals the anticipated increase in hydrophobic surface area relative to the wild-type protein. A nascent calmodulin-like hydrophobic docking surface was also found, though it is occluded by the inter-EF-hand loop. The results from this first calbindomodulin design are discussed in terms of progress toward understanding the relationships between amino acid sequence, protein structure, and protein function for EF-hand CaBPs, as well as the additional mutations for the next CBM design.

  20. Prominence vs. aboutness in sequencing: a functional distinction within the left inferior frontal gyrus.

    Science.gov (United States)

    Bornkessel-Schlesewsky, Ina; Grewe, Tanja; Schlesewsky, Matthias

    2012-02-01

    Prior research on the neural bases of syntactic comprehension suggests that activation in the left inferior frontal gyrus (lIFG) correlates with the processing of word order variations. However, there are inconsistencies with respect to the specific subregion within the IFG that is implicated by these findings: the pars opercularis or the pars triangularis. Here, we examined the hypothesis that the dissociation between pars opercularis and pars triangularis activation may reflect functional differences between clause-medial and clause-initial word order permutations, respectively. To this end, we directly compared clause-medial and clause-initial object-before-subject orders in German in a within-participants, event-related fMRI design. Our results showed increased activation for object-initial sentences in a bilateral network of frontal, temporal and subcortical regions. Within the lIFG, posterior and inferior subregions showed only a main effect of word order, whereas more anterior and superior subregions showed effects of word order and sentence type, with higher activation for sentences with an argument in the clause-initial position. These findings are interpreted as evidence for a functional gradation of sequence processing within the left IFG: posterior subportions correlate with argument prominence-based (local) aspects of sequencing, while anterior subportions correlate with aboutness-based aspects of sequencing, which are crucial in linking the current sentence to the wider discourse. This proposal appears compatible with more general hypotheses about information processing gradients in prefrontal cortex (Koechlin & Summerfield, 2007). Copyright © 2010 Elsevier Inc. All rights reserved.

  1. A functional analysis of the spacer of V(D)J recombination signal sequences.

    Directory of Open Access Journals (Sweden)

    Alfred Ian Lee

    2003-10-01

    During lymphocyte development, V(D)J recombination assembles antigen receptor genes from component V, D, and J gene segments. These gene segments are flanked by a recombination signal sequence (RSS), which serves as the binding site for the recombination machinery. The murine Jbeta2.6 gene segment is a recombinationally inactive pseudogene, but examination of its RSS reveals no obvious reason for its failure to recombine. Mutagenesis of the Jbeta2.6 RSS demonstrates that the sequences of the heptamer, nonamer, and spacer are all important. Strikingly, changes solely in the spacer sequence can result in dramatic differences in the level of recombination. The subsequent analysis of a library of more than 4,000 spacer variants revealed that spacer residues of particular functional importance are correlated with their degree of conservation. Biochemical assays indicate distinct cooperation between the spacer and heptamer/nonamer along each step of the reaction pathway. The results suggest that the spacer serves not only to ensure the appropriate distance between the heptamer and nonamer but also regulates RSS activity by providing additional RAG:RSS interaction surfaces. We conclude that while RSSs are defined by a "digital" requirement for absolutely conserved nucleotides, the quality of RSS function is determined in an "analog" manner by numerous complex interactions between the RAG proteins and the less-well conserved nucleotides in the heptamer, the nonamer, and, importantly, the spacer. Those modulatory effects are accurately predicted by a new computational algorithm for "RSS information content." The interplay between such binary and multiplicative modes of interactions provides a general model for analyzing protein-DNA interactions in various biological systems.

  2. A functional analysis of the spacer of V(D)J recombination signal sequences.

    Science.gov (United States)

    Lee, Alfred Ian; Fugmann, Sebastian D; Cowell, Lindsay G; Ptaszek, Leon M; Kelsoe, Garnett; Schatz, David G

    2003-10-01

    During lymphocyte development, V(D)J recombination assembles antigen receptor genes from component V, D, and J gene segments. These gene segments are flanked by a recombination signal sequence (RSS), which serves as the binding site for the recombination machinery. The murine Jbeta2.6 gene segment is a recombinationally inactive pseudogene, but examination of its RSS reveals no obvious reason for its failure to recombine. Mutagenesis of the Jbeta2.6 RSS demonstrates that the sequences of the heptamer, nonamer, and spacer are all important. Strikingly, changes solely in the spacer sequence can result in dramatic differences in the level of recombination. The subsequent analysis of a library of more than 4,000 spacer variants revealed that spacer residues of particular functional importance are correlated with their degree of conservation. Biochemical assays indicate distinct cooperation between the spacer and heptamer/nonamer along each step of the reaction pathway. The results suggest that the spacer serves not only to ensure the appropriate distance between the heptamer and nonamer but also regulates RSS activity by providing additional RAG:RSS interaction surfaces. We conclude that while RSSs are defined by a "digital" requirement for absolutely conserved nucleotides, the quality of RSS function is determined in an "analog" manner by numerous complex interactions between the RAG proteins and the less-well conserved nucleotides in the heptamer, the nonamer, and, importantly, the spacer. Those modulatory effects are accurately predicted by a new computational algorithm for "RSS information content." The interplay between such binary and multiplicative modes of interactions provides a general model for analyzing protein-DNA interactions in various biological systems.

  3. Targeted Sequencing of Lung Function Loci in Chronic Obstructive Pulmonary Disease Cases and Controls.

    Directory of Open Access Journals (Sweden)

    María Soler Artigas

    Chronic obstructive pulmonary disease (COPD) is the third leading cause of death worldwide; smoking is the main risk factor for COPD, but genetic factors are also relevant contributors. Genome-wide association studies (GWAS) of the lung function measures used in the diagnosis of COPD have identified a number of loci; however, association signals are often broad, and collectively these loci explain only a small proportion of the heritability. In order to examine the association with COPD risk of genetic variants down to low allele frequencies, to aid fine-mapping of association signals and to explain more of the missing heritability, we undertook a targeted sequencing study in 300 COPD cases and 300 smoking controls for 26 loci previously reported to be associated with lung function. We used a pooled sequencing approach, with 12 pools of 25 individuals each, enabling high depth (30x) coverage per sample to be achieved. This pooled design maximised sample size and therefore power, but led to challenges during variant-calling since sequencing error rates and minor allele frequencies for rare variants can be very similar. For this reason we employed a rigorous quality control pipeline for variant detection which included the use of 3 independent calling algorithms. In order to avoid false positive associations we also developed tests to detect variants with potential batch effects and removed them before undertaking association testing. We tested for the effects of single variants and the combined effect of rare variants within a locus. We followed up the top signals with data available (only 67% of collapsing methods signals) in 4,249 COPD cases and 11,916 smoking controls from UK Biobank. We provide suggestive evidence for the combined effect of rare variants on COPD risk in TNXB and in sliding windows within MECOM and upstream of HHIP. These findings can lead to an improved understanding of the molecular pathways involved in the development of COPD.
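    The variant-calling difficulty described in this abstract can be made concrete with a small arithmetic sketch. The pool size and per-sample depth come from the abstract; the per-base error rate is a hypothetical value chosen only for illustration, and the helper names are assumptions, not part of the study's pipeline.

```python
from math import comb

POOL_SIZE = 25             # individuals per pool (from the abstract)
DEPTH_PER_SAMPLE = 30      # sequencing depth per individual (from the abstract)
ASSUMED_ERROR_RATE = 0.01  # hypothetical per-base error rate, illustration only

def singleton_allele_fraction(individuals, ploidy=2):
    """Expected read fraction of an allele carried on one chromosome in the pool."""
    return 1.0 / (individuals * ploidy)

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k))

pool_depth = POOL_SIZE * DEPTH_PER_SAMPLE
expected_variant_reads = pool_depth * singleton_allele_fraction(POOL_SIZE)
expected_error_reads = pool_depth * ASSUMED_ERROR_RATE

# Chance that errors alone reach the read count expected from a true singleton,
# under the simplifying assumption that every error hits the same alternate base.
p_errors_mimic_variant = prob_at_least(round(expected_variant_reads),
                                       pool_depth, ASSUMED_ERROR_RATE)
```

    Under these assumptions a variant present on a single chromosome contributes about 2% of reads in a pool, the same order of magnitude as the assumed error rate, which is consistent with the abstract's motivation for combining several independent calling algorithms.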

  4. Sequence and function of LuxO, a negative regulator of luminescence in Vibrio harveyi.

    Science.gov (United States)

    Bassler, B L; Wright, M; Silverman, M R

    1994-05-01

    Density-dependent expression of luminescence in Vibrio harveyi is regulated by the concentration of extracellular signal molecules (autoinducers) in the culture medium. A recombinant clone that restored function to one class of spontaneous dim mutants was found to encode a function required for the density-dependent response. Transposon Tn5 insertions in the recombinant clone were isolated, and the mutations were transferred to the genome of V. harveyi for examination of mutant phenotypes. Expression of luminescence in V. harveyi strains with transposon insertions in one locus, luxO, was independent of the density of the culture and was similar in intensity to the maximal level observed in wild-type bacteria. Sequence analysis of luxO revealed one open reading frame that encoded a protein, LuxO, similar in amino acid sequence to the response regulator domain of the family of two-component, signal transduction proteins. The constitutive phenotype of LuxO- mutants indicates that LuxO acts negatively to control expression of luminescence, and relief of repression by LuxO in the wild type could result from interactions with other components in the Lux signalling system.

  5. The use of orthologous sequences to predict the impact of amino acid substitutions on protein function.

    Directory of Open Access Journals (Sweden)

    Nicholas J Marini

    2010-05-01

    Computational predictions of the functional impact of genetic variation play a critical role in human genetics research. For nonsynonymous coding variants, most prediction algorithms make use of patterns of amino acid substitutions observed among homologous proteins at a given site. In particular, substitutions observed in orthologous proteins from other species are often assumed to be tolerated in the human protein as well. We examined this assumption by evaluating a panel of nonsynonymous mutants of a prototypical human enzyme, methylenetetrahydrofolate reductase (MTHFR), in a yeast cell-based functional assay. As expected, substitutions in human MTHFR at sites that are well-conserved across distant orthologs result in an impaired enzyme, while substitutions present in recently diverged sequences (including a 9-site mutant that "resurrects" the human-macaque ancestor) result in a functional enzyme. We also interrogated 30 sites with varying degrees of conservation by creating substitutions in the human enzyme that are accepted in at least one ortholog of MTHFR. Quite surprisingly, most of these substitutions were deleterious to the human enzyme. The results suggest that selective constraints vary between phylogenetic lineages such that inclusion of distant orthologs to infer selective pressures on the human enzyme may be misleading. We propose that homologous proteins are best used to reconstruct ancestral sequences and infer amino acid conservation among only direct lineal ancestors of a particular protein. We show that such an "ancestral site preservation" measure outperforms other prediction methods, not only in our selected set for MTHFR, but also in an exhaustive set of E. coli LacI mutants.

  6. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    Science.gov (United States)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  7. Metatranscriptome Sequencing Reveals Insights into the Gene Expression and Functional Potential of Rumen Wall Bacteria

    Directory of Open Access Journals (Sweden)

    Evelyne Mann

    2018-01-01

    Microbiota of the rumen wall constitute an important niche of rumen microbial ecology, and their composition has been elucidated in different ruminants in recent years. However, knowledge about the function of rumen wall microbes is still limited. Rumen wall biopsies were taken from three fistulated dairy cows under a standard forage-based diet and after 4 weeks of high-concentrate feeding inducing subacute rumen acidosis (SARA). Extracted RNA was used for metatranscriptome sequencing using Illumina HiSeq sequencing technology. The gene expression of the rumen wall microbial community was analyzed by mapping 35 million sequences against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and determining differentially expressed genes. A total of 1,607 functional features were assigned, with high expression of genes involved in central metabolism and in galactose, starch and sucrose metabolism. The glycogen phosphorylase (EC:2.4.1.1), which degrades (1->4)-alpha-D-glucans, was among the most highly expressed genes, being transcribed by 115 bacterial genera. Energy metabolism genes were also highly expressed, including the pyruvate orthophosphate dikinase (EC:2.7.9.1) involved in pyruvate metabolism, which was covered by 177 genera. Nitrogen metabolism genes, in particular glutamate dehydrogenase (EC:1.4.1.4), glutamine synthetase (EC:6.3.1.2) and glutamate synthase (EC:1.4.1.13, EC:1.4.1.14), were also found to be highly expressed and prove rumen wall microbiota to be actively involved in providing host-relevant metabolites for exchange across the rumen wall. In addition, we found all four urease subunits (EC:3.5.1.5) transcribed by members of the genera Flavobacterium, Corynebacterium, Helicobacter, Clostridium, and Bacillus, and the dissimilatory sulfate reductase (EC 1.8.99.5) dsrABC, which is responsible for the reduction of sulfite to sulfide. We also provide in situ evidence for cellulose and cellobiose degradation, a key step in fiber-rich feed

  8. Functional and Structural Overview of G-Protein-Coupled Receptors Comprehensively Obtained from Genome Sequences

    Directory of Open Access Journals (Sweden)

    Makiko Suwa

    2011-04-01

    An understanding of the functional mechanisms of G-protein-coupled receptors (GPCRs) is very important for GPCR-related drug design. We have developed an integrated GPCR database (SEVENS; http://sevens.cbrc.jp/) that includes 64,090 reliable GPCR genes comprehensively identified from 56 eukaryote genome sequences, and overviewed the sequence and structure spaces of the GPCRs. In vertebrates, the number of receptors for biological amines, peptides, etc. is conserved in most species, whereas the number of chemosensory receptors for odorants, pheromones, etc. differs significantly among species. The latter receptors tend to be of the single-exon or few-exon type and account for a high proportion of all GPCRs, whereas some families, such as Class B and Class C receptors, are long owing to the presence of many exons. Statistical analyses of amino acid residues reveal that most of the conserved residues in Class A GPCRs are found in the cytoplasmic half regions of transmembrane (TM) helices, while residues characteristic of each subfamily are found in the extracellular half regions. Sixty-nine Protein Data Bank (PDB) entries of complete or fragmentary structures could be mapped on the TM/loop regions of Class A GPCRs covering 14 subfamilies.

  9. Expressed sequence tag analysis of functional genes associated with adventitious rooting in Liriodendron hybrids.

    Science.gov (United States)

    Zhong, Y D; Sun, X Y; Liu, E Y; Li, Y Q; Gao, Z; Yu, F X

    2016-06-24

    Liriodendron hybrids (Liriodendron chinense x L. tulipifera) are important landscaping and afforestation hardwood trees. To date, little genomic research on adventitious rooting has been reported in these hybrids, as well as in the genus Liriodendron. In the present study, we used adventitious roots to construct the first cDNA library for Liriodendron hybrids. A total of 5176 expressed sequence tags (ESTs) were generated and clustered into 2921 unigenes. Among these unigenes, 2547 had significant homology to the non-redundant protein database representing a wide variety of putative functions. Homologs of these genes regulated many aspects of adventitious rooting, including those for auxin signal transduction and root hair development. Results of quantitative real-time polymerase chain reaction showed that AUX1, IRE, and FB1 were highly expressed in adventitious roots and the expression of AUX1, ARF1, NAC1, RHD1, and IRE increased during the development of adventitious roots. Additionally, 181 simple sequence repeats were identified from 166 ESTs and more than 91.16% of these were dinucleotide and trinucleotide repeats. To the best of our knowledge, the present study reports the identification of the genes associated with adventitious rooting in the genus Liriodendron for the first time and provides a valuable resource for future genomic studies. Expression analysis of selected genes could allow us to identify regulatory genes that may be essential for adventitious rooting.

  10. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, as widely used in microarray-based gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) for epistasis analysis with RNA-seq data, with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for the interaction of all possible pairs of genes and uses all available information to collectively test the interaction between all possible pairs of SNPs within two genomic regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction

  11. Functional MRI of the pharynx in obstructive sleep apnea (OSA) with rapid 2-D flash sequences

    International Nuclear Information System (INIS)

    Jaeger, L.; Guenther, E.; Gauger, J.; Nitz, W.; Kastenbauer, E.; Reiser, M.

    1996-01-01

    Functional imaging of the pharynx used to be the domain of cineradiography, CT and ultrafast CT. The development of modern MRI techniques has opened new access to functional disorders of the pharynx. The aim of this study was to implement a new MRI technique to examine oropharyngeal obstructive mechanisms in patients with obstructive sleep apnea (OSA). Sixteen patients suffering from OSA and 6 healthy volunteers were examined on a 1.5 T whole-body imager ('Vision', Siemens, Erlangen Medical Engineering, Germany) using a circularly polarized head coil. Imaging was performed with 2D flash sequences in midsagittal and axial planes. Patients and volunteers were asked to breathe normally through the nose and to simulate snoring and the Mueller maneuver during magnetic resonance imaging (MRI). Prior to MRI, all patients underwent an ear, nose and throat (ENT) examination, functional fiberoptic nasopharyngoscopy and polysomnography. A temporal resolution of 6 images/s and an in-plane resolution of 2.67x1.8 mm were achieved. The mobility of the tongue, soft palate and pharyngeal surface could be clearly delineated. The MRI findings correlated well with the clinical examinations. We propose ultrafast MRI as a reliable and non-invasive method of evaluating pharyngeal obstructions and their levels.

  12. Aspects of the generation of finite-difference Green's function sequences for arbitrary 3-D cubic lattice points

    NARCIS (Netherlands)

    de Hon, B.P.; Arnold, J.M.

    2015-01-01

    The robust and speedy evaluation of lattice Green's functions (LGFs) is crucial to the effectiveness of finite-difference Green's function diakoptics schemes. We have recently determined a generic recurrence scheme for the construction of scalar LGF sequences at arbitrary points on a 3-D cubic

  13. BrEPS: a flexible and automatic protocol to compute enzyme-specific sequence profiles for functional annotation

    Directory of Open Access Journals (Sweden)

    Schomburg D

    2010-12-01

    Background: Models for the simulation of metabolic networks require the accurate prediction of enzyme function. Based on a genomic sequence, enzymatic functions of gene products are today mainly predicted by sequence database searching and operon analysis. Other methods can support these techniques: we have developed an automatic method, "BrEPS", that creates highly specific sequence patterns for the functional annotation of enzymes. Results: The enzymes in UniProtKB are identified and their sequences compared against each other with BLAST. The enzymes are then clustered into a number of trees, where each tree node is associated with a set of EC numbers. The enzyme sequences in the tree nodes are aligned with ClustalW. The conserved columns of the resulting multiple alignments are used to construct sequence patterns. In the last step, we verify the quality of the patterns by computing their specificity. Patterns with low specificity are omitted and recomputed further down in the tree. The final high-quality patterns can be used for functional annotation. We ran our protocol on a recent Swiss-Prot release and show statistics, as well as a comparison to PRIAM, a probabilistic method that also specializes in the functional annotation of enzymes. We determine the number of true positive annotations for five common microorganisms, with data from BRENDA and AMENDA serving as the standard of truth. BrEPS is almost on par with PRIAM, a fact which we discuss in the context of five manually investigated cases. Conclusions: Our protocol computes highly specific sequence patterns that can be used to support the functional annotation of enzymes. The main advantages of our method are that it is automatic and unsupervised, and quite fast once the patterns are evaluated. The results show that BrEPS can be a valuable addition to the reconstruction of metabolic networks.
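
    The pattern-construction step in this abstract (conserved columns of a multiple alignment turned into a sequence pattern) can be illustrated with a minimal Python sketch. It is not the BrEPS implementation: the function name `conserved_pattern`, the `min_fraction` threshold and the use of `x` as a wildcard are assumptions made for illustration, and the input is assumed to be an already computed gapped alignment.

```python
from collections import Counter


def conserved_pattern(alignment, min_fraction=1.0):
    """Build a simple pattern from aligned sequences of equal length.

    A column contributes its residue when the most common residue is not
    a gap and reaches `min_fraction` of the rows; otherwise it becomes the
    wildcard 'x'. Threshold and wildcard symbol are illustrative choices.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    pattern = []
    for column in zip(*alignment):
        # Most frequent symbol in this alignment column and its count.
        residue, count = Counter(column).most_common(1)[0]
        conserved = residue != "-" and count / len(column) >= min_fraction
        pattern.append(residue if conserved else "x")
    return "".join(pattern)


if __name__ == "__main__":
    toy_alignment = ["MKTAY", "MKSAY", "MKTAF"]  # made-up sequences
    print(conserved_pattern(toy_alignment))
    print(conserved_pattern(toy_alignment, min_fraction=0.6))
```

    Lowering `min_fraction` makes the pattern more specific to the majority residue at each position, which would then have to be checked against sequences outside the cluster, in the spirit of the specificity check the abstract mentions.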

  14. The C-terminal sequence of several human serine proteases encodes host defense functions.

    Science.gov (United States)

    Kasetty, Gopinath; Papareddy, Praveen; Kalle, Martina; Rydengård, Victoria; Walse, Björn; Svensson, Bo; Mörgelin, Matthias; Malmsten, Martin; Schmidtchen, Artur

    2011-01-01

    Serine proteases of the S1 family have maintained a common structure over an evolutionary span of more than one billion years, and evolved a variety of substrate specificities and diverse biological roles, involving digestion and degradation, blood clotting, fibrinolysis and epithelial homeostasis. We here show that a wide range of C-terminal peptide sequences of serine proteases, particularly from the coagulation and kallikrein systems, share characteristics common with classical antimicrobial peptides of innate immunity. Under physiological conditions, these peptides exert antimicrobial effects as well as immunomodulatory functions by inhibiting macrophage responses to bacterial lipopolysaccharide. In mice, selected peptides are protective against lipopolysaccharide-induced shock. Moreover, these S1-derived host defense peptides exhibit helical structures upon binding to lipopolysaccharide and also permeabilize liposomes. The results uncover new and fundamental aspects on host defense functions of serine proteases present particularly in blood and epithelia, and provide tools for the identification of host defense molecules of therapeutic interest. Copyright © 2011 S. Karger AG, Basel.

  15. Optimal protein library design using recombination or point mutations based on sequence-based scoring functions.

    Science.gov (United States)

    Pantazes, Robert J; Saraf, Manish C; Maranas, Costas D

    2007-08-01

    In this paper, we introduce and test two new sequence-based protein scoring systems (i.e. S1, S2) for assessing the likelihood that a given protein hybrid will be functional. By binning together amino acids with similar properties (i.e. volume, hydrophobicity and charge) the scoring systems S1 and S2 allow for the quantification of the severity of mismatched interactions in the hybrids. The S2 scoring system is found to be able to significantly functionally enrich a cytochrome P450 library over other scoring methods. Given this scoring base, we subsequently constructed two separate optimization formulations (i.e. OPTCOMB and OPTOLIGO) for optimally designing protein combinatorial libraries involving recombination or mutations, respectively. Notably, two separate versions of OPTCOMB are generated (i.e. model M1, M2) with the latter allowing for position-dependent parental fragment skipping. Computational benchmarking results demonstrate the efficacy of models OPTCOMB and OPTOLIGO to generate high scoring libraries of a prespecified size.

  16. The cluster index of regularly varying sequences with applications to limit theory for functions of multivariate Markov chains

    DEFF Research Database (Denmark)

    Mikosch, Thomas Valentin; Wintenberger, Olivier

    2014-01-01

    We introduce the cluster index of a multivariate stationary sequence and characterize the index in terms of the spectral tail process. This index plays a major role in limit theory for partial sums of sequences. We illustrate the use of the cluster index by characterizing infinite variance stable limit distributions and precise large deviation results for sums of multivariate functions acting on a stationary Markov chain under a drift condition.

  17. Effects of loading sequences and size of repeated stress block of loads on fatigue life calculated using fatigue functions

    International Nuclear Information System (INIS)

    Schott, G.

    1989-01-01

    It is well known that the collective form, stress intensity and loading sequence of individual stresses, as well as the size of repeated stress blocks, can significantly influence fatigue life. The basic variant of the consecutive Woehler curve concept allows these effects to be included in fatigue life computation. This paper demonstrates that fatigue life computations using fatigue functions reflect the loading sequence effect under multilevel loading precisely and provide reliable fatigue life data. Effects of the size of the repeated stress block and of the loading sequence on fatigue life, as observed in block program tests, can be reproduced using the new computation method.

  18. Isolation of xylose isomerases by sequence- and function-based screening from a soil metagenomic library

    Directory of Open Access Journals (Sweden)

    Parachin Nádia

    2011-05-01

    Background: Xylose isomerase (XI) catalyses the isomerisation of xylose to xylulose in bacteria and some fungi. Currently, only a limited number of XI genes have been functionally expressed in Saccharomyces cerevisiae, the microorganism of choice for lignocellulosic ethanol production. The objective of the present study was to search for novel XI genes in the vastly diverse microbial habitat present in soil. As the exploitation of microbial diversity is impaired by the inability to cultivate soil microorganisms under standard laboratory conditions, a metagenomic approach, consisting of total DNA extraction from a given environment followed by cloning of DNA into suitable vectors, was undertaken. Results: A soil metagenomic library was constructed and two screening methods, based on protein sequence similarity and enzyme activity, were investigated to isolate novel XI-encoding genes. These two screening approaches identified the xym1 and xym2 genes, respectively. Sequence and phylogenetic analyses revealed that the genes shared 67% similarity and belonged to different bacterial groups. When xym1 and xym2 were overexpressed in a xylA-deficient Escherichia coli strain, growth rates similar to those obtained with the Piromyces XI gene were observed. However, expression in S. cerevisiae resulted in only one-fourth of the growth rate obtained for the strain expressing the Piromyces XI gene. Conclusions: For the first time, the screening of a soil metagenomic library in E. coli resulted in the successful isolation of two active XIs. However, the discrepancy between XI enzyme performance in E. coli and S. cerevisiae suggests that future screening for XI activity from soil should be pursued directly using yeast as a host.

  19. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

    KAUST Repository

    Kulmanov, Maxat

    2017-09-27

    Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.

  20. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    Science.gov (United States)

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net. Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  1. Augmented brain function by coordinated reset stimulation with slowly varying sequences

    Directory of Open Access Journals (Sweden)

    Magteld Zeitler

    2015-03-01

    Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e. an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e. CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.

  2. Augmented brain function by coordinated reset stimulation with slowly varying sequences.

    Science.gov (United States)

    Zeitler, Magteld; Tass, Peter A

    2015-01-01

    Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.
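
    The difference between fixed, rapidly varying (RVS) and slowly varying (SVS) stimulation site sequences can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' simulation code: the function name `cr_site_sequences` and the `repeats` parameter are hypothetical, and the switch to a new sequence is modelled as happening after a fixed number of repetitions, which the abstract does not specify.

```python
import random


def cr_site_sequences(n_sites, n_cycles, repeats=1, seed=None):
    """Yield one stimulation site sequence per CR cycle.

    Each sequence is a permutation of the site indices, so every site is
    stimulated exactly once per cycle. A new random permutation is drawn
    every `repeats` cycles (an assumed, simplified switching rule):
      repeats = 1          -> rapidly varying sequences (RVS)
      1 < repeats < cycles -> slowly varying sequences (SVS)
      repeats >= n_cycles  -> one fixed sequence
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    rng = random.Random(seed)
    current = list(range(n_sites))
    rng.shuffle(current)
    for cycle in range(n_cycles):
        if cycle > 0 and cycle % repeats == 0:
            # Draw the next sequence; by chance it may equal the old one.
            current = list(range(n_sites))
            rng.shuffle(current)
        yield tuple(current)


if __name__ == "__main__":
    for seq in cr_site_sequences(n_sites=4, n_cycles=6, repeats=3, seed=1):
        print(seq)
```

    With four sites and `repeats=3`, the first three cycles share one ordering and the next three share another, which is the slowly varying regime the abstract contrasts with fixed sequences and RVS.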

  3. In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment

    Directory of Open Access Journals (Sweden)

    Chitale Meghana

    2013-02-01

    Background: Many Automatic Function Prediction (AFP) methods were developed to cope with the rapid growth in the number of gene sequences that are available from high-throughput sequencing experiments. To support the development of AFP methods, it is essential to have community-wide experiments for evaluating the performance of existing AFP methods. Critical Assessment of Function Annotation (CAFA) is one such community experiment. The meeting of CAFA was held as a Special Interest Group (SIG) meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods, PFP and ESG, which were developed in our lab, using the predictions submitted to CAFA. Results: We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate the performance of a different scoring function to rank-order predictions by PFP, as well as PFP/ESG predictions enriched with Priors, which simply add frequently occurring Gene Ontology terms to the predictions. Prediction accuracies of each method were also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST. Conclusion: The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but will also shed light on the relationship between the sequence similarity space and the functions that can be inferred from the sequences.

  4. Functional assessment of human enhancer activities using whole-genome STARR-sequencing.

    Science.gov (United States)

    Liu, Yuwen; Yu, Shan; Dhiman, Vineet K; Brunetti, Tonya; Eckart, Heather; White, Kevin P

    2017-11-20

    Genome-wide quantification of enhancer activity in the human genome has proven to be a challenging problem. Recent efforts have led to the development of powerful tools for enhancer quantification. However, because of genome size and complexity, these tools have yet to be applied to the whole human genome. In the current study, we use a human prostate cancer cell line, LNCaP, as a model to perform whole human genome STARR-seq (WHG-STARR-seq) to reliably obtain an assessment of enhancer activity. This approach builds upon previously developed STARR-seq in the fly genome and CapSTARR-seq techniques in targeted human genomic regions. With an improved library preparation strategy, our approach greatly increases the library complexity per unit of starting material, which makes it feasible and cost-effective to explore the landscape of regulatory activity in the much larger human genome. In addition to identifying active, accessible enhancers located in open chromatin regions, we can also detect sequences with the potential for enhancer activity that are located in inaccessible, closed chromatin regions. When cells are treated with the histone deacetylase inhibitor Trichostatin A, genes near this latter class of enhancers are up-regulated, demonstrating the potential for endogenous functionality of these regulatory elements. WHG-STARR-seq improves on current pipelines for the analysis of high-complexity genomes and helps to gain a better understanding of the intricacies of transcriptional regulation.

  5. Identification of novel biomass-degrading enzymes from genomic dark matter: Populating genomic sequence space with functional annotation.

    Science.gov (United States)

    Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias

    2014-08-01

    Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications. © 2014 Wiley Periodicals, Inc.

  6. Large-scale determination of sequence, structure, and function relationships in cytosolic glutathione transferases across the biosphere.

    Science.gov (United States)

    Mashiyama, Susan T; Malabanan, M Merced; Akiva, Eyal; Bhosle, Rahul; Branch, Megan C; Hillerich, Brandan; Jagessar, Kevin; Kim, Jungwook; Patskovsky, Yury; Seidel, Ronald D; Stead, Mark; Toro, Rafael; Vetting, Matthew W; Almo, Steven C; Armstrong, Richard N; Babbitt, Patricia C

    2014-04-01

    The cytosolic glutathione transferase (cytGST) superfamily comprises more than 13,000 nonredundant sequences found throughout the biosphere. Their key roles in metabolism and defense against oxidative damage have led to thousands of studies over several decades. Despite this attention, little is known about the physiological reactions they catalyze, and most of the substrates used to assay cytGSTs are synthetic compounds. A deeper understanding of relationships across the superfamily could provide new clues about their functions. To establish a foundation for expanded classification of cytGSTs, we generated similarity-based subgroupings for the entire superfamily. Using the resulting sequence similarity networks, we chose targets that broadly covered unknown functions and report here experimental results confirming GST-like activity for 82 of them, along with 37 new 3D structures determined for 27 targets. These new data, along with experimentally known GST reactions and structures reported in the literature, were painted onto the networks to generate a global view of their sequence-structure-function relationships. The results show how proteins of both known and unknown function relate to each other across the entire superfamily and reveal that the great majority of cytGSTs have not been experimentally characterized or annotated by canonical class. A mapping of taxonomic classes across the superfamily indicates that many taxa are represented in each subgroup and highlights challenges for classification of superfamily sequences into functionally relevant classes. Experimental determination of disulfide bond reductase activity in many diverse subgroups illustrates a theme common to many reaction types. Finally, sequence comparison between an enzyme that catalyzes a reductive dechlorination reaction relevant to bioremediation efforts with some of its closest homologs reveals differences among them likely to be associated with evolution of this unusual reaction

  7. Large-scale determination of sequence, structure, and function relationships in cytosolic glutathione transferases across the biosphere.

    Directory of Open Access Journals (Sweden)

    Susan T Mashiyama

    2014-04-01

    The cytosolic glutathione transferase (cytGST) superfamily comprises more than 13,000 nonredundant sequences found throughout the biosphere. Their key roles in metabolism and defense against oxidative damage have led to thousands of studies over several decades. Despite this attention, little is known about the physiological reactions they catalyze, and most of the substrates used to assay cytGSTs are synthetic compounds. A deeper understanding of relationships across the superfamily could provide new clues about their functions. To establish a foundation for expanded classification of cytGSTs, we generated similarity-based subgroupings for the entire superfamily. Using the resulting sequence similarity networks, we chose targets that broadly covered unknown functions and report here experimental results confirming GST-like activity for 82 of them, along with 37 new 3D structures determined for 27 targets. These new data, along with experimentally known GST reactions and structures reported in the literature, were painted onto the networks to generate a global view of their sequence-structure-function relationships. The results show how proteins of both known and unknown function relate to each other across the entire superfamily and reveal that the great majority of cytGSTs have not been experimentally characterized or annotated by canonical class. A mapping of taxonomic classes across the superfamily indicates that many taxa are represented in each subgroup and highlights challenges for classification of superfamily sequences into functionally relevant classes. Experimental determination of disulfide bond reductase activity in many diverse subgroups illustrates a theme common to many reaction types. Finally, sequence comparison between an enzyme that catalyzes a reductive dechlorination reaction relevant to bioremediation efforts with some of its closest homologs reveals differences among them likely to be associated with evolution of this

  8. Coupled high-throughput functional screening and next generation sequencing for identification of plant polymer decomposing enzymes in metagenomic libraries

    Directory of Open Access Journals (Sweden)

    Mari Nyyssönen

    2013-09-01

    Recent advances in sequencing technologies generate new predictions and hypotheses about the functional roles of environmental microorganisms. Yet, until we can test these predictions at a scale that matches our ability to generate them, most of them will remain hypotheses. Function-based mining of metagenomic libraries can provide direct linkages between genes, metabolic traits and microbial taxa and thus bridge this gap between sequence data generation and functional predictions. Here we developed high-throughput screening assays for function-based characterization of activities involved in plant polymer decomposition from environmental metagenomic libraries. The multiplexed assays use fluorogenic and chromogenic substrates, combine automated liquid handling and use a genetically modified expression host to enable simultaneous screening of 12,160 clones for 14 activities in a total of 170,240 reactions. Using this platform we identified 374 (0.26 %) cellulose, hemicellulose, chitin, starch, phosphate and protein hydrolyzing clones from fosmid libraries prepared from decomposing leaf litter. Sequencing on the Illumina MiSeq platform, followed by assembly and gene prediction of a subset of 95 fosmid clones, identified a broad range of bacterial phyla, including Actinobacteria, Bacteroidetes and multiple Proteobacteria sub-phyla, in addition to some Fungi. Carbohydrate-active enzyme genes from 20 different glycoside hydrolase families were detected. Using tetranucleotide frequency binning of fosmid sequences, multiple enzyme activities from distinct fosmids were linked, demonstrating how biochemically confirmed functional traits in environmental metagenomes may be attributed to groups of specific organisms. Overall, our results demonstrate how functional screening of metagenomic libraries can be used to connect microbial functionality to community composition and, as a result, complement large-scale metagenomic sequencing efforts.
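
    Tetranucleotide frequency binning, mentioned in this abstract, rests on representing each sequence as a vector of 4-mer frequencies and grouping sequences whose vectors are close. The Python sketch below shows only that representation and a plain Euclidean distance; the function names are hypothetical, reverse-complement merging and clustering are left out, and nothing here is claimed to match the authors' binning procedure.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over A, C, G, T in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return the 256-element tetranucleotide frequency vector.

    Overlapping windows of length 4 are counted; windows containing
    characters other than A, C, G, T (for example N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = [counts[t] for t in TETRAMERS]
    total = sum(valid)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in valid]


def euclidean_distance(a, b):
    """Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


if __name__ == "__main__":
    f1 = tetranucleotide_frequencies("ACGTACGTACGTTTGACA")  # toy inputs
    f2 = tetranucleotide_frequencies("ACGTACGAACGTTTGACA")
    print(round(euclidean_distance(f1, f2), 4))
```

    Sequences from the same genome tend to have similar vectors, so small distances are taken as a hint that two fosmids may derive from related organisms; real binning tools add normalisation and clustering on top of this.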

  9. Salmonella Persistence in Tomatoes Requires a Distinct Set of Metabolic Functions Identified by Transposon Insertion Sequencing

    Science.gov (United States)

    Desai, Prerak; Porwollik, Steffen; Canals, Rocio; Perez, Daniel R.; Chu, Weiping; McClelland, Michael; Teplitski, Max

    2016-01-01

    ABSTRACT Human enteric pathogens, such as Salmonella spp. and verotoxigenic Escherichia coli, are increasingly recognized as causes of gastroenteritis outbreaks associated with the consumption of fruits and vegetables. Persistence in plants represents an important part of the life cycle of these pathogens. The identification of the full complement of Salmonella genes involved in the colonization of the model plant (tomato) was carried out using transposon insertion sequencing analysis. With this approach, 230,000 transposon insertions were screened in tomato pericarps to identify loci with reduction in fitness, followed by validation of the screen results using competition assays of the isogenic mutants against the wild type. A comparison with studies in animals revealed a distinct plant-associated set of genes, which only partially overlaps with the genes required to elicit disease in animals. De novo biosynthesis of amino acids was critical to persistence within tomatoes, while amino acid scavenging was prevalent in animal infections. Fitness reduction of the Salmonella amino acid synthesis mutants was generally more severe in the tomato rin mutant, which hyperaccumulates certain amino acids, suggesting that these nutrients remain unavailable to Salmonella spp. within plants. Salmonella lipopolysaccharide (LPS) was required for persistence in both animals and plants, exemplifying some shared pathogenesis-related mechanisms in animal and plant hosts. Similarly to phytopathogens, Salmonella spp. required biosynthesis of amino acids, LPS, and nucleotides to colonize tomatoes. Overall, however, it appears that while Salmonella shares some strategies with phytopathogens and taps into its animal virulence-related functions, colonization of tomatoes represents a distinct strategy, highlighting this pathogen's flexible metabolism. 
IMPORTANCE Outbreaks of gastroenteritis caused by human pathogens have been increasingly associated with foods of plant origin, with tomatoes

  10. Rapid functional and sequence differentiation of a tandemly repeated species-specific multigene family in Drosophila

    DEFF Research Database (Denmark)

    Clifton, Bryan D.; Sanz, Pablo Librado; Yeh, Shu-Dan

    2017-01-01

    Gene clusters of recently duplicated genes are hotbeds for evolutionary change. However, our understanding of how mutational mechanisms and evolutionary forces shape the structural and functional evolution of these clusters is hindered by the high sequence identity among the copies, which typical...

  11. Metagenomic Sequencing of Marine Periphyton: Taxonomic and Functional Insights into Biofilm Communities

    Directory of Open Access Journals (Sweden)

    Kemal Sanli

    2015-10-01

    Full Text Available Periphyton communities are complex phototrophic, multispecies biofilms that develop on surfaces in aquatic environments. These communities harbor a large diversity of organisms comprising viruses, bacteria, algae, fungi, protozoans and metazoans. However, thus far the total biodiversity of periphyton has not been described. In this study, we use metagenomics to characterize periphyton communities from the marine environment of the Swedish west coast. Although we found approximately ten times more eukaryotic rRNA marker gene sequences compared to prokaryotic, the whole metagenome-based similarity searches showed that bacteria constitute the most abundant phyla in these biofilms. We show that marine periphyton encompass a range of heterotrophic and phototrophic organisms. Heterotrophic bacteria, including the majority of proteobacterial clades and Bacteroidetes, and eukaryotic macro-invertebrates were found to dominate periphyton. The phototrophic groups comprise Cyanobacteria and the alpha-proteobacterial genus Roseobacter, followed by different micro- and macro-algae. We also assess the metabolic pathways that predispose these communities to an attached lifestyle. Functional indicators of the biofilm form of life in periphyton involve genes coding for enzymes that catalyze the production and degradation of extracellular polymeric substances, mainly in the form of complex sugars such as starch and glycogen-like meshes together with chitin. Genes for 278 different transporter proteins were detected in the metagenome, constituting the most abundant protein complexes. Finally, genes encoding enzymes that participate in anaerobic pathways, such as denitrification and methanogenesis, were detected suggesting the presence of anaerobic or low-oxygen micro-zones within the biofilms.

  12. Integrating network, sequence and functional features using machine learning approaches towards identification of novel Alzheimer genes.

    Science.gov (United States)

    Jamal, Salma; Goyal, Sukriti; Shanker, Asheesh; Grover, Abhinav

    2016-10-18

    Alzheimer's disease (AD) is a complex progressive neurodegenerative disorder commonly characterized by short term memory loss. Presently no effective therapeutic treatments exist that can completely cure this disease. The cause of Alzheimer's is still unclear; however, genetic factors are among the major factors involved in AD pathogenesis, and around 70 % of the risk of the disease is assumed to be due to the large number of genes involved. Although genetic association studies have revealed a number of potential AD susceptibility genes, there is still a need to identify as yet unidentified AD-associated genes and therapeutic targets in order to better understand the disease-causing mechanisms of Alzheimer's and to develop effective AD therapeutics. In the present study, we have used a machine learning approach to identify candidate AD-associated genes by integrating topological properties of the genes from the protein-protein interaction networks, sequence features and functional annotations. We also used a molecular docking approach and screened already known anti-Alzheimer drugs against the novel predicted probable targets of AD and observed that an investigational drug, AL-108, had high affinity for the majority of the possible therapeutic targets. Furthermore, we performed molecular dynamics simulations and MM/GBSA calculations on the docked complexes to validate our preliminary findings. To the best of our knowledge, this is the first comprehensive study of its kind for identification of putative Alzheimer-associated genes using machine learning approaches and we propose that such computational studies can improve our understanding of the core etiology of AD, which could lead to the development of effective anti-Alzheimer drugs.

  13. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, and 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.
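
    The counts reported in the abstract above are internally consistent, which the short check below illustrates. It uses only figures quoted in the abstract; the variable and function names are chosen here for illustration.

```python
# Figures quoted in the abstract.
isotigs = 13_194
singletons = 104_660
unigenes_reported = 117_854
mapped_to_chromosomes = 97_236
matched_to_goat_genes = 35_551


def percent(part, whole):
    """Percentage of whole represented by part, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Isotigs plus singletons should equal the reported unigene total.
unigenes = isotigs + singletons

if __name__ == "__main__":
    print(unigenes == unigenes_reported)
    print(percent(mapped_to_chromosomes, unigenes))
    print(percent(matched_to_goat_genes, unigenes))
```

    Both percentages are taken relative to the full unigene set, which is the reading that reproduces the reported values.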

  14. Phylogenetic and functional analysis of metagenome sequence from high-temperature archaeal habitats demonstrate linkages between metabolic potential and geochemistry

    Directory of Open Access Journals (Sweden)

    William P. Inskeep

    2013-05-01

    Full Text Available Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (~40-45 Mbase Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G+C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high temperature systems of YNP.

  15. Functional promoter upstream p53 regulatory sequence of IGFBP3 that is silenced by tumor specific methylation

    International Nuclear Information System (INIS)

    Hanafusa, Tadashi; Shinji, Toshiyuki; Shiraha, Hidenori; Nouso, Kazuhiro; Iwasaki, Yoshiaki; Yumoto, Eichiro; Ono, Toshiro; Koide, Norio

    2005-01-01

    Insulin-like growth factor binding protein (IGFBP)-3 functions as a carrier of insulin-like growth factors (IGFs) in circulation and a mediator of the growth suppression signal in cells. There are two reported p53 regulatory regions in the IGFBP3 gene; one upstream of the promoter and one intronic. We previously reported a hot spot of promoter hypermethylation of IGFBP-3 in human hepatocellular carcinomas and derivative cell lines. As the hot spot is located at the putative upstream p53 consensus sequences, whether these p53 consensus sequences are actually functional is a question to be answered. In this study, we examined the p53 consensus sequences upstream of the IGFBP-3 promoter for the p53 induced expression of IGFBP-3. Deletion, mutagenesis, and methylation constructs of IGFBP-3 promoter were assessed in the human hepatoblastoma cell line HepG2 for promoter activity. Deletions and mutations of these sequences completely abolished the expression of IGFBP-3 in the presence of p53 overexpression. In vitro methylation of these p53 consensus sequences also suppressed IGFBP-3 expression. In contrast, the expression of IGFBP-3 was not affected in the absence of p53 overexpression. Further, we observed by electrophoresis mobility shift assay that p53 binding to the promoter region was diminished when methylated. From these observations, we conclude that four out of eleven p53 consensus sequences upstream of the IGFBP-3 promoter are essential for the p53 induced expression of IGFBP-3, and hypermethylation of these sequences selectively suppresses p53 induced IGFBP-3 expression in HepG2 cells.

  16. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

    Directory of Open Access Journals (Sweden)

    Nachon Raethong

    2016-01-01

    Full Text Available Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction.

  17. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study.

    Science.gov (United States)

    Raethong, Nachon; Wong-Ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H(+)-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction.

  18. A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

    Science.gov (United States)

    Keel, B N; Nonneman, D J; Rohrer, G A

    2017-08-01

    Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
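
    The claim that over 99% of the detected variation could be set aside follows directly from the two counts given in the abstract above. A minimal check, using only those figures and the rounded value of "approximately 139 000" as stated (variable names are illustrative):

```python
# Figures quoted in the abstract; the functional count is approximate.
total_snps = 22_342_915
functional_snps = 139_000  # loss-of-function, nonsynonymous or regulatory

# Share of SNPs classified as functional, and the remainder, in percent.
functional_share = 100 * functional_snps / total_snps
remaining_share = 100 - functional_share

if __name__ == "__main__":
    print(round(functional_share, 2))
    print(remaining_share > 99)
```

    Because the functional count is rounded in the abstract, the share computed here is approximate as well.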

  19. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing.

    Science.gov (United States)

    Song, Zhewei; Du, Hai; Zhang, Yan; Xu, Yan

    2017-01-01

    Fermentation microbiota are specific microorganisms that generate different types of metabolites in many production processes. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production.
Our findings provide insight into

  20. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing

    Directory of Open Access Journals (Sweden)

    Zhewei Song

    2017-07-01

    Full Text Available Fermentation microbiota are specific microorganisms that generate different types of metabolites in many production processes. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide

  1. Discovery and functional prioritization of Parkinson's disease candidate genes from large-scale whole exome sequencing

    NARCIS (Netherlands)

    I. Jansen (Iris); Ye, H. (Hui); Heetveld, S. (Sasja); Lechler, M.C. (Marie C.); Michels, H. (Helen); Seinstra, R.I. (Renée I.); Lubbe, S.J. (Steven J.); Drouet, V. (Valérie); S. Lesage (Suzanne); E. Majounie (Elisa); Gibbs, J.R. (J.Raphael); M.A. Nalls (Michael); M. Ryten (Mina); Botia, J.A. (Juan A.); J. Vandrovcova (Jana); J. Simón-Sánchez (Javier); Castillo-Lizardo, M. (Melissa); P. Rizzu (Patrizia); Blauwendraat, C. (Cornelis); Chouhan, A.K. (Amit K.); Li, Y. (Yarong); Yogi, P. (Puja); N. Amin (Najaf); C.M. van Duijn (Cornelia); Morris, H.R. (Huw R.); Brice, A. (Alexis); A. Singleton (Andrew); David, D.C. (Della C.); Nollen, E.A. (Ellen A.); A. Jain (Ashok); J.M. Shulman; P. Heutink (Peter); D.G. Hernandez (Dena); S. Arepalli (Sampath); J. Brooks (Janet); Price, R. (Ryan); Nicolas, A. (Aude); S. Chong (Sean); M.R. Cookson (Mark); A. Dillman (Allissa); M. Moore (Matt); B.J. Traynor (Bryan); A. Singleton (Andrew); V. Plagnol (Vincent); Nicholas W Wood,; U.-M. Sheerin (Una-Marie); Jose M Bras,; K. Charlesworth (Kate); M. Gardner (Mac); R. Guerreiro (Rita); D. Trabzuni (Danyah); Hardy, J. (John); M. Sharma; M. Saad (Mohamad); Javier Simón-Sánchez,; C. Schulte (Claudia); J.C. Corvol (Jean-Christophe); Dürr, A. (Alexandra); M. Vidailhet (M.); S. Sveinbjörnsdóttir (Sigurlaug); R.A. Barker (Roger); Caroline H Williams-Gray,; Y. Ben-Shlomo; H.W. Berendse (Henk W.); K.D. van Dijk (Karin); D. Berg (Daniela); K. Brockmann; K.D. Wurster (Kathrin); Mätzler, W. (Walter); Gasser, T. (Thomas); M. Martinez (Maria); R.M.A. de Bie (Rob); A. Biffi (Alessandro); D. Velseboer (Daan); B.R. Bloem (Bastiaan); B. Post (Bart); M. Wickremaratchi (Mirdhu); B. van de Warrenburg (Bart); Z. Bochdanovits (Zoltan); M. von Bonin (Malte); H. Pétursson (Hjörvar); O. Riess (Olaf); D.J. Burn (David); Lubbe, S. (Steven); Cooper, J.M. (J Mark); N.H. McNeill (Nathan); Schapira, A. (Anthony); Lungu, C. (Codrin); Chen, H. (Honglei); Dong, J. (Jing); Chinnery, P.F. (Patrick F.); G. 
Hudson (Gavin); Clarke, C.E. (Carl E.); C. Moorby (Catriona); C. Counsell (Carl); P. Damier (Philippe); J.-F. Dartigues; P. Deloukas (Panagiotis); E. Gray (Emma); T. Edkins (Ted); Hunt, S.E. (Sarah E.); S.C. Potter (Simon); A. Tashakkori-Ghanbaria (Avazeh); G. Deuschl (Günther); D. Lorenz (Delia); D.T. Dexter (David); F. Durif (Frank); J. Evans (Jonathan Mark); Langford, C. (Cordelia); T. Foltynie (Thomas); A.M. Goate (Alison); C. Harris (Clare); J.J. van Hilten (Jacobus); A. Hofman (Albert); J.R. Hollenbeck (John R.); J.L. Holton (Janice); Hu, M. (Michele); X. Huang (Xiaohong); Illig, T. (Thomas); P.V. Jónsson (Pálmi); J.-C. Lambert; S.S. O'Sullivan (Sean); T. Revesz (Tamas); K. Shaw (Karen); A.J. Lees (Andrew); P. Lichtner (Peter); P. Limousin (Patricia); G. Lopez; Escott-Price, V. (Valentina); J. Pearson (Justin); N. Williams (Nigel); E. Mudanohwo (Ese); J.S. Perlmutter (Joel); Pollak, P. (Pierre); F. Rivadeneira Ramirez (Fernando); A.G. Uitterlinden (André); S.J. Sawcer (Stephen); H. Scheffer (Hans); I. Shoulson (Ira); L. Shulman (Lee); Smith, C. (Colin); R. Walker (Robert); C.C.A. Spencer (Chris C.); A. Strange (Amy); H. Stefansson (Hreinn); F. Bettella (Francesco); J-A. Zwart (John-Anker); Stockton, J.D. (Joanna D.); D. Talbot; C.M. Tanner (Carlie); F. Tison (François); S. Winder-Rhodes (Sophie); K.P. Bhatia (Kailash)

    2017-01-01

    Background: Whole-exome sequencing (WES) has been successful in identifying genes that cause familial Parkinson's disease (PD). However, until now this approach has not been deployed to study large cohorts of unrelated participants. To discover rare PD susceptibility variants, we

  2. Insights into phylogeny, sex function and age of Fragaria based on whole chloroplast genome sequencing

    Science.gov (United States)

    Wambui Njunguna; Aaron Liston; Richard Cronn; Tia-Lynn Ashman; Nahla Bassil

    2013-01-01

    The cultivated strawberry is one of the youngest domesticated plants, developed in France in the 1700s from chance hybridization between two western hemisphere octoploid species. However, little is known about the evolution of the species that gave rise to this important fruit crop. Phylogenetic analysis of chloroplast genome sequences of 21 Fragaria...

  3. Integrative analysis of functional genomic annotations and sequencing data to identify rare causal variants via hierarchical modeling

    Directory of Open Access Journals (Sweden)

    Marinela Capanu

    2015-05-01

    Full Text Available Identifying the small number of rare causal variants contributing to disease has been a major focus of investigation in recent years, but represents a formidable statistical challenge due to the rare frequencies with which these variants are observed. In this commentary we draw attention to a formal statistical framework, namely hierarchical modeling, to combine functional genomic annotations with sequencing data with the objective of enhancing our ability to identify rare causal variants. Using simulations we show that in all configurations studied, the hierarchical modeling approach has superior discriminatory ability compared to a recently proposed aggregate measure of deleteriousness, the Combined Annotation-Dependent Depletion (CADD) score, supporting our premise that aggregate functional genomic measures can more accurately identify causal variants when used in conjunction with sequencing data through a hierarchical modeling approach.

  4. Association Between Variants of PRDM1 and NDP52 and Crohn's Disease, Based on Exome Sequencing and Functional Studies

    DEFF Research Database (Denmark)

    Ellinghaus, David; Zhang, Hu; Zeissig, Sebastian

    2013-01-01

    BACKGROUND & AIMS: Genome-wide association studies (GWAS) have identified 140 Crohn's disease (CD) susceptibility loci. For most loci, the variants that cause disease are not known and the genes affected by these variants have not been identified. We aimed to identify variants that cause CD through...... detailed sequencing, genetic association, expression, and functional studies. METHODS: We sequenced whole exomes of 42 unrelated subjects with CD and 5 healthy subjects (controls) and then filtered single nucleotide variants by incorporating association results from meta-analyses of CD GWAS and in silico...... mutation effect prediction algorithms. We then genotyped 9348 subjects with CD, 2868 subjects with ulcerative colitis, and 14,567 control subjects and associated variants analyzed in functional studies using materials from subjects and controls and in vitro model systems. RESULTS: We identified rare...

  5. Oxidative DNA damage in lung tissue from patients with COPD is clustered in functionally significant sequences

    Directory of Open Access Journals (Sweden)

    Viktor M Pastukh

    2011-03-01

    Full Text Available Viktor M Pastukh1, Li Zhang2, Mykhaylo V Ruchko1, Olena Gorodnya1, Gina C Bardwell1, Rubin M Tuder2, Mark N Gillespie1. 1Department of Pharmacology and Center for Lung Biology, University of South Alabama College of Medicine, Mobile, AL, USA; 2Program in Translational Lung Research, Division of Pulmonary Sciences and Critical Care Medicine, Department of Medicine, University of Colorado at Denver, Aurora, CO, USA. Abstract: Lung tissue from COPD patients displays oxidative DNA damage. The present study determined whether oxidative DNA damage was randomly distributed or whether it was localized in specific sequences in either the nuclear or mitochondrial genomes. The DNA damage-specific histone, gamma-H2AX, was detected immunohistochemically in alveolar wall cells in lung tissue from COPD patients but not control subjects. A PCR-based method was used to search for oxidized purine base products in selected 200 bp sequences in promoters and coding regions of the VEGF, TGF-β1, HO-1, Egr1, and β-actin genes while quantitative Southern blot analysis was used to detect oxidative damage to the mitochondrial genome in lung tissue from control subjects and COPD patients. Among the nuclear genes examined, oxidative damage was detected in only 1 sequence in lung tissue from COPD patients: the hypoxic response element (HRE) of the VEGF promoter. The content of VEGF mRNA also was reduced in COPD lung tissue. Mitochondrial DNA content was unaltered in COPD lung tissue, but there was a substantial increase in mitochondrial DNA strand breaks and/or abasic sites. These findings show that oxidative DNA damage in COPD lungs is prominent in the HRE of the VEGF promoter and in the mitochondrial genome and raise the intriguing possibility that genome- and sequence-specific oxidative DNA damage could contribute to transcriptional dysregulation and cell fate decisions in COPD. Keywords: DNA damage, VEGF hypoxic response element, mtDNA, COPD

  6. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    Full Text Available One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TF) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  7. Functional dissection of the alphavirus capsid protease: sequence requirements for activity.

    Science.gov (United States)

    Thomas, Saijo; Rai, Jagdish; John, Lijo; Günther, Stephan; Drosten, Christian; Pützer, Brigitte M; Schaefer, Stephan

    2010-11-18

    The alphavirus capsid is multifunctional and plays a key role in the viral life cycle. The nucleocapsid domain is released by the self-cleavage activity of the serine protease domain within the capsid. All alphaviruses analyzed to date show this autocatalytic cleavage. Here we have analyzed the sequence requirements for the cleavage activity of the capsid protease of Chikungunya virus, a member of the genus Alphavirus. Amongst alphaviruses, the C-terminal amino acid tryptophan (W261) is conserved and was found to be important for the cleavage. Mutating tryptophan to alanine (W261A) completely inactivated the protease. Other amino acids near W261 had no effect on the activity of this protease. However, the serine protease inhibitor AEBSF did not inhibit the activity. Through error-prone PCR we found that isoleucine 227 is important for effective activity. The loss of activity was analyzed further by molecular modelling and comparison of WT and mutant structures. It was found that lysine introduced at position 227 is spatially very close to the catalytic triad and may disrupt electrostatic interactions in the catalytic site and thus inactivate the enzyme. We are also examining other sequence requirements for this protease activity. We analyzed various amino acid sequence requirements for the activity of ChikV capsid protease and found that amino acids outside the catalytic triad are important for the activity.

  8. MicroRNA repertoire for functional genome research in tilapia identified by deep sequencing.

    Science.gov (United States)

    Yan, Biao; Wang, Zhen-Hua; Zhu, Chang-Dong; Guo, Jin-Tao; Zhao, Jin-Liang

    2014-08-01

    The Nile tilapia (Oreochromis niloticus; Cichlidae) is an economically important species that occupies a prominent position in the aquaculture industry. MicroRNAs (miRNAs) are a class of noncoding RNAs that post-transcriptionally regulate gene expression involved in diverse biological and metabolic processes. To increase the repertoire of miRNAs characterized in tilapia, we used the Illumina/Solexa sequencing technology to sequence a small RNA library using a pooled RNA sample isolated from the different developmental stages of tilapia. Bioinformatic analyses suggest that 197 conserved and 27 novel miRNAs are expressed in tilapia. Sequence alignments indicate that all tested miRNAs and miRNAs* are highly conserved across many species. In addition, we characterized the tissue expression patterns of five miRNAs using real-time quantitative PCR. We found that miR-1/206, miR-7/9, and miR-122 are abundantly expressed in muscle, brain, and liver, respectively, implying a potential role in the regulation of tissue differentiation or the maintenance of tissue identity. Overall, our results expand the number of known tilapia miRNAs, and the discovery of miRNAs in the tilapia genome contributes to a better understanding of the role of miRNAs in regulating diverse biological processes.

  9. Cellulase linkers are optimized based on domain type and function: insights from sequence analysis, biophysical measurements, and molecular simulation.

    Directory of Open Access Journals (Sweden)

    Deanne W Sammond

    Cellulase enzymes deconstruct cellulose to glucose, and are often comprised of glycosylated linkers connecting glycoside hydrolases (GHs) to carbohydrate-binding modules (CBMs). Although linker modifications can alter cellulase activity, the functional role of linkers beyond domain connectivity remains unknown. Here we investigate cellulase linkers connecting GH Family 6 or 7 catalytic domains to Family 1 or 2 CBMs, from both bacterial and eukaryotic cellulases to identify conserved characteristics potentially related to function. Sequence analysis suggests that the linker lengths between structured domains are optimized based on the GH domain and CBM type, such that linker length may be important for activity. Longer linkers are observed in eukaryotic GH Family 6 cellulases compared to GH Family 7 cellulases. Bacterial GH Family 6 cellulases are found with structured domains in either N to C terminal order, and similar linker lengths suggest there is no effect of domain order on length. O-glycosylation is uniformly distributed across linkers, suggesting that glycans are required along entire linker lengths for proteolysis protection and, as suggested by simulation, for extension. Sequence comparisons show that proline content for bacterial linkers is more than double that observed in eukaryotic linkers, but with fewer putative O-glycan sites, suggesting alternative methods for extension. Conversely, near linker termini where linkers connect to structured domains, O-glycosylation sites are observed less frequently, whereas glycines are more prevalent, suggesting the need for flexibility to achieve proper domain orientations. Putative N-glycosylation sites are quite rare in cellulase linkers, while an N-P motif, which strongly disfavors the attachment of N-glycans, is commonly observed. These results suggest that linkers exhibit features that are likely tailored for optimal function, despite possessing low sequence identity. This study suggests

  10. Analysis of the functional compatibility of SIV capsid sequences in the context of the FIV gag precursor.

    Directory of Open Access Journals (Sweden)

    César A Ovejero

    The formation of immature lentiviral particles is dependent on the multimerization of the Gag polyprotein at the plasma membrane of the infected cells. One key player in the virus assembly process is the capsid (CA) domain of Gag, which establishes the protein-protein interactions that give rise to the hexagonal lattice of Gag molecules in the immature virion. To gain a better understanding of the functional equivalence between the CA proteins of simian and feline immunodeficiency viruses (SIV and FIV, respectively), we generated a series of chimeric FIV Gag proteins in which the CA-coding region was partially or totally replaced by its SIV counterpart. All the FIV Gag chimeras were found to be assembly-defective; however, all of them are able to interact with wild-type SIV Gag and be recruited into extracellular virus-like particles, regardless of the SIV CA sequences present in the chimeric FIV Gag. The results presented here markedly contrast with our previous findings showing that chimeric SIVs carrying FIV CA-derived sequences are assembly-competent. Overall, our data support the notion that although the SIV and FIV CA proteins share 51% amino acid sequence similarity and exhibit a similar organization, i.e., an N-terminal domain joined by a flexible linker to a C-terminal domain, their functional exchange between these different lentiviruses is strictly dependent on the context of the recipient Gag precursor.

  11. Analysis of the Sequences, Structures, and Functions of Product-Releasing Enzyme Domains in Fungal Polyketide Synthases

    Directory of Open Access Journals (Sweden)

    Lu Liu

    2017-09-01

    Product-releasing enzyme (PRE) domains in fungal non-reducing polyketide synthases (NR-PKSs) play a crucial role in catalysis and editing during polyketide biosynthesis, especially in accelerating the final biosynthetic reactions that accompany product offloading. However, to date, systematic knowledge of PRE domains is lacking. In the present study, the relationships between sequences, structures, and functions of PRE domains were analyzed with 574 NR-PKSs of eight groups (I–VIII). It was found that the PRE domains in NR-PKSs could be mainly classified into three types: thioesterase (TE), reductase (R), and metallo-β-lactamase-type TE (MβL-TE). The widely distributed TE or TE-like domains were involved in NR-PKSs of groups I–IV, VI, and VIII. The R domains appeared in NR-PKSs of groups IV and VII, while the physically discrete MβL-TE domains were employed by most NR-PKSs of group V. Changes in catalytic sites and structural characteristics resulted in functional differentiation of PRE domains. The phylogeny revealed that the evolution of TE domains was accompanied by complex functional divergence. The diverse sequence lengths of TE lid-loops affected specificity for substrates with different chain lengths. The diversification in volume of TE catalytic pockets contributed to catalytic mechanisms with functional differentiation. The above findings may help to understand the crucial catalysis of fungal aromatic polyketide biosynthesis and to guide recombination of NR-PKSs to obtain unnatural target products.

  12. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences.

    Science.gov (United States)

    Quang, Daniel; Xie, Xiaohui

    2016-06-20

    Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
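    The convolution step described in the record above can be illustrated with a minimal, dependency-free sketch. It is not the DanQ implementation (which is available at the repository cited in the abstract); the function names, the toy "TATA" kernel, and the example sequence are assumptions made purely for illustration. It shows how a one-hot encoded DNA sequence is scanned by a single motif-like filter, the kind of operation a convolution layer performs before a recurrent layer would model dependencies between motif hits.

```python
# Illustrative sketch only: a pure-Python stand-in for one convolutional
# motif filter. All names and the toy kernel below are hypothetical.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence and return one score per offset."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for k in range(width):
            for c in range(4):
                score += encoded[start + k][c] * kernel[k][c]
        scores.append(score)
    return scores


def relu_max_pool(scores):
    """Apply ReLU, then take the maximum over all offsets (global max pooling)."""
    return max(0.0, max(scores)) if scores else 0.0


# Hypothetical kernel that rewards exact matches to the toy motif "TATA".
TATA_KERNEL = one_hot("TATA")

if __name__ == "__main__":
    scores = conv1d_scan(one_hot("GGCTATAAGC"), TATA_KERNEL)
    print(scores.index(max(scores)))  # offset of the best motif match: 3
    print(relu_max_pool(scores))      # 4.0 (all four positions match)
```

    In a trained network the kernel weights are learned rather than fixed, and many such filters run in parallel; the sketch fixes one kernel only to make the scoring arithmetic visible.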

  13. JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures.

    Science.gov (United States)

    Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2012-02-15

    We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. Among other things, the program allows users to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.
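    The two kinds of positions named in the record above can be illustrated with a small sketch. It does not reproduce JDet, Xdet, or S3Det; it uses a deliberately simple criterion (identical residue within each family, different residue between families), and the function names, toy alignment, and family labels are all assumptions for illustration.

```python
# Illustrative sketch only: naive detection of fully conserved and
# family-dependent columns in a toy alignment. Not the Xdet/S3Det methods.

def fully_conserved(alignment):
    """Return 0-based columns where every sequence carries the same non-gap residue."""
    columns = []
    for i in range(len(alignment[0])):
        residues = {seq[i] for seq in alignment}
        if len(residues) == 1 and "-" not in residues:
            columns.append(i)
    return columns


def family_dependent(alignment, families):
    """Return columns conserved within every family but not identical across families."""
    groups = {}
    for seq, fam in zip(alignment, families):
        groups.setdefault(fam, []).append(seq)
    columns = []
    for i in range(len(alignment[0])):
        per_family = []
        for seqs in groups.values():
            residues = {s[i] for s in seqs}
            if len(residues) != 1 or "-" in residues:
                break  # this family is not conserved at column i
            per_family.append(next(iter(residues)))
        else:
            # Every family is internally conserved; require a between-family difference.
            if len(set(per_family)) > 1:
                columns.append(i)
    return columns


if __name__ == "__main__":
    toy_alignment = ["MKTLA", "MKSLA", "MRTIA", "MRTIA"]  # hypothetical sequences
    toy_families = ["A", "A", "B", "B"]                   # hypothetical family labels
    print(fully_conserved(toy_alignment))                  # [0, 4]
    print(family_dependent(toy_alignment, toy_families))   # [1, 3]
```

    Real methods score columns statistically and tolerate noise; the exact-match rule here is only meant to make the distinction between the two position types concrete.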

  14. Functional and structural analysis of the DNA sequence conferring glucocorticoid inducibility to the mouse mammary tumor virus gene

    International Nuclear Information System (INIS)

    Skroch, P.

    1987-05-01

    In the first part of my thesis I show that the DNA element conferring glucocorticoid inducibility to the Mouse Mammary Tumor Virus (HRE) has enhancer properties. It activates a heterologous promoter - that of the β-globin gene, independently of distance, position and orientation. These properties however have to be regarded in relation to the remaining regulatory elements of the activated gene as the recombinants between HRE and the TK gene have demonstrated. In the second part of my thesis I investigated the biological significance of certain sequence motifs of the HRE, which are remarkable by their interaction with transacting factors or sequence homologies with other regulatory DNA elements. I could confirm the generally postulated modular structure of enhancers for the HRE and bring the relevance of the single subdomains for the function of the element into relationship. (orig.) [de

  15. Cross-functional shifts in roadmapping : Sequence analysis of roadmapping practices at a large corporation

    NARCIS (Netherlands)

    Simonse, W.L.; Perks, H.

    2014-01-01

    This study unravels the nature of inter-functional integration in roadmapping. Roadmapping is indicated as an important innovation phenomenon and is practiced by multiple large organizations. Functional integration is widely acknowledged to play a significant role in enhancing new product success.

  16. Genome-wide linkage, exome sequencing and functional analyses identify ABCB6 as the pathogenic gene of dyschromatosis universalis hereditaria.

    Directory of Open Access Journals (Sweden)

    Hong Liu

    As a genetic disorder of abnormal pigmentation, the molecular basis of dyschromatosis universalis hereditaria (DUH) had remained unclear until recently when ABCB6 was reported as a causative gene of DUH. We performed a genome-wide linkage scan using the Illumina Human 660W-Quad BeadChip and exome sequencing analyses using Agilent SureSelect Human All Exon Kits in a multiplex Chinese DUH family to identify the pathogenic mutations and verified the candidate mutations using Sanger sequencing. Quantitative RT-PCR and immunohistochemistry were performed to verify the expression of the pathogenic gene; zebrafish were also used to confirm the functional role of ABCB6 in melanocytes and pigmentation. Genome-wide linkage (assuming autosomal dominant inheritance mode) and exome sequencing analyses identified ABCB6 as the disease candidate gene by discovering a coding mutation (c.1358C>T; p.Ala453Val) that co-segregates with the disease phenotype. Further mutation analysis of ABCB6 in four other DUH families and two sporadic cases by Sanger sequencing confirmed the mutation (c.1358C>T; p.Ala453Val) and discovered a second, co-segregating coding mutation (c.964A>C; p.Ser322Lys) in one of the four families. Both mutations were heterozygous in DUH patients and not present in the 1000 Genome Project and dbSNP database as well as 1,516 unrelated Chinese healthy controls. Expression analysis in human skin and mutagenesis interrogation in zebrafish confirmed the functional role of ABCB6 in melanocytes and pigmentation. Given the involvement of ABCB6 mutations in coloboma, we performed ophthalmological examination of the DUH carriers of ABCB6 mutations and found ocular abnormalities in them. Our study has advanced our understanding of DUH pathogenesis and revealed the shared pathological mechanism between pigmentary DUH and ocular coloboma.

  17. Optical properties and electronic transitions of DNA oligonucleotides as a function of composition and stacking sequence.

    Science.gov (United States)

    Schimelman, Jacob B; Dryden, Daniel M; Poudel, Lokendra; Krawiec, Katherine E; Ma, Yingfang; Podgornik, Rudolf; Parsegian, V Adrian; Denoyer, Linda K; Ching, Wai-Yim; Steinmetz, Nicole F; French, Roger H

    2015-02-14

    The role of base pair composition and stacking sequence in the optical properties and electronic transitions of DNA is of fundamental interest. We present and compare the optical properties of DNA oligonucleotides (AT)10, (AT)5(GC)5, and (AT-GC)5 using both ab initio methods and UV-vis molar absorbance measurements. Our data indicate a strong dependence of both the position and intensity of UV absorbance features on oligonucleotide composition and stacking sequence. The partial densities of states for each oligonucleotide indicate that the valence band edge arises from a feature associated with the PO4(3-) complex anion, and the conduction band edge arises from anti-bonding states in DNA base pairs. The results show a strong correspondence between the ab initio and experimentally determined optical properties. These results highlight the benefit of full spectral analysis of DNA, as opposed to reductive methods that consider only the 260 nm absorbance (A260) or simple purity ratios, such as A260/A230 or A260/A280, and suggest that the slope of the absorption edge onset may provide a useful metric for the degree of base pair stacking in DNA. These insights may prove useful for applications in biology, bioelectronics, and mesoscale self-assembly.

  18. Exploring the sequence-function relationship in transcriptional regulation by the lac O1 operator.

    Science.gov (United States)

    Maity, Tuhin S; Jha, Ramesh K; Strauss, Charlie E M; Dunbar, John

    2012-07-01

    Understanding how binding of a transcription factor to an operator is influenced by the operator sequence is an ongoing quest. It facilitates discovery of alternative binding sites as well as tuning of transcriptional regulation. We investigated the behavior of the Escherichia coli Lac repressor (LacI) protein with a large set of lac O(1) operator variants. The 114 variants examined contained a mean of 2.9 (range 0-4) mutations at positions -4, -2, +2 and +4 in the minimally required 17 bp operator. The relative affinity of LacI for the operators was examined by quantifying expression of a GFP reporter gene and Rosetta structural modeling. The combinations of mutations in the operator sequence created a wide range of regulatory behaviors. We observed variations in the GFP fluorescent signal among the operator variants of more than an order of magnitude under both uninduced and induced conditions. We found that a single nucleotide change may result in changes of up to six- and 12-fold in uninduced and induced GFP signals, respectively. Among the four positions mutated, we found that nucleotide G at position -4 is strongly correlated with strong repression. By Rosetta modeling, we found a significant correlation between the calculated binding energy and the experimentally observed transcriptional repression strength for many operators. However, exceptions were also observed, underscoring the necessity for further improvement in biophysical models of protein-DNA interactions. © 2012 The Authors Journal compilation © 2012 FEBS.

  19. The Hydrologic Implications Of Unique Urban Soil Horizon Sequencing On The Functions Of Passive Green Infrastructure

    Science.gov (United States)

    Shuster, W.; Schifman, L. A.; Herrmann, D.

    2017-12-01

    Green infrastructure represents a broad set of site- to landscape-scale practices that can be flexibly implemented to increase sewershed retention capacity, and can thereby improve on the management of water quantity and quality. Although much green infrastructure presents as formal engineered designs, urbanized landscapes with highly-interspersed pervious surfaces (e.g., right-of-way, parks, lawns, vacant land) may offer ecosystem services as passive, infiltrative green infrastructure. Yet, infiltration and drainage processes are regulated by soil surface conditions, and then the layering of subsoil horizons, respectively. Drawing on a unique urban soil taxonomic and hydrologic dataset collected in 12 cities (each city representing a major soil order), we determined how urbanization processes altered the sequence of soil horizons (compared to pre-urbanized reference soil pedons) and modeled the hydrologic implications of these shifts in layering with an unsaturated zone code (HYDRUS2D). We found that the different layering sequences in urbanized soils render different types and extents of supporting (plant-available soil water), provisioning (productive vegetation), and regulating (runoff mitigation) ecosystem services.

  20. Plasmid origin of replication of herpesvirus papio: DNA sequence and enhancer function.

    Science.gov (United States)

    Loeb, D D; Sung, N S; Pesano, R L; Sexton, C J; Hutchison, C; Pagano, J S

    1990-01-01

    Herpesvirus papio (HVP) is a lymphotropic virus of baboons which is related to Epstein-Barr virus (EBV) and produces latent infection. The nucleotide sequence of the 5,775-base-pair (bp) EcoRI K fragment of HVP, which has previously been shown to confer the ability to replicate autonomously, has been determined. Within this DNA fragment is a region which bears structural and sequence similarity to the ori-P region of EBV. The HVP ori-P region has a 10- by 26-bp tandem array which is related to the 20- by 30-bp tandem array from the EBV ori-P region. In HVP there is an intervening region of 764 bp followed by five partial copies of the 26-bp monomer. Both the EBV and HVP 3' regions have the potential to form dyad structures which, however, differ in arrangement. We also demonstrate that a transcriptional enhancer which requires transactivation by a virus-encoded factor is present in the HVP ori-P. PMID:2159548

  1. Sequence-function-stability relationships in proteins from datasets of functionally annotated variants: The case of TEM beta-lactamases

    NARCIS (Netherlands)

    Abriata, L.A.; Salverda, M.L.M.; Tomatis, P.E.

    2012-01-01

    A dataset of TEM lactamase variants with different substrate and inhibition profiles was compiled and analyzed. Trends show that loops are the main evolvable regions in these enzymes, gradually accumulating mutations to generate increasingly complex functions. Notably, many mutations present in

  2. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy.

    Science.gov (United States)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis G; De Francisci, Davide; Valle, Giorgio; Angelidaki, Irini

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which different members have distinct roles in the establishment of a collective organization. Deciphering the complex microbial community engaged in this process is interesting both for unraveling the network of bacterial interactions and for the potential applicability of the derived knowledge. In this study, we dissect the microbiome involved in anaerobic digestion by means of high throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy performed using >400 proteins revealed that the biogas community is a trove of new species. A new approach based on functional properties as per network representation was developed to assign roles to the microbial species. The organization of the anaerobic digestion microbiome resembles a funnel, in which the microbial consortium presents a progressive functional specialization while reaching the final step of the process (i.e., methanogenesis). Key microbial genomes encoding enzymes involved in specific metabolic pathways, such as carbohydrate utilization, fatty acid degradation, amino acid fermentation, and syntrophic acetate oxidation, were identified. Additionally, the analysis identified a new uncultured archaeon that was putatively related to Methanomassiliicoccales but, surprisingly, has a methylotrophic methanogenic pathway. This study is pioneering research on the phylogenetic and functional characterization of the microbial community populating biogas reactors. By applying for the first time high-throughput sequencing and a novel binning strategy, the

  3. NHE3 in an ancestral vertebrate: primary sequence, distribution, localization, and function in gills.

    Science.gov (United States)

    Choe, Keith P; Kato, Akira; Hirose, Shigehisa; Plata, Consuelo; Sindic, Aleksandra; Romero, Michael F; Claiborne, J B; Evans, David H

    2005-11-01

    In mammals, the Na+/H+ exchanger 3 (NHE3) is expressed with Na+/K+-ATPase in renal proximal tubules, where it secretes H+ and absorbs Na+ to maintain blood pH and volume. In elasmobranchs (sharks, skates, and stingrays), the gills are the dominant site of pH and osmoregulation. This study was conducted to determine whether epithelial NHE homologs exist in elasmobranchs and, if so, to localize their expression in gills and determine whether their expression is altered by environmental salinity or hypercapnia. Degenerate primers and RT-PCR were used to deduce partial sequences of mammalian NHE2 and NHE3 homologs from the gills of the euryhaline Atlantic stingray (Dasyatis sabina). Real-time PCR was then used to demonstrate that mRNA expression of the NHE3 homolog increased when stingrays were transferred to low salinities but not during hypercapnia. Expression of the NHE2 homolog did not change with either treatment. Rapid amplification of cDNA was then used to deduce the complete sequence of a putative NHE3. The 2,744-base pair cDNA includes a coding region for a 2,511-amino acid protein that is 70% identical to human NHE3 (SLC9A3). Antisera generated against the carboxyl tail of the putative stingray NHE3 labeled the apical membranes of Na+/K+-ATPase-rich epithelial cells, and acclimation to freshwater caused a redistribution of labeling in the gills. This study provides the first NHE3 cloned from an elasmobranch and is the first to demonstrate an increase in gill NHE3 expression during acclimation to low salinities, suggesting that NHE3 can absorb Na+ from ion-poor environments.

  4. Applications of Some Classes of Sequences on Approximation of Functions (Signals) by Almost Generalized Nörlund Means of Their Fourier Series

    Directory of Open Access Journals (Sweden)

    Xhevat Z. Krasniqi

    2015-11-01

    In this paper, using rest bounded variation sequences and head bounded variation sequences, some new results on approximation of functions (signals) by almost generalized Nörlund means of their Fourier series are obtained. To the best of our knowledge, this is the first time such classes of sequences have been used for approximations of the type treated in this paper. In addition, several corollaries are derived from our results as well as from those obtained previously by others.
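    For orientation, the two sequence classes named in the record above are usually defined as follows. This is a sketch of the definitions as they commonly appear in the approximation-theory literature, not a quotation from the paper itself; the symbols $c_n$ and $K(c)$ are notation assumed here, and the paper may use different symbols or slightly different conditions.

```latex
% Rest bounded variation sequence (RBVS): c = (c_n), c_n >= 0, c_n -> 0, and
\sum_{n=m}^{\infty} \left| c_n - c_{n+1} \right| \le K(c)\, c_m
\qquad \text{for all } m \in \mathbb{N}.

% Head bounded variation sequence (HBVS): c = (c_n), c_n >= 0, and
\sum_{n=0}^{m-1} \left| c_n - c_{n+1} \right| \le K(c)\, c_m
\qquad \text{for all } m \in \mathbb{N}.

% In both cases K(c) is a constant depending only on the sequence c.
```

    Every non-negative, non-increasing null sequence satisfies the first condition with $K(c) = 1$, since the sum telescopes to $c_m$, which is why these classes are viewed as generalizations of monotone sequences.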

  5. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    Science.gov (United States)

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-10-28

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis on all flaviviruses non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as a supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Nucleobase-functionalized graphene nanoribbons for accurate high-speed DNA sequencing

    NARCIS (Netherlands)

    Paulechka, Eugene; Wassenaar, Tsjerk; Kroenlein, Kenneth; Kazakov, Andrei; Smolyanitsky, Alex

    2016-01-01

    We propose a water-immersed nucleobase-functionalized suspended graphene nanoribbon as an intrinsically selective device for nucleotide detection. The proposed sensing method combines Watson–Crick selective base pairing with graphene's capacity for converting anisotropic lattice strain to changes in

  7. Diversity and functions of bacterial community in drinking water biofilms revealed by high-throughput sequencing

    Science.gov (United States)

    Chao, Yuanqing; Mao, Yanping; Wang, Zhiping; Zhang, Tong

    2015-06-01

    The development of biofilms in drinking water (DW) systems may cause various problems to water quality. To investigate the community structure of biofilms on different pipe materials and the global/specific metabolic functions of DW biofilms, PCR-based 454 pyrosequencing data for 16S rRNA genes and Illumina metagenomic data were generated and analysed. Considerable differences in bacterial diversity and taxonomic structure were identified between biofilms formed on stainless steel and biofilms formed on plastics, indicating that the metallic materials facilitate the formation of higher diversity biofilms. Moreover, variations in several dominant genera were observed during biofilm formation. Based on PCA analysis, the global functions in the DW biofilms were similar to other DW metagenomes. Beyond the global functions, the occurrences and abundances of specific protective genes involved in the glutathione metabolism, the SoxRS system, the OxyR system, RpoS regulated genes, and the production/degradation of extracellular polymeric substances were also evaluated. A near-complete and low-contamination draft genome was constructed from the metagenome of the DW biofilm, based on the coverage and tetranucleotide frequencies, and identified as a Bradyrhizobiaceae-like bacterium according to a phylogenetic analysis. Our findings provide new insight into DW biofilms, especially in terms of their metabolic functions.
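    The record above mentions that a draft genome was assembled from the metagenome using coverage and tetranucleotide frequencies. As a minimal sketch of the second ingredient only, the following computes a 256-dimensional tetranucleotide frequency vector for a sequence and a distance between two such vectors. The function names and toy sequences are assumptions for illustration; the actual study's binning procedure is not described in the abstract, and real binning tools typically also merge reverse-complement k-mers and combine the signature with coverage.

```python
# Illustrative sketch only: tetranucleotide (4-mer) composition signatures.
from itertools import product

# All 256 possible 4-mers over A, C, G, T, in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return relative frequencies of all 256 tetramers; windows with other characters are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]


def euclidean_distance(a, b):
    """Euclidean distance between two frequency vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


if __name__ == "__main__":
    contig_a = tetranucleotide_frequencies("ACGTACGT")  # hypothetical contig
    contig_b = tetranucleotide_frequencies("ACGTACGT")
    print(contig_a[TETRAMERS.index("ACGT")])        # 0.4 (2 of 5 windows)
    print(euclidean_distance(contig_a, contig_b))   # 0.0
```

    Contigs from the same genome tend to have similar signatures, so a small distance between vectors is one piece of evidence for placing them in the same bin.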

  8. GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank.

    Science.gov (United States)

    You, Ronghui; Zhang, Zihan; Xiong, Yi; Sun, Fengzhu; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2018-03-07

    Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently only advantage over state-of-the-art AFP methods. http://datamining-iip.fudan.edu.cn/golabeler. zhusf@fudan.edu.cn. Supplementary data are available at Bioinformatics online.

  9. The Genome Sequence of Leishmania (Leishmania) amazonensis: Functional Annotation and Extended Analysis of Gene Models

    Science.gov (United States)

    Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; e Ferreira, Renata Carmona; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana

    2013-01-01

    We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3′-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904

  10. Functional translation and linguistic variation: the use of didactic sequence in teaching languages

    Directory of Open Access Journals (Sweden)

    Valdecy Oliveira Pontes

    2017-12-01

    Full Text Available In the context of the approach to the linguistic variation of Spanish and the use of Functionalist Translation in foreign language classes, this article reports the results of applying a Didactic Sequence (SD), in the style of the Geneva School, with Hispanic plays for the teaching of linguistic variation in the pronominal treatment forms of the Spanish-Brazilian Portuguese language pair. The SD was applied in the subject "Introduction to Translation Studies in Spanish Language" (2nd semester), offered by the course in Letters - Spanish Language and its Literatures, of the Federal University of Ceará. This article was based on the theoretical foundations of Functionalist Translation (NORD, 1994, 1996, 2009, 2012), Translation and Sociolinguistics (BOLAÑOS-CUELLAR, 2000; MAYORAL, 1998), elaboration of SD (DOLZ; NOVERRAZ; SCHNEUWLY, 2004; CRISTÓVÃO, 2010; BARROS, 2012) and research on the variation in the forms of treatment of Spanish and Portuguese (FONTANELLA DE WEINBER, 1999; SCHERRE et al, 2015).

  11. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy

    DEFF Research Database (Denmark)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which...... performed using >400 proteins revealed that the biogas community is a trove of new species. A new approach based on functional properties as per network representation was developed to assign roles to the microbial species. The organization of the anaerobic digestion microbiome is resembled by a funnel...... on the phylogenetic and functional characterization of the microbial community populating biogas reactors. By applying for the first time high-throughput sequencing and a novel binning strategy, the identified genes were anchored to single genomes providing a clear understanding of their metabolic pathways...

  12. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    Science.gov (United States)

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs

  13. Proline: the distribution, frequency, positioning, and common functional roles of proline and polyproline sequences in the human proteome.

    Directory of Open Access Journals (Sweden)

    Alexander A Morgan

    Full Text Available Proline is an anomalous amino acid. Its nitrogen atom is covalently locked within a ring, thus it is the only proteinogenic amino acid with a constrained phi angle. Sequences of three consecutive prolines can fold into polyproline helices, structures that join alpha helices and beta pleats as architectural motifs in protein configuration. Triproline helices are participants in protein-protein signaling interactions. Longer spans of repeat prolines also occur, containing as many as 27 consecutive proline residues. Little is known about the frequency, positioning, and functional significance of these proline sequences. Therefore we have undertaken a systematic bioinformatics study of proline residues in proteins. We analyzed the distribution and frequency of 687,434 proline residues among 18,666 human proteins, identifying single residues, dimers, trimers, and longer repeats. Proline accounts for 6.3% of the 10,882,808 protein amino acids. Of all proline residues, 4.4% are in trimers or longer spans. We detected patterns that influence function based on proline location, spacing, and concentration. We propose a classification based on proline-rich, polyproline-rich, and proline-poor status. Whereas singlet proline residues are often found in proteins that display recurring architectural patterns, trimers or longer proline sequences tend to be associated with the absence of repetitive structural motifs. Spans of 6 or more are associated with DNA/RNA processing, actin, and developmental processes. We also suggest a role for proline in Kruppel-type zinc finger protein control of DNA expression, and in the nucleation and translocation of actin by the formin complex.
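
    The counting described in this abstract (single prolines, dimers, trimers and longer runs, and the share of all residues that are proline) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the toy sequence are hypothetical, and only the two totals used in the last line come from the abstract.

```python
import re
from collections import Counter


def proline_run_lengths(sequence: str) -> Counter:
    """Count maximal runs of consecutive prolines ('P'), keyed by run length."""
    return Counter(len(m.group()) for m in re.finditer(r"P+", sequence.upper()))


def summarize(sequence: str) -> dict:
    """Summarize proline content of one protein sequence (one-letter codes)."""
    runs = proline_run_lengths(sequence)
    total_p = sum(length * n for length, n in runs.items())
    in_long_runs = sum(length * n for length, n in runs.items() if length >= 3)
    return {
        "total_prolines": total_p,
        "proline_fraction": total_p / len(sequence) if sequence else 0.0,
        "singlets": runs.get(1, 0),
        "dimers": runs.get(2, 0),
        "trimer_or_longer_runs": sum(n for length, n in runs.items() if length >= 3),
        "fraction_in_trimers_or_longer": in_long_runs / total_p if total_p else 0.0,
    }


# Hypothetical toy sequence, chosen only to exercise every run class.
toy = "MAPLPPGSPPPAKPPPPPPT"
toy_summary = summarize(toy)

# Proteome-wide proline share, recomputed from the totals quoted in the abstract.
reported_fraction = 687_434 / 10_882_808
```

    Applying `summarize` to every protein in a proteome and summing the counters would give totals of the kind the abstract reports; the regular expression `P+` matches maximal runs, so a span of six prolines is counted once as a run of length 6 rather than as overlapping trimers.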

  14. Whole exome re-sequencing implicates CCDC38 and cilia structure and function in resistance to smoking related airflow obstruction.

    Directory of Open Access Journals (Sweden)

    Louise V Wain

    2014-05-01

    Full Text Available Chronic obstructive pulmonary disease (COPD) is a leading cause of global morbidity and mortality and, whilst smoking remains the single most important risk factor, COPD risk is heritable. Of 26 independent genomic regions showing association with lung function in genome-wide association studies, eleven have been reported to show association with airflow obstruction. Although the main risk factor for COPD is smoking, some individuals are observed to have a high forced expired volume in 1 second (FEV1) despite many years of heavy smoking. We hypothesised that these "resistant smokers" may harbour variants which protect against lung function decline caused by smoking and provide insight into the genetic determinants of lung health. We undertook whole exome re-sequencing of 100 heavy smokers who had healthy lung function given their age, sex, height and smoking history and applied three complementary approaches to explore the genetic architecture of smoking resistance. Firstly, we identified novel functional variants in the "resistant smokers" and looked for enrichment of these novel variants within biological pathways. Secondly, we undertook association testing of all exonic variants individually with two independent control sets. Thirdly, we undertook gene-based association testing of all exonic variants. Our strongest signal of association with smoking resistance for a non-synonymous SNP was for rs10859974 (P = 2.34 × 10^-4) in CCDC38, a gene which has previously been reported to show association with FEV1/FVC, and we demonstrate moderate expression of CCDC38 in bronchial epithelial cells. We identified an enrichment of novel putatively functional variants in genes related to cilia structure and function in resistant smokers. Ciliary function abnormalities are known to be associated with both smoking and reduced mucociliary clearance in patients with COPD. We suggest that genetic influences on the development or function of cilia in the bronchial

  15. The shikimate pathway: review of amino acid sequence, function and three-dimensional structures of the enzymes.

    Science.gov (United States)

    Mir, Rafia; Jallu, Shais; Singh, T P

    2015-06-01

    Aromatic compounds such as aromatic amino acids, vitamin K and ubiquinone are important prerequisites for the metabolism of an organism. All organisms can synthesize these aromatic metabolites through the shikimate pathway, except for mammals, which are dependent on their diet for these compounds. The pathway converts phosphoenolpyruvate and erythrose 4-phosphate to chorismate through seven enzymatically catalyzed steps, and chorismate serves as a precursor for the synthesis of a variety of aromatic compounds. These enzymes have been shown to play a vital role in the viability of microorganisms and are thus suggested to be attractive molecular targets for the design of novel antimicrobial drugs. This review focuses on the seven enzymes of the shikimate pathway, highlighting their primary sequences, functions and three-dimensional structures. An understanding of their active site amino acid maps, functions and three-dimensional structures will provide a framework on which the rational design of antimicrobial drugs can be based. Comparing the full-length amino acid sequences and the X-ray crystal structures of these enzymes from bacterial, fungal and plant sources would contribute to designing a specific drug and/or developing broad-spectrum compounds with efficacy against a variety of pathogens.
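
    The seven-step route from phosphoenolpyruvate and erythrose 4-phosphate to chorismate can be written down as a small ordered data structure. The sketch below is illustrative only: the abstract does not name the enzymes or intermediates, so the names are taken from standard biochemistry references as an assumption, co-substrates and cofactors (for example ATP in the kinase step) are omitted, and `Step` and `is_linear_chain` are hypothetical helper names.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    enzyme: str
    substrate: str
    product: str


# Enzyme and intermediate names are not from the abstract; they follow
# standard textbook descriptions of the shikimate pathway (assumption).
SHIKIMATE_PATHWAY: List[Step] = [
    Step("DAHP synthase", "phosphoenolpyruvate + erythrose 4-phosphate", "DAHP"),
    Step("3-dehydroquinate synthase", "DAHP", "3-dehydroquinate"),
    Step("3-dehydroquinate dehydratase", "3-dehydroquinate", "3-dehydroshikimate"),
    Step("shikimate dehydrogenase", "3-dehydroshikimate", "shikimate"),
    Step("shikimate kinase", "shikimate", "shikimate 3-phosphate"),
    Step("EPSP synthase", "shikimate 3-phosphate", "EPSP"),
    Step("chorismate synthase", "EPSP", "chorismate"),
]


def is_linear_chain(steps: List[Step]) -> bool:
    """True if each step's product is the substrate of the following step."""
    return all(a.product == b.substrate for a, b in zip(steps, steps[1:]))


step_count = len(SHIKIMATE_PATHWAY)
final_product = SHIKIMATE_PATHWAY[-1].product
```

    Keeping the steps in an ordered list makes it straightforward to attach per-enzyme annotations, such as sequence accessions or structure identifiers, when comparing enzymes across species.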

  16. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences.

    Science.gov (United States)

    Kanehisa, Minoru; Sato, Yoko; Morishima, Kanae

    2016-02-22

    BlastKOALA and GhostKOALA are automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem. Both servers are made freely available at the KEGG Web site (http://www.kegg.jp/blastkoala/). In BlastKOALA, the KO assignment is performed by a modified version of the internally used KOALA algorithm after the BLAST search against a non-redundant dataset of pangenome sequences at the species, genus or family level, which is generated from the KEGG GENES database by retaining the KO content of each taxonomic category. In GhostKOALA, which utilizes the more rapid GHOSTX for database search and is suitable for metagenome annotation, the pangenome dataset is supplemented with Cd-hit clusters including those for viral genes. The result files may be downloaded and manipulated for further KEGG Mapper analysis, such as comparative pathway analysis using multiple BlastKOALA results. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  17. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes

    DEFF Research Database (Denmark)

    Lin, Michael F; Kheradpour, Pouya; Washietl, Stefan

    2011-01-01

    conservation compared to typical protein-coding genes—especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29......-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ~2% of their synonymous sites. We collect numerous lines of evidence that the observed...... synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian...

  18. Non-parametric Bayesian models of response function in dynamic image sequences

    Czech Academy of Sciences Publication Activity Database

    Tichý, Ondřej; Šmídl, Václav

    2016-01-01

    Vol. 151, No. 1 (2016), pp. 90-100. ISSN 1077-3142. R&D Projects: GA ČR GA13-29225S. Institutional support: RVO:67985556. Keywords: Response function; Blind source separation; Dynamic medical imaging; Probabilistic models; Bayesian methods. Subject RIV: BB - Applied Statistics, Operational Research. Impact factor: 2.498 (2016). http://library.utia.cas.cz/separaty/2016/AS/tichy-0456983.pdf

  19. Multi-Functional Sensing for Swarm Robots Using Time Sequence Classification: HoverBot, an Example

    Directory of Open Access Journals (Sweden)

    Markus P. Nemitz

    2018-05-01

    Full Text Available Scaling up robot swarms to collectives of hundreds or even thousands without sacrificing sensing, processing, and locomotion capabilities is a challenging problem. Low-cost robots are potentially scalable, but the majority of existing systems have limited capabilities, and these limitations substantially constrain the type of experiments that could be performed by robotics researchers. Instead of adding functionality by adding more components and therefore increasing the cost, we demonstrate how low-cost hardware can be used beyond its standard functionality. We systematically review 15 swarm robotic systems and analyse their sensing capabilities by applying a general sensor model from the sensing and measurement community. This work is based on the HoverBot system. A HoverBot is a levitating circuit board that manoeuvres by pulling itself towards magnetic anchors that are embedded into the robot arena. We show that HoverBot’s magnetic field readouts from its Hall-effect sensor can be associated with successful movement, robot rotation and collision measurands. We build a time series classifier based on these magnetic field readouts. We modify and apply signal processing techniques to enable the online classification of the time-variant magnetic field measurements on HoverBot’s low-cost microcontroller. We equipped HoverBot with successful movement, rotation, and collision sensing capabilities by utilising its single Hall-effect sensor. We discuss how our classification method could be applied to other sensors to increase a robot’s functionality without increasing its cost.

  20. Detection of discriminative sequence patterns in the neighborhood of proline cis peptide bonds and their functional annotation

    Directory of Open Access Journals (Sweden)

    Papaloukas Costas

    2009-04-01

    Full Text Available Abstract. Background: Polypeptides are composed of amino acids covalently bonded via a peptide bond. The majority of peptide bonds in proteins is found to occur in the trans conformation. In spite of their infrequent occurrence, cis peptide bonds play a key role in protein structure and function, as well as in many significant biological processes. Results: We perform a systematic analysis of regions in protein sequences that contain a proline cis peptide bond in order to discover non-random associations between the primary sequence and the nature of proline cis/trans isomerization. For this purpose an efficient pattern discovery algorithm is employed which discovers regular expression-type patterns that are overrepresented (i.e. appear frequently repeated) in a set of sequences. Four types of pattern discovery are performed: (i) exact pattern discovery, (ii) pattern discovery using a chemical equivalency set, (iii) pattern discovery using a structural equivalency set and (iv) pattern discovery using certain amino acids' physicochemical properties. The extracted patterns are carefully validated using a specially implemented scoring function and a significance measure (i.e. a log-probability estimate) indicative of their specificity. The score threshold for the first three types of pattern discovery is 0.90, while for the last type it is 0.80. Regarding the significance measure, all patterns yielded values in the range [-9, -31], which ensures that the derived patterns are highly unlikely to have emerged by chance. Most of the highest-scoring patterns are consistent with previous investigations concerning the neighborhood of cis proline peptide bonds, and many new ones are identified. Finally, the extracted patterns are systematically compared against the PROSITE database, in order to gain insight into the functional implications of cis prolyl bonds. Conclusion: Cis patterns with matches in the PROSITE database fell mostly into two

  1. Intrinsic width and luminosity function of the M92 main sequence

    International Nuclear Information System (INIS)

    Sandage, A.; Katem, B.

    1983-01-01

    Measurements of B and V magnitudes of approx. 475 identified stars in the magnitude interval 18.0 - 4 is too low. The luminosity function, obtained from the present data, is compared with that determined earlier by Tayler, by Hartwick, by van den Bergh, and with Fukuoka and Simoda, with good agreement. The evidence favors that φ(M_V) flattens fainter than M_V ≈ +6 as predicted in some dynamical models, due to loss of low mass stars.

  2. Candida albicans Agglutinin-Like Sequence (Als) Family Vignettes: A Review of Als Protein Structure and Function

    Science.gov (United States)

    Hoyer, Lois L.; Cota, Ernesto

    2016-01-01

    Approximately two decades have passed since the description of the first gene in the Candida albicans ALS (agglutinin-like sequence) family. Since that time, much has been learned about the composition of the family and the function of its encoded cell-surface glycoproteins. Solution of the structure of the Als adhesive domain provides the opportunity to evaluate the molecular basis for protein function. This review article is formatted as a series of fundamental questions and explores the diversity of the Als proteins, as well as their role in ligand binding, aggregative effects, and attachment to abiotic surfaces. Interaction of Als proteins with each other, their functional equivalence, and the effects of protein abundance on phenotypic conclusions are also examined. Structural features of Als proteins that may facilitate invasive function are considered. Conclusions that are firmly supported by the literature are presented while highlighting areas that require additional investigation to reveal basic features of the Als proteins, their relatedness to each other, and their roles in C. albicans biology. PMID:27014205

  3. Identification and Functional Analysis of Gene Regulatory Sequences Interacting with Colorectal Tumor Suppressors

    DEFF Research Database (Denmark)

    Dahlgaard, Katja; Troelsen, Jesper

    2018-01-01

    Several tumor suppressors possess gene regulatory activity. Here, we describe how promoter and promoter/enhancer reporter assays can be used to characterize a colorectal tumor suppressor protein's gene regulatory activity of possible target genes. In the first part, a bioinformatic approach...... of the quick and efficient In-Fusion cloning method, and how to carry out transient transfections of Caco-2 colon cancer cells with the produced luciferase reporter plasmids using polyethyleneimine (PEI). A plan describing how to set up and carry out the luciferase expression assay is presented. The luciferase...... to identify relevant gene regulatory regions of potential target genes is presented. In the second part, it is demonstrated how to prepare and carry out the functional assay. We explain how to clone the bioinformatically identified gene regulatory regions into luciferase reporter plasmids by the use...

  4. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    Directory of Open Access Journals (Sweden)

    Chandrasekhar Natarajan

    2015-12-01

    Full Text Available A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.

  5. A Sequence and Structure Based Method to Predict Putative Substrates, Functions and Regulatory Networks of Endo Proteases

    Science.gov (United States)

    Venkatraman, Prasanna; Balakrishnan, Satish; Rao, Shashidhar; Hooda, Yogesh; Pol, Suyog

    2009-01-01

    Background: Proteases play a central role in cellular homeostasis and are responsible for the spatio-temporal regulation of function. Many putative proteases have been recently identified through genomic approaches, leading to a surge in global profiling attempts to characterize their function. Through such efforts and others it has become evident that many proteases play non-traditional roles. Accordingly, the number and the variety of the substrate repertoire of proteases are expected to be much larger than previously assumed. In line with such global profiling attempts, we present here a method for the prediction of natural substrates of endo proteases (human proteases used as an example) by employing short peptide sequences as specificity determinants. Methodology/Principal Findings: Our method incorporates specificity determinants unique to individual enzymes and physiologically relevant dual filters, namely solvent accessible surface area (a parameter dependent on protein three-dimensional structure) and subcellular localization. By incorporating such hitherto unused principles in prediction methods, a novel ligand docking strategy to mimic substrate binding at the active site of the enzyme, and GO functions, we identify and perform subjective validation on putative substrates of matriptase and highlight new functions of the enzyme. Using relative solvent accessibility for rank ordering, we show how new protease regulatory networks and enzyme cascades can be created. Conclusion: We believe that our physiologically relevant computational approach would be a very useful complementary method in current attempts to profile proteases (endo proteases in particular) and their substrates. In addition, by using functional annotations, we have demonstrated how normal and unknown functions of a protease can be envisaged. We have developed a network which can be integrated to create a proteolytic world. This network can in turn be extended to integrate other regulatory

  6. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    Science.gov (United States)

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal

  7. The function analysis of full-length cDNA sequence from IRM-2 mouse cDNA library

    International Nuclear Information System (INIS)

    Wang Qin; Liu Xiaoqiu; Xu Chang; Du Liqing; Sun Zhijuan; Wang Yan; Liu Qiang; Song Li; Li Jin; Fan Feiyue

    2013-01-01

    Objective: To identify the function of full-length cDNA sequences from an IRM-2 mouse cDNA library. Methods: Full-length cDNA products were amplified by PCR from the IRM-2 mouse cDNA library according to twenty-one expressed sequence tags. The expression of the full-length cDNAs was detected after mouse embryonic fibroblasts were exposed to 6.5 Gy of γ-ray radiation, and the effect on the growth of radiosensitive AT5B1VA cells transfected with the full-length cDNAs was investigated. Results: The expression of the No.4, 5 and 2 full-length cDNAs from IRM-2 mice was higher than that in parental ICR and 615 mice after mouse embryonic fibroblasts were irradiated with γ-rays, and the survival rate of AT5B1VA cells transfected with the No.4, 5 and 2 full-length cDNAs was high. Conclusion: The No.4, 5 and 2 full-length cDNAs of the IRM-2 mouse confer high radioresistance. (authors)

  8. Targeted next generation sequencing identifies functionally deleterious germline mutations in novel genes in early-onset/familial prostate cancer.

    Directory of Open Access Journals (Sweden)

    Paula Paulo

    2018-04-01

    Full Text Available Considering that mutations in known prostate cancer (PrCa) predisposition genes, including those responsible for hereditary breast/ovarian cancer and Lynch syndromes, explain less than 5% of early-onset/familial PrCa, we have sequenced 94 genes associated with cancer predisposition using next generation sequencing (NGS) in a series of 121 PrCa patients. We found monoallelic truncating/functionally deleterious mutations in seven genes, including ATM and CHEK2, which have previously been associated with PrCa predisposition, and five new candidate PrCa associated genes involved in cancer predisposing recessive disorders, namely RAD51C, FANCD2, FANCI, CEP57 and RECQL4. Furthermore, using in silico pathogenicity prediction of missense variants among 18 genes associated with breast/ovarian cancer and/or Lynch syndrome, followed by KASP genotyping in 710 healthy controls, we identified "likely pathogenic" missense variants in ATM, BRIP1, CHEK2 and TP53. In conclusion, this study has identified putative PrCa predisposing germline mutations in 14.9% of early-onset/familial PrCa patients. Further data will be necessary to confirm the genetic heterogeneity of inherited PrCa predisposition hinted at in this study.

  9. Kinetic and sequence-structure-function analysis of known LinA variants with different hexachlorocyclohexane isomers.

    Directory of Open Access Journals (Sweden)

    Pooja Sharma

    Full Text Available BACKGROUND: Here we report specific activities of all seven naturally occurring LinA variants towards three different isomers, α, γ and δ, of a priority persistent pollutant, hexachlorocyclohexane (HCH). Sequence-structure-function differences contributing to the differences in their stereospecificity for α-, γ-, and δ-HCH and enantiospecificity for (+)- and (-)-α-HCH are also discussed. METHODOLOGY/PRINCIPAL FINDINGS: Enzyme kinetic studies were performed with purified LinA variants. Models of LinA2(B90A) A110T, A111C, A110T/A111C and LinA1(B90A) were constructed using the FoldX computer algorithm. Turnover rates (min^-1) showed that the LinAs exhibited differential substrate affinity amongst the four HCH isomers tested. α-HCH was found to be the most preferred substrate by all LinAs, followed by the γ and then δ isomer. CONCLUSIONS/SIGNIFICANCE: The kinetic observations suggest that LinA-γ1-7 is the best variant for developing an enzyme-based bioremediation technology for HCH. The majority of the sequence variation in the various linA genes that have been isolated is not neutral, but alters the enantio- and stereoselectivity of the encoded proteins.

  10. Analysis and prediction of translation rate based on sequence and functional features of the mRNA.

    Directory of Open Access Journals (Sweden)

    Tao Huang

    Full Text Available Protein concentrations depend not only on the mRNA level, but also on the translation rate and the degradation rate. Prediction of an mRNA's translation rate would provide valuable information for in-depth understanding of the translation mechanism and the dynamic proteome. In this study, we developed a new computational model to predict the translation rate, characterized by (1) integrating various sequence-derived and functional features, (2) applying the maximum relevance & minimum redundancy method and incremental feature selection to select features to optimize the prediction model, and (3) being able to classify the translation rate of an mRNA as high or low. The prediction accuracies under rich and starvation conditions were 68.8% and 70.0%, respectively, evaluated by jackknife cross-validation. It was found that the following features were correlated with translation rate: codon usage frequency, some gene ontology enrichment scores, number of RNA binding proteins known to bind its mRNA product, coding sequence length, protein abundance and 5'UTR free energy. These findings might provide useful information for understanding the mechanisms of translation and the dynamic proteome. Our translation rate prediction model might become a high-throughput tool for annotating the translation rate of mRNAs on a large scale.
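
    Jackknife cross-validation, the evaluation scheme named in this abstract, holds out each sample once, trains on the rest, and reports the fraction of held-out samples classified correctly. The sketch below shows only that evaluation loop. The nearest-centroid classifier is a deliberately simple stand-in rather than the authors' model, and the two features (a codon-usage score and a coding sequence length) and all toy values are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, "high" or "low")


def nearest_centroid_predict(train: List[Sample], query: Sequence[float]) -> str:
    """Stand-in classifier: return the label whose class centroid is closest."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    best_label, best_dist = "", float("inf")
    for label, rows in by_label.items():
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(samples: List[Sample]) -> float:
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += nearest_centroid_predict(train, features) == label
    return hits / len(samples)


# Hypothetical toy data: (codon-usage score, coding sequence length in nt).
toy = [
    ((0.90, 120.0), "high"), ((0.80, 150.0), "high"), ((0.85, 100.0), "high"),
    ((0.30, 900.0), "low"), ((0.20, 1100.0), "low"), ((0.25, 1000.0), "low"),
]
accuracy = jackknife_accuracy(toy)
```

    With real features on different scales, the feature columns would normally be standardized before computing distances; that step is left out here to keep the evaluation loop in focus.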

  11. The 2017 North Korea M6 seismic sequence: moment tensor, source time function, and aftershocks

    Science.gov (United States)

    Ni, S.; Zhan, Z.; Chu, R.; He, X.

    2017-12-01

    On September 3rd, 2017, an M6 seismic event occurred in North Korea, located near previous nuclear test sites. Strong P waves and short-period Rayleigh waves are observed, in contrast to weak S waves, suggesting a mostly explosive mechanism. We performed joint inversion for moment tensor and depth with both local and teleseismic waveforms, and find that the event is shallow, with mostly isotropic yet substantial non-isotropic components. Deconvolution of seismic waveforms of this event with respect to previous nuclear test events shows clues of complexity in the source time function. The event was followed by smaller earthquakes, beginning as early as 8.5 minutes afterward and lasting at least until October. The later events occurred in a compact region and show clear S waves, suggesting a double-couple focal mechanism. By analyzing Rayleigh wave spectra, these smaller events are found to be shallow. Relative locations and differences in waveforms of the events are used to infer their possible links and generation mechanism.

  12. A bacterial genetic screen identifies functional coding sequences of the insect mariner transposable element Famar1 amplified from the genome of the earwig, Forficula auricularia.

    Science.gov (United States)

    Barry, Elizabeth G; Witherspoon, David J; Lampe, David J

    2004-02-01

    Transposons of the mariner family are widespread in animal genomes and have apparently infected them by horizontal transfer. Most species carry only old defective copies of particular mariner transposons that have diverged greatly from their active horizontally transferred ancestor, while a few contain young, very similar, and active copies. We report here the use of a whole-genome screen in bacteria to isolate somewhat diverged Famar1 copies from the European earwig, Forficula auricularia, that encode functional transposases. Functional and nonfunctional coding sequences of Famar1 and nonfunctional copies of Ammar1 from the European honey bee, Apis mellifera, were sequenced to examine their molecular evolution. No selection for sequence conservation was detected in any clade of a tree derived from these sequences, not even on branches leading to functional copies. This agrees with the current model for mariner transposon evolution that expects neutral evolution within particular hosts, with selection for function occurring only upon horizontal transfer to a new host. Our results further suggest that mariners are not finely tuned genetic entities and that a greater amount of sequence diversification than had previously been appreciated can occur in functional copies in a single host lineage. Finally, this method of isolating active copies can be used to isolate other novel active transposons without resorting to reconstruction of ancestral sequences.

  13. Microbial Aggregate and Functional Community Distribution in a Sequencing Batch Reactor with Anammox Granules

    KAUST Repository

    Sun, Shan

    2013-05-01

    The anammox (anaerobic ammonium oxidation) process is a one-step conversion of ammonia into nitrogen gas with nitrite as an electron acceptor. It has been developed as a sustainable technology for ammonia removal from wastewater in the last decade. For wastewater treatment, anammox biomass has been widely developed as microbial aggregates, where the conditions for enrichment of the anammox community must be delicately controlled and growth of other bacteria, especially nitrite-oxidizing bacteria (NOB), should be suppressed to enhance nitrogen removal efficiency. Little is known about the distribution of microbial aggregates in the anammox process. Thus the objective of our study was to assess whether segregation of biomass occurs in a granular anammox system. In this study, a laboratory-scale sequential batch reactor (SBR) was successfully operated for a period of 80 days with granular anammox biomass. Temporal and spatial distribution of microbial aggregates was studied by a particle characterization system, and the distribution of functional microbial communities was studied with qPCR and 16S rRNA amplicon pyrosequencing. Our study revealed the spatial and temporal distribution of biomass aggregates based on their sizes and density. Granules (>200 μm) preferentially accumulated in the bottom of the reactor while floccules (30-200 μm) were relatively rich at the top layer. The average density of aggregates was higher at the bottom than at the top layer. Degranulation caused by lack of hydrodynamic shear force in the top layer was considered responsible for this phenomenon. NOB was relatively rich in the top layer while the percentage of the anammox population was higher at the bottom, and the anammox bacteria population gradually increased over time. NOB growth was presumed to be associated with the increase of floccules, based on their concurrent occurrence. Thus, segregation of biomass can be utilized to develop an effective strategy to enrich anammox and wash out NOB by shortening the settling

  14. Encoding and recall of finger sequences in experienced pianists compared with musically naïve controls: a combined behavioral and functional imaging study.

    Science.gov (United States)

    Pau, S; Jahn, G; Sakreida, K; Domin, M; Lotze, M

    2013-01-01

    Long-term intensive sensorimotor training alters functional representation of the motor and sensory system and might even result in structural changes. However, little is known about how previous training impacts learning transfer and functional representation. We tested 14 amateur pianists and 15 musically naïve participants in a short-term finger sequence training procedure that differed considerably from piano playing, and measured associated functional representation with functional magnetic resonance imaging. The conditions consisted of encoding a finger sequence indicated by hand symbols ("sequence encoding") and subsequently replaying the sequence from memory, both with and without auditory feedback ("sequence retrieval"). Piano players activated motor areas and the mirror neuron system more strongly than musically naïve participants during encoding. When retrieving the sequence, musically naïve participants showed higher activation in similar brain areas. Thus, retrieval activations of naïve participants were comparable to encoding activations of piano players, who during retrieval performed the sequences more accurately despite lower motor activations. Interestingly, both groups showed primary auditory activation even during sequence retrieval without auditory feedback, supporting previous reports about coactivation of the auditory cortex after learned association with motor performance. When playing with auditory feedback, only pianists lateralized to the left auditory cortex. During encoding, activation in the left primary somatosensory cortex at the level of the finger representations had predictive value for increased motor performance later on (error rates). Conversely, decreased performance was associated with increased visual cortex activation during encoding. Our study extends previous reports about training transfer of motor knowledge resulting in superior training effects in musicians. Performance increase went along with activity in

  15. Metatranscriptome Sequencing of a Reef-building Coral Elucidates Holobiont Community Gene Functions in Health and Disease

    Science.gov (United States)

    Timberlake, S.; Helbig, T.; Fernando, S.; Penn, K.; Alm, E.; Thompson, F.; Thompson, J. R.

    2012-12-01

    The coral reefs of the Abrolhos Bank of Brazil play a vital ecological role in the health of the Southern Atlantic Ocean, but accelerating rates of disease, particularly white plague, threaten this ecosystem. Thus, an understanding of white plague disease and diagnostic tests for it are urgently needed. The coral animal is associated with a distinct microbiome, a diverse assemblage of eukaryotes, bacteria, and viruses. That these microbes have a great influence on the health of the coral has long been known; however, most of their functions are still mysterious. While recent studies have contrasted healthy and white-plague-associated communities, the causative agents and mechanisms of the disease remain unknown. We collected fragments of healthy and diseased corals, as well as post-disease skeleton, from 12 colonies of the genus Mussismilia, the major component of the reef structure in the Abrolhos Bank and, increasingly, a victim of white-plague disease. Fragments were flash-frozen in situ, and prepped for culture-free high throughput sequencing of gene transcripts with the Illumina II-G. While the membership of the microbial communities associated with coral has been previously described, a coral holobiont community's gene function has, to date, never been assayed by this powerful approach. We designed a bioinformatics pipeline to analyze the short-read data from this complex sample: identifying the functions of genes expressed in the holobiont, and describing the active community's taxonomic composition. We show that gene functions expressed by the coral's bacterial assemblage are distinct from those of the underlying skeleton, and we highlight differences in the disease samples. We find that gene markers for the dissimilatory sulfate reduction pathway are more abundant in the disease state, and we further quantify this difference with qPCR. Finally, we report the abundant expression of highly repetitive transcripts in the diseased coral samples, and highlight

  16. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features

    Directory of Open Access Journals (Sweden)

    Aaron Sievers

    2017-04-01

    In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we successfully checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distribution of k-mers) was developed and used. The results were analyzed by including positional information based on visualizations as genomic maps and by applying basic vector correlation methods. This analysis was concentrated on small word lengths (1 ≤ k ≤ 4) on relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae formerly identified by independent, mostly alignment-based methods, were confirmed. Correlations between the k-mer content and several genes in these genomes have been found, showing similarities between classified and unclassified viruses, which may be potentially useful for further taxonomic research. Furthermore, unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which are probably of major biological function, are found and described. Using the chromosomes of a chimpanzee and human that are currently known, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST and the
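    As a rough illustration of what "local k-mer spectra" combined with "basic vector correlation" might look like, the following sketch counts k-mers in sliding windows and compares two windows by Pearson correlation. It is not the authors' algorithm: the window and step parameters, the function names, and the choice of Pearson correlation are assumptions made for this example, and only the plain A/C/G/T alphabet is handled.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_spectrum(seq, k):
    """Count k-mers in seq; return counts in fixed lexicographic order (4**k entries)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def local_spectra(seq, k, window, step):
    """Return (start position, spectrum) for each sliding window along seq."""
    return [(start, kmer_spectrum(seq[start:start + window], k))
            for start in range(0, len(seq) - window + 1, step)]


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)
```

    Keeping the start position with each spectrum is what provides the positional resolution: correlating the spectra of neighbouring or distant windows yields a map of regions with similar or contrasting composition.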

  17. Generation of mast cells from mouse fetus: analysis of differentiation and functionality, and transcriptome profiling using next generation sequencer.

    Directory of Open Access Journals (Sweden)

    Nobuyuki Fukuishi

    While gene knockout technology can reveal the roles of proteins in cellular functions, including in mast cells, fetal death due to gene manipulation frequently interrupts experimental analysis. We generated mast cells from mouse fetal liver (FLMC), and compared the fundamental functions of FLMC with those of bone marrow-derived mouse mast cells (BMMC). Under electron microscopy, numerous small and electron-dense granules were observed in FLMC. In FLMC, the expression levels of a subunit of the FcεRI receptor and degranulation by IgE cross-linking were comparable with BMMC. By flow cytometry we observed surface expression of c-Kit prior to that of FcεRI on FLMC, although on BMMC the expression of c-Kit came after FcεRI. The surface expression levels of Sca-1 and c-Kit, a marker of putative mast cell precursors, were slightly different between bone marrow cells and fetal liver cells, suggesting that differentiation stage or cell type are not necessarily equivalent between both lineages. Moreover, this indicates that phenotypically similar mast cells may not have undergone an identical process of differentiation. By comprehensive analysis using the next generation sequencer, the same frequency of gene expression was observed for 98.6% of all transcripts in both cell types. These results indicate that FLMC could represent a new and useful tool for exploring mast cell differentiation, and may help to elucidate the roles of individual proteins in the function of mast cells where gene manipulation can induce embryonic lethality in the mid to late stages of pregnancy.

  18. Generation of mast cells from mouse fetus: analysis of differentiation and functionality, and transcriptome profiling using next generation sequencer.

    Science.gov (United States)

    Fukuishi, Nobuyuki; Igawa, Yuusuke; Kunimi, Tomoyo; Hamano, Hirofumi; Toyota, Masao; Takahashi, Hironobu; Kenmoku, Hiromichi; Yagi, Yasuyuki; Matsui, Nobuaki; Akagi, Masaaki

    2013-01-01

    While gene knockout technology can reveal the roles of proteins in cellular functions, including in mast cells, fetal death due to gene manipulation frequently interrupts experimental analysis. We generated mast cells from mouse fetal liver (FLMC), and compared the fundamental functions of FLMC with those of bone marrow-derived mouse mast cells (BMMC). Under electron microscopy, numerous small and electron-dense granules were observed in FLMC. In FLMC, the expression levels of a subunit of the FcεRI receptor and degranulation by IgE cross-linking were comparable with BMMC. By flow cytometry we observed surface expression of c-Kit prior to that of FcεRI on FLMC, although on BMMC the expression of c-Kit came after FcεRI. The surface expression levels of Sca-1 and c-Kit, a marker of putative mast cell precursors, were slightly different between bone marrow cells and fetal liver cells, suggesting that differentiation stage or cell type are not necessarily equivalent between both lineages. Moreover, this indicates that phenotypically similar mast cells may not have undergone an identical process of differentiation. By comprehensive analysis using the next generation sequencer, the same frequency of gene expression was observed for 98.6% of all transcripts in both cell types. These results indicate that FLMC could represent a new and useful tool for exploring mast cell differentiation, and may help to elucidate the roles of individual proteins in the function of mast cells where gene manipulation can induce embryonic lethality in the mid to late stages of pregnancy.

  19. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    Science.gov (United States)

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, infer differential disease risks and interaction with S1P₁-targeted therapeutics. Copyright © 2014 by the American Society for Biochemistry and Molecular Biology, Inc.

  20. Transcriptome Sequencing Analysis and Functional Identification of Sex Differentiation Genes from the Mosquito Parasitic Nematode, Romanomermis wuchangensis.

    Directory of Open Access Journals (Sweden)

    Mingyue Duan

    Mosquito-transmitted diseases like malaria and dengue fever are a global problem, and an estimated 50-100 million cases of dengue or dengue hemorrhagic fever are reported worldwide every year. The mermithid nematode Romanomermis wuchangensis has been successfully used as an ecosystem-friendly biocontrol agent for mosquito prevention in laboratory studies. However, this nematode cannot undergo sex differentiation in in vitro culture, which has seriously affected its application for biocontrol in the field. In this study, based on transcriptome sequencing analysis of R. wuchangensis, Rwucmab-3, Rwuclaf-1 and Rwuctra-2 were cloned and used to investigate the molecular regulation of sex differentiation. qRT-PCR results demonstrated that the expression level of Rwucmab-3 differed markedly between males and females on the 3rd day of the parasitic stage, which was earlier than for Rwuclaf-1 and Rwuctra-2, indicating that the sex differentiation process may start on the 3rd day of the parasitic stage. In addition, FITC was used as a marker to test the dsRNA uptake efficiency of R. wuchangensis; fluorescence intensity increased with FITC concentration after 16 h incubation, indicating that this nematode can successfully ingest soaking solution via its cuticle. RNAi results revealed that the sex ratio of R. wuchangensis in RNAi-treated groups soaked in Rwucmab-3 dsRNA was significantly higher than in gfp dsRNA-treated groups and control groups, indicating that RNAi of Rwucmab-3 may hinder the development of male nematodes. These results suggest that Rwucmab-3 is mainly involved in the initiation of sex differentiation and the development of male sexual dimorphism. Rwuclaf-1 and Rwuctra-2 may play a vital role in the nematode reproductive and developmental system. In conclusion, the transcript sequences presented in this study could provide more bioinformatics resources for future studies on gene cloning and other molecular regulatory mechanisms in R. wuchangensis. Moreover, identification

  1. Comparative analysis of function and interaction of transcription factors in nematodes: Extensive conservation of orthology coupled to rapid sequence evolution

    Directory of Open Access Journals (Sweden)

    Singh Rama S

    2008-08-01

    Background: Much of the morphological diversity in eukaryotes results from differential regulation of gene expression in which transcription factors (TFs) play a central role. The nematode Caenorhabditis elegans is an established model organism for the study of the roles of TFs in controlling the spatiotemporal pattern of gene expression. Using the fully sequenced genomes of three Caenorhabditid nematode species as well as genome information from additional more distantly related organisms (fruit fly, mouse, and human) we sought to identify orthologous TFs and characterized their patterns of evolution. Results: We identified 988 TF genes in C. elegans, and inferred corresponding sets in C. briggsae and C. remanei, containing 995 and 1093 TF genes, respectively. Analysis of the three gene sets revealed 652 3-way reciprocal 'best hit' orthologs (nematode TF set), approximately half of which are zinc finger (ZF-C2H2 and ZF-C4/NHR types) and HOX family members. Examination of the TF genes in C. elegans and C. briggsae identified the presence of significant tandem clustering on chromosome V, the majority of which belong to the ZF-C4/NHR family. We also found evidence for lineage-specific duplications and rapid evolution of many of the TF genes in the two species. A search of the TFs conserved among nematodes in Drosophila melanogaster, Mus musculus and Homo sapiens revealed 150 reciprocal orthologs, many of which are associated with important biological processes and human diseases. Finally, a comparison of the sequence, gene interactions and function indicates that nematode TFs conserved across phyla exhibit significantly more interactions and are enriched in genes with annotated mutant phenotypes compared to those that lack orthologs in other species. Conclusion: Our study represents the first comprehensive genome-wide analysis of TFs across three nematode species and other organisms. The findings indicate substantial conservation of transcription
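    The "3-way reciprocal best hit" criterion mentioned in this abstract can be sketched as follows. This is an illustrative reading of the term, not the authors' pipeline: the input format (a dict mapping a (query, subject) pair to a similarity score, for instance parsed from pairwise sequence searches), the function names, and the requirement that all three species pairs agree are assumptions of this example.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict {(query, subject): similarity score}; the first hit seen wins ties.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def three_way_orthologs(ab, ba, bc, cb, ac, ca):
    """Triples (a, b, c) in which every pair of genes is a reciprocal best hit."""
    rbh_ab = dict(reciprocal_best_hits(ab, ba))
    rbh_bc = dict(reciprocal_best_hits(bc, cb))
    rbh_ac = dict(reciprocal_best_hits(ac, ca))
    return {(a, b, rbh_bc[b])
            for a, b in rbh_ab.items()
            if b in rbh_bc and rbh_ac.get(a) == rbh_bc[b]}
```

    Requiring agreement in both directions and across all three species pairs is deliberately strict; it discards genes whose best match is ambiguous, for example after lineage-specific duplications.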

  2. MPID-T2: a database for sequence-structure-function analyses of pMHC and TR/pMHC structures.

    Science.gov (United States)

    Khan, Javed Mohammed; Cheruku, Harish Reddy; Tong, Joo Chuan; Ranganathan, Shoba

    2011-04-15

    Sequence-structure-function information is critical in understanding the mechanism of pMHC and TR/pMHC binding and recognition. A database for sequence-structure-function information on pMHC and TR/pMHC interactions, MHC-Peptide Interaction Database-TR version 2 (MPID-T2), is now available augmented with the latest PDB and IMGT/3Dstructure-DB data, advanced features and new parameters for the analysis of pMHC and TR/pMHC structures. Availability: http://biolinfo.org/mpid-t2. Contact: shoba.ranganathan@mq.edu.au. Supplementary data are available at Bioinformatics online.

  3. Using Markov chains of nucleotide sequences as a possible precursor to predict functional roles of human genome: a case study on inactive chromatin regions.

    Science.gov (United States)

    Lee, K-E; Lee, E-J; Park, H-S

    2016-08-30

    Recent advances in computational epigenetics have provided new opportunities to evaluate n-gram probabilistic language models. In this paper, we describe a systematic genome-wide approach for predicting functional roles in inactive chromatin regions by using a sequence-based Markovian chromatin map of the human genome. We demonstrate that Markov chains of sequences can be used as a precursor to predict functional roles in heterochromatin regions and provide an example comparing two publicly available chromatin annotations of large-scale epigenomics projects: ENCODE project consortium and Roadmap Epigenomics consortium.
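    To make the idea of a Markov chain over nucleotide sequences concrete, here is a minimal sketch that estimates transition probabilities from training sequences and scores a new sequence by log-likelihood. It is an assumption-laden illustration rather than the method of the paper: the model order, the add-one pseudocount, the function names, and the restriction to the letters A, C, G and T are all choices made for this example.

```python
from collections import defaultdict
from math import log

ALPHABET = "ACGT"  # assumption: sequences contain only these four letters


def train_markov(seqs, order=1, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, nxt in counts.items():
        total = sum(nxt.values()) + pseudocount * len(ALPHABET)
        model[context] = {base: (nxt.get(base, 0.0) + pseudocount) / total
                          for base in ALPHABET}
    return model


def log_likelihood(seq, model, order=1):
    """Log-probability of seq under the model; unseen contexts fall back to uniform."""
    uniform = 1.0 / len(ALPHABET)
    total = 0.0
    for i in range(order, len(seq)):
        probs = model.get(seq[i - order:i])
        total += log(probs[seq[i]] if probs else uniform)
    return total
```

    One way such models could be used for annotation is to train one chain per region class and assign a new sequence to the class whose chain gives it the higher log-likelihood; whether this matches the authors' procedure is not stated in the abstract.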

  4. Soy Glycinin Contains a Functional Inhibitory Sequence against Muscle-Atrophy-Associated Ubiquitin Ligase Cbl-b

    Directory of Open Access Journals (Sweden)

    Tomoki Abe

    2013-01-01

    Background. Unloading stress induces skeletal muscle atrophy. We have reported that Cbl-b ubiquitin ligase is a master regulator of unloading-associated muscle atrophy. The present study was designed to elucidate whether dietary soy glycinin protein prevents denervation-mediated muscle atrophy, based on the presence of inhibitory peptides against Cbl-b ubiquitin ligase in soy glycinin protein. Methods. Mice were fed either 20% casein diet, 20% soy protein isolate diet, 10% glycinin diet containing 10% casein, or 20% glycinin diet. One week later, the right sciatic nerve was cut. The wet weight, cross sectional area (CSA), IGF-1 signaling, and atrogene expression in hindlimb muscles were examined at 1, 3, 3.5, or 4 days after denervation. Results. 20% soy glycinin diet significantly prevented denervation-induced decreases in muscle wet weight and myofiber CSA. Furthermore, dietary soy protein inhibited denervation-induced ubiquitination and degradation of IRS-1 in tibialis anterior muscle. Dietary soy glycinin partially suppressed the denervation-mediated expression of atrogenes, such as MAFbx/atrogin-1 and MuRF-1, through the protection of IGF-1 signaling estimated by phosphorylation of Akt-1. Conclusions. Soy glycinin contains a functional inhibitory sequence against muscle-atrophy-associated ubiquitin ligase Cbl-b. Dietary soy glycinin protein significantly prevented muscle atrophy after denervation in mice.

  5. Critical structural and functional roles for the N-terminal insertion sequence in surfactant protein B analogs.

    Directory of Open Access Journals (Sweden)

    Frans J Walther

    2010-01-01

    Surfactant protein B (SP-B; 79 residues) belongs to the saposin protein superfamily, and plays functional roles in lung surfactant. The disulfide cross-linked, N- and C-terminal domains of SP-B have been theoretically predicted to fold as charged, amphipathic helices, suggesting their participation in surfactant activities. Earlier structural studies with Mini-B, a disulfide-linked construct based on the N- and C-terminal regions of SP-B (i.e., approximately residues 8-25 and 63-78), confirmed that these neighboring domains are helical; moreover, Mini-B retains critical in vitro and in vivo surfactant functions of the native protein. Here, we perform similar analyses on a Super Mini-B construct that has native SP-B residues (1-7) attached to the N-terminus of Mini-B, to test whether the N-terminal sequence is also involved in surfactant activity. FTIR spectra of Mini-B and Super Mini-B in either lipids or lipid-mimics indicated that these peptides share similar conformations, with primary alpha-helix and secondary beta-sheet and loop-turns. Gel electrophoresis demonstrated that Super Mini-B was dimeric in SDS detergent-polyacrylamide, while Mini-B was monomeric. Surface plasmon resonance (SPR), predictive aggregation algorithms, and molecular dynamics (MD) and docking simulations further suggested a preliminary model for dimeric Super Mini-B, in which monomers self-associate to form a dimer peptide with a "saposin-like" fold. Similar to native SP-B, both Mini-B and Super Mini-B exhibit in vitro activity with spread films showing near-zero minimum surface tension during cycling using captive bubble surfactometry. In vivo, Super Mini-B demonstrates oxygenation and dynamic compliance that are greater than Mini-B and compare favorably to full-length SP-B. Super Mini-B shows enhanced surfactant activity, probably due to the self-assembly of monomer peptide into dimer Super Mini-B that mimics the functions and putative structure of native SP-B.

  6. Development of a strategy to functionalize a dextrin-based hydrogel for animal cell cultures using a starch-binding module fused to RGD sequence

    Directory of Open Access Journals (Sweden)

    Gama Miguel

    2008-10-01

    Background: Several approaches can be used to functionalize biomaterials, such as hydrogels, for biomedical applications. One of the molecules often used to improve cell adhesion is the peptide Arg-Gly-Asp (RGD). The RGD sequence, present in several proteins from the extra-cellular matrix (ECM), is a ligand for integrin-mediated cell adhesion; this sequence was recognized as a major functional group responsible for cellular adhesion. In this work a bi-functional recombinant protein, containing a starch binding module (SBM) and an RGD sequence, was used to functionalize a dextrin-based hydrogel. The SBM, which belongs to an α-amylase from Bacillus sp. TS-23, has starch (and dextrin, depolymerized starch) affinity, acting as a binding molecule to adsorb the RGD sequence to the hydrogel surface. Results: The recombinant proteins SBM and RGD-SBM were cloned, expressed, purified and tested in in vitro assays. The evaluation of cell attachment, spreading and proliferation on the dextrin-based hydrogel surface activated with recombinant proteins was performed using mouse embryo fibroblasts 3T3. A polystyrene cell culture plate was used as control. The results showed that the RGD-SBM recombinant protein improved, by more than 30%, the adhesion of fibroblasts to the dextrin-based hydrogel. In fact, cell spreading on the hydrogel surface was observed only in the presence of the RGD-SBM. Conclusion: The fusion protein RGD-SBM provides an efficient way to functionalize the dextrin-based hydrogel. Many proteins in nature that hold an RGD sequence are not cell adhesive, probably due to the conformation/accessibility of the peptide. We therefore emphasise the successful expression of a bi-functional protein with potential for different applications.

  7. Deep Illumina-based shotgun sequencing reveals dietary effects on the structure and function of the fecal microbiome of growing kittens.

    Directory of Open Access Journals (Sweden)

    Oliver Deusch

    Previously, we demonstrated that dietary protein:carbohydrate ratio dramatically affects the fecal microbial taxonomic structure of kittens using targeted 16S gene sequencing. The present study, using the same fecal samples, applied deep Illumina shotgun sequencing to identify the diet-associated functional potential and analyze taxonomic changes of the feline fecal microbiome. Fecal samples from kittens fed one of two diets differing in protein and carbohydrate content (high-protein, low-carbohydrate, HPLC; and moderate-protein, moderate-carbohydrate, MPMC) were collected at 8, 12 and 16 weeks of age (n = 6 per group). A total of 345.3 gigabases of sequence were generated from 36 samples, with 99.75% of annotated sequences identified as bacterial. At the genus level, 26% and 39% of reads were annotated for HPLC- and MPMC-fed kittens, with HPLC-fed cats showing greater species richness and microbial diversity. Two phyla, ten families and fifteen genera were responsible for more than 80% of the sequences at each taxonomic level for both diet groups, consistent with the previous taxonomic study. Significantly different abundances between diet groups were observed for 324 genera (56% of all genera identified), demonstrating widespread diet-induced changes in microbial taxonomic structure. Diversity was not affected over time. Functional analysis identified 2,013 putative enzyme function groups that were different (p<0.000007) between the two dietary groups and were associated with 194 pathways, which formed five discrete clusters based on average relative abundance. Of those, ten contained more (p<0.022) enzyme functions with significant diet effects than expected by chance. Six pathways were related to amino acid biosynthesis and metabolism, linking changes in dietary protein with functional differences of the gut microbiome. These data indicate that feline feces-derived microbiomes have large structural and functional differences relating to the dietary

  8. Open questions in origin of life : Experimental studies on the origin of nucleic acids and proteins with specific and functional sequences by a chemical synthetic biology approach

    NARCIS (Netherlands)

    Adamala, K.; Anella, F.M.; Wieczorek, R.; Stano, P.; Chiarabelli, C.; Luisi, P.L.

    2014-01-01

    In this mini-review we present some experimental approaches to the important issue in the origin of life, namely the origin of nucleic acids and proteins with specific and functional sequences. The formation of macromolecules on prebiotic Earth faces practical and conceptual difficulties. From the

  9. THE REST-FRAME OPTICAL LUMINOSITY FUNCTION OF CLUSTER GALAXIES AT z < 0.8 AND THE ASSEMBLY OF THE CLUSTER RED SEQUENCE

    International Nuclear Information System (INIS)

    Rudnick, Gregory; Von der Linden, Anja; De Lucia, Gabriella; White, Simon; Pello, Roser; Aragon-Salamanca, Alfonso; Marchesini, Danilo; Clowe, Douglas; Halliday, Claire; Jablonka, Pascale; Milvang-Jensen, Bo; Poggianti, Bianca; Saglia, Roberto; Simard, Luc; Zaritsky, Dennis

    2009-01-01

    We present the rest-frame optical luminosity function (LF) of red-sequence galaxies in 16 clusters at 0.4 < z < 0.8 drawn from the ESO Distant Cluster Survey (EDisCS). We compare our clusters to an analogous sample from the Sloan Digital Sky Survey (SDSS) and match the EDisCS clusters to their most likely descendants. We measure all LFs down to M ∼ M * + (2.5-3.5). At z < 0.8, the bright end of the LF is consistent with passive evolution but there is a significant buildup of the faint end of the red sequence toward lower redshift. There is a weak dependence of the LF on cluster velocity dispersion for EDisCS but no such dependence for the SDSS clusters. We find tentative evidence that red-sequence galaxies brighter than a threshold magnitude are already in place, and that this threshold evolves to fainter magnitudes toward lower redshifts. We compare the EDisCS LFs with the LF of coeval red-sequence galaxies in the field and find that the bright end of the LFs agree. However, relative to the number of bright red galaxies, the field has more faint red galaxies than clusters at 0.6 < z < 0.8 but fewer at 0.4 < z < 0.6, implying differential evolution. We compare the total light in the EDisCS cluster red sequences to the total red-sequence light in our SDSS cluster sample. Clusters at 0.4 < z < 0.8 must increase their luminosity on the red sequence (and therefore stellar mass in red galaxies) by a factor of 1-3 by z = 0. The necessary processes that add mass to the red sequence in clusters predict local clusters that are overluminous as compared to those observed in the SDSS. The predicted cluster luminosities can be reconciled with observed local cluster luminosities by combining multiple previously known effects.

  10. Relationships between functional genes in Lactobacillus delbrueckii ssp. bulgaricus isolates and phenotypic characteristics associated with fermentation time and flavor production in yogurt elucidated using multilocus sequence typing.

    Science.gov (United States)

    Liu, Wenjun; Yu, Jie; Sun, Zhihong; Song, Yuqin; Wang, Xueni; Wang, Hongmei; Wuren, Tuoya; Zha, Musu; Menghe, Bilige; Heping, Zhang

    2016-01-01

    Lactobacillus delbrueckii ssp. bulgaricus (L. bulgaricus) is well known for its worldwide application in yogurt production. Flavor production and acid producing are considered as the most important characteristics for starter culture screening. To our knowledge this is the first study applying functional gene sequence multilocus sequence typing technology to predict the fermentation and flavor-producing characteristics of yogurt-producing bacteria. In the present study, phenotypic characteristics of 35 L. bulgaricus strains were quantified during the fermentation of milk to yogurt and during its subsequent storage; these included fermentation time, acidification rate, pH, titratable acidity, and flavor characteristics (acetaldehyde concentration). Furthermore, multilocus sequence typing analysis of 7 functional genes associated with fermentation time, acid production, and flavor formation was done to elucidate the phylogeny and genetic evolution of the same L. bulgaricus isolates. The results showed that strains significantly differed in fermentation time, acidification rate, and acetaldehyde production. Combining functional gene sequence analysis with phenotypic characteristics demonstrated that groups of strains established using genotype data were consistent with groups identified based on their phenotypic traits. This study has established an efficient and rapid molecular genotyping method to identify strains with good fermentation traits; this has the potential to replace time-consuming conventional methods based on direct measurement of phenotypic traits. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  11. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

    Science.gov (United States)

    Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for more improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented those similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors protein representation, (3) improved predictive performances due to the use of more enriched training datasets and more variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.

  12. Functional anthology of intrinsic disorder. 2. Cellular components, domains, technical terms, developmental processes, and coding sequence diversities correlated with long disordered regions.

    Science.gov (United States)

    Vucetic, Slobodan; Xie, Hongbo; Iakoucheva, Lilia M; Oldfield, Christopher J; Dunker, A Keith; Obradovic, Zoran; Uversky, Vladimir N

    2007-05-01

    Biologically active proteins without stable ordered structure (i.e., intrinsically disordered proteins) are attracting increased attention. Functional repertoires of ordered and disordered proteins are very different, and the ability to differentiate whether a given function is associated with intrinsic disorder or with a well-folded protein is crucial for modern protein science. However, there is a large gap between the number of proteins experimentally confirmed to be disordered and their actual number in nature. As a result, studies of functional properties of confirmed disordered proteins, while helpful in revealing the functional diversity of protein disorder, provide only a limited view. To overcome this problem, a bioinformatics approach for comprehensive study of functional roles of protein disorder was proposed in the first paper of this series (Xie, H.; Vucetic, S.; Iakoucheva, L. M.; Oldfield, C. J.; Dunker, A. K.; Obradovic, Z.; Uversky, V. N. Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions. J. Proteome Res. 2007, 5, 1882-1898). Applying this novel approach to Swiss-Prot sequences and functional keywords, we found over 238 and 302 keywords to be strongly positively or negatively correlated, respectively, with long intrinsically disordered regions. This paper describes approximately 90 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes, and coding sequence diversities possessing strong positive and negative correlation with long disordered regions.

  13. Functional Anthology of Intrinsic Disorder. II. Cellular Components, Domains, Technical Terms, Developmental Processes and Coding Sequence Diversities Correlated with Long Disordered Regions

    Science.gov (United States)

    Vucetic, Slobodan; Xie, Hongbo; Iakoucheva, Lilia M.; Oldfield, Christopher J.; Dunker, A. Keith; Obradovic, Zoran; Uversky, Vladimir N.

    2008-01-01

    Biologically active proteins without stable ordered structure (i.e., intrinsically disordered proteins) are attracting increased attention. Functional repertoires of ordered and disordered proteins are very different, and the ability to differentiate whether a given function is associated with intrinsic disorder or with a well-folded protein is crucial for modern protein science. However, there is a large gap between the number of proteins experimentally confirmed to be disordered and their actual number in nature. As a result, studies of functional properties of confirmed disordered proteins, while helpful in revealing the functional diversity of protein disorder, provide only a limited view. To overcome this problem, a bioinformatics approach for comprehensive study of functional roles of protein disorder was proposed in the first paper of this series (Xie H., Vucetic S., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. I. Biological processes and functions of proteins with long disordered regions. J. Proteome Res.). Applying this novel approach to Swiss-Prot sequences and functional keywords, we found over 238 and 302 keywords to be strongly positively or negatively correlated, respectively, with long intrinsically disordered regions. This paper describes ~90 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes and coding sequence diversities possessing strong positive and negative correlation with long disordered regions. PMID:17391015

  14. Utility of sequenced genomes for microsatellite marker development in non-model organisms: a case study of functionally important genes in nine-spined sticklebacks (Pungitius pungitius)

    Directory of Open Access Journals (Sweden)

    Shimada Yukinori

    2010-05-01

    Background: Identification of genes involved in adaptation and speciation by targeting specific genes of interest has become a plausible strategy also for non-model organisms. We investigated the potential utility of available sequenced fish genomes to develop microsatellite (cf. simple sequence repeat, SSR) markers for functionally important genes in nine-spined sticklebacks (Pungitius pungitius), as well as cross-species transferability of SSR primers from three-spined (Gasterosteus aculeatus) to nine-spined sticklebacks. In addition, we examined the patterns and degree of SSR conservation between these species using their aligned sequences. Results: Cross-species amplification success was lower for SSR markers located in or around functionally important genes (27 out of 158) than for those randomly derived from genomic (35 out of 101) and cDNA (35 out of 87) libraries. Polymorphism was observed at a large proportion (65%) of the cross-amplified loci independently of SSR type. To develop SSR markers for functionally important genes in nine-spined sticklebacks, SSR locations were surveyed in or around 67 target genes based on the three-spined stickleback genome and these regions were sequenced with primers designed from conserved sequences in sequenced fish genomes. Out of the 81 SSRs identified in the sequenced regions (44,084 bp), 57 exhibited the same motifs at the same locations as in the three-spined stickleback. Di- and trinucleotide SSRs appeared to be highly conserved whereas mononucleotide SSRs were less so. Species-specific primers were designed to amplify 58 SSRs using the sequences of nine-spined sticklebacks. Conclusions: Our results demonstrated that a large proportion of SSRs are conserved in species that diverged more than 10 million years ago. Therefore, the three-spined stickleback genome can be used to predict SSR locations in the nine-spined stickleback genome. While cross-species utility of SSR primers is limited due
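
    The SSR survey described in this record (locating mono-, di- and trinucleotide repeats in sequenced regions) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name find_ssrs and the minimum repeat counts are assumptions chosen for illustration, and only perfect (uninterrupted) repeats are detected.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA string.

    min_repeats maps repeat-unit length to the minimum number of tandem
    copies; the defaults below are illustrative assumptions only.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5}  # hypothetical thresholds
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # One captured unit followed by at least (min_n - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip units such as "AA" or "AAA" that are really mononucleotide runs.
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits

# Made-up toy sequence containing one repeat of each class.
toy = "GG" + "AC" * 7 + "TT" + "CAG" * 5 + "G" + "A" * 11 + "C"
ssrs = find_ssrs(toy)
```

    Comparing the (start, motif) pairs found in aligned regions of two species would be one simple way to ask whether an SSR shows the same motif at the same location, as the abstract reports for 57 of 81 SSRs.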

  15. The carbohydrate-binding module (CBM)-like sequence is crucial for rice CWA1/BC1 function in proper assembly of secondary cell wall materials.

    Science.gov (United States)

    Sato, Kanna; Ito, Sachiko; Fujii, Takeo; Suzuki, Ryu; Takenouchi, Sachi; Nakaba, Satoshi; Funada, Ryo; Sano, Yuzou; Kajita, Shinya; Kitano, Hidemi; Katayama, Yoshihiro

    2010-11-01

    We recently reported that the cwa1 mutation disturbed the deposition and assembly of secondary cell wall materials in the cortical fiber of rice internodes. Genetic analysis revealed that cwa1 is allelic to bc1, which encodes glycosylphosphatidylinositol (GPI)-anchored COBRA-like protein with the highest homology to Arabidopsis COBRA-like 4 (COBL4) and maize Brittle Stalk 2 (Bk2). Our results suggested that CWA1/BC1 plays a role in assembling secondary cell wall materials at appropriate sites, enabling synthesis of highly ordered secondary cell wall structure with solid and flexible internodes in rice. The N-terminal amino acid sequence of CWA1/BC1, as well as its orthologs (COBL4, Bk2) and other BC1-like proteins in rice, shows weak similarity to a family II carbohydrate-binding module (CBM2) of several bacterial cellulases. To investigate the importance of the CBM-like sequence of CWA1/BC1 in the assembly of secondary cell wall materials, Trp residues in the CBM-like sequence, which is important for carbohydrate binding, were substituted for Val residues and introduced into the cwa1 mutant. CWA1/BC1 with the mutated sequence did not complement the abnormal secondary cell walls seen in the cwa1 mutant, indicating that the CBM-like sequence is essential for the proper function of CWA1/BC1, including assembly of secondary cell wall materials.

  16. Roles of repetitive sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological function, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats, without and with respectively known function, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  17. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people

    DEFF Research Database (Denmark)

    Nelson, Matthew R.; Wegmann, Daniel; Ehm, Margaret G.

    2012-01-01

    Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases)...

  18. An integrated native mass spectrometry and top-down proteomics method that connects sequence to structure and function of macromolecular complexes

    Science.gov (United States)

    Li, Huilin; Nguyen, Hong Hanh; Ogorzalek Loo, Rachel R.; Campuzano, Iain D. G.; Loo, Joseph A.

    2018-02-01

    Mass spectrometry (MS) has become a crucial technique for the analysis of protein complexes. Native MS has traditionally examined protein subunit arrangements, while proteomics MS has focused on sequence identification. These two techniques are usually performed separately without taking advantage of the synergies between them. Here we describe the development of an integrated native MS and top-down proteomics method using Fourier-transform ion cyclotron resonance (FTICR) to analyse macromolecular protein complexes in a single experiment. We address previous concerns of employing FTICR MS to measure large macromolecular complexes by demonstrating the detection of complexes up to 1.8 MDa, and we demonstrate the efficacy of this technique for direct acquirement of sequence to higher-order structural information with several large complexes. We then summarize the unique functionalities of different activation/dissociation techniques. The platform expands the ability of MS to integrate proteomics and structural biology to provide insights into protein structure, function and regulation.

  19. Direct repeat sequences are essential for function of the cis-acting locus of transfer (clt) of Streptomyces phaeochromogenes plasmid pJV1.

    Science.gov (United States)

    Franco, Bernardo; González-Cerón, Gabriela; Servín-González, Luis

    2003-11-01

    The functionality of direct and inverted repeat sequences inside the cis acting locus of transfer (clt) of the Streptomyces plasmid pJV1 was determined by testing the effect of different deletions on plasmid transfer. The results show that the single most important element for pJV1 clt function is a series of evenly spaced 9 bp long direct repeats which match the consensus CCGCACA(C/G)(C/G), since their deletion caused a dramatic reduction in plasmid transfer. The presence of these repeats in the absence of any other clt sequences allowed plasmid transfer to occur at a frequency that was at least two orders of magnitude higher than that obtained in the complete absence of clt. A database search revealed regions with a similar organization, and in the same position, in Streptomyces plasmids pSN22 and pSLS, which have transfer proteins homologous to those of pJV1.
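
    The consensus quoted above, CCGCACA(C/G)(C/G), translates directly into a regular expression. The sketch below, in Python, scans a sequence for matches and reports the spacing between successive repeat starts, which is one way to check for "evenly spaced" repeats. The function name and the toy sequence are hypothetical; the real pJV1 clt sequence is not reproduced here.

```python
import re

# CCGCACA(C/G)(C/G): the last two positions may each be C or G.
CLT_REPEAT = re.compile(r"CCGCACA[CG][CG]")

def find_direct_repeats(seq):
    """Return (starts, spacings) for 9-bp matches to the clt consensus."""
    seq = seq.upper()
    starts = [m.start() for m in CLT_REPEAT.finditer(seq)]
    # Distance between the start of each repeat and the next one.
    spacings = [b - a for a, b in zip(starts, starts[1:])]
    return starts, spacings

# Made-up example: three consensus matches separated by 5-bp spacers.
toy = "TT" + "CCGCACACG" + "ATATA" + "CCGCACAGC" + "ATATA" + "CCGCACACC" + "GG"
starts, spacings = find_direct_repeats(toy)
```

    Identical values in spacings would indicate regular spacing; only the forward strand is searched in this sketch.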

  20. Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly.

    Science.gov (United States)

    Bai, Yongsheng; Kinne, Jeff; Ding, Lizhong; Rath, Ethan C; Cox, Aaron; Naidu, Siva Dharman

    2017-10-03

    It is generally thought that most canonical or non-canonical splicing events involving U2- and U12 spliceosomes occur within nuclear pre-mRNAs. However, the question of whether at least some U12-type splicing occurs in the cytoplasm is still unclear. In recent years next-generation sequencing technologies have revolutionized the field. The "Read-Split-Walk" (RSW) and "Read-Split-Run" (RSR) methods were developed to identify genome-wide non-canonical spliced regions including special events occurring in cytoplasm. As the significant amount of genome/transcriptome data such as, Encyclopedia of DNA Elements (ENCODE) project, have been generated, we have advanced a newer more memory-efficient version of the algorithm, "Read-Split-Fly" (RSF), which can detect non-canonical spliced regions with higher sensitivity and improved speed. The RSF algorithm also outputs the spliced sequences for further downstream biological function analysis. We used open access ENCODE project RNA-Seq data to search spliced intron sequences against the U12-type spliced intron sequence database to examine whether some events could occur as potential signatures of U12-type splicing. The check was performed by searching spliced sequences against 5'ss and 3'ss sequences from the well-known orthologous U12-type spliceosomal intron database U12DB. Preliminary results of searching 70 ENCODE samples indicated that the presence of 5'ss with U12-type signature is more frequent than U2-type and prevalent in non-canonical junctions reported by RSF. The selected spliced sequences have also been further studied using miRBase to elucidate their functionality. Preliminary results from 70 samples of ENCODE datasets show that several miRNAs are prevalent in studied ENCODE samples. Two of these are associated with many diseases as suggested in the literature. Specifically, hsa-miR-1273 and hsa-miR-548 are associated with many diseases and cancers. Our RSF pipeline is able to detect many possible junctions

  1. A trans-activator function is generated by integration of hepatitis B virus preS/S sequences in human hepatocellular carcinoma DNA

    International Nuclear Information System (INIS)

    Caselmann, W.H.; Meyer, M.; Kekule, A.S.; Lauer, U.; Hofschneider, P.H.; Koshy, R.

    1990-01-01

    The X gene of wild-type hepatitis B virus or integrated DNA has recently been shown to stimulate transcription of a variety of enhancers and promoters. To further delineate the viral sequences responsible for trans-activation in hepatomas, the authors cloned the single hepatitis B virus insert from human hepatocellular carcinoma DNA M1. The plasmid pM1 contains 2004 base of hepatitis B virus DNA subtype adr, including truncated preS/S sequences and the enhancer element. The X promoter and 422 nucleotides of the X coding region are present. The entire preC/C gene is deleted. In transient cotransfection assays using Chang liver cells (CCL 13), pM1 DNA exerts a 6- to 10-fold trans-activating effect on the expression of the pSV2CAT reporter plasmid. The transactivation occurs by stimulation of transcription and is dependent on the simian virus 40 enhancer in the reporter plasmid. Deletion analysis of pM1 subclones reveals that the transactivator is encoded by preS/S and not by X sequences. A frameshift mutation within the preS2 open reading frame shows that this portion is indispensable for the trans-activating function. Initiation of transcription has been mapped to the S1 promoter. A comparable trans-activating effect is also observed with cloned wild-type hepatitis B virus sequences similarly truncated. These results show that a transcriptional trans-activator function not present in the intact gene is generated by 3' truncation of integrated hepatitis B virus DNA preS/S sequences

  2. Quantitative assessment of hepatic function: modified look-locker inversion recovery (MOLLI) sequence for T1 mapping on Gd-EOB-DTPA-enhanced liver MR imaging

    Energy Technology Data Exchange (ETDEWEB)

    Yoon, Jeong Hee [Seoul National University Hospital, Department of Radiology, Seoul (Korea, Republic of)]; Lee, Jeong Min; Han, Joon Koo; Choi, Byung Ihn [Seoul National University Hospital, Department of Radiology, Seoul (Korea, Republic of); Seoul National University College of Medicine, Institute of Radiation Medicine, Jongno-gu, Seoul (Korea, Republic of)]; Paek, Munyoung [Siemens Healthcare, Seoul (Korea, Republic of)]

    2016-06-15

    To determine whether multislice T1 mapping of the liver using a modified look-locker inversion recovery (MOLLI) sequence on gadoxetic acid-enhanced magnetic resonance imaging (MRI) can be used as a quantitative tool to estimate liver function and predict the presence of oesophageal or gastric varices. Phantoms filled with gadoxetic acid were scanned three times using MOLLI sequence to test repeatability. Patients with chronic liver disease or liver cirrhosis who underwent gadoxetic acid-enhanced liver MRI including MOLLI sequence at 3 T were included (n = 343). Pre- and postcontrast T1 relaxation times of the liver (T1liver), changes between pre- and postcontrast T1liver (ΔT1liver), and adjusted postcontrast T1liver (postcontrast T1liver-T1spleen/T1spleen) were compared among Child-Pugh classes. In 62 patients who underwent endoscopy, all T1 parameters and spleen sizes were correlated with varices. Phantom study showed excellent repeatability of MOLLI sequence. As Child-Pugh scores increased, pre- and postcontrast T1liver were significantly prolonged (P < 0.001), and ΔT1liver and adjusted postcontrast T1liver decreased (P < 0.001). Adjusted postcontrast T1liver and spleen size were independently associated with varices (R² = 0.29, P < 0.001). T1 mapping of the liver using MOLLI sequence on gadoxetic acid-enhanced MRI demonstrated potential in quantitatively estimating liver function, and adjusted postcontrast T1liver was significantly associated with varices. (orig.)

  3. Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data.

    Science.gov (United States)

    He, Zihuai; Xu, Bin; Lee, Seunggeun; Ionita-Laza, Iuliana

    2017-09-07

    Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  4. Prehospital rapid sequence intubation improves functional outcome for patients with severe traumatic brain injury: a randomized controlled trial.

    Science.gov (United States)

    Bernard, Stephen A; Nguyen, Vina; Cameron, Peter; Masci, Kevin; Fitzgerald, Mark; Cooper, David J; Walker, Tony; Std, B Paramed; Myles, Paul; Murray, Lynne; David; Taylor; Smith, Karen; Patrick, Ian; Edington, John; Bacon, Andrew; Rosenfeld, Jeffrey V; Judson, Rodney

    2010-12-01

    To determine whether paramedic rapid sequence intubation in patients with severe traumatic brain injury (TBI) improves neurologic outcomes at 6 months compared with intubation in the hospital. Severe TBI is associated with a high rate of mortality and long-term morbidity. Comatose patients with TBI routinely undergo endotracheal intubation to protect the airway, prevent hypoxia, and control ventilation. In many places, paramedics perform intubation prior to hospital arrival. However, it is unknown whether this approach improves outcomes. In a prospective, randomized, controlled trial, we assigned adults with severe TBI in an urban setting to either prehospital rapid sequence intubation by paramedics or transport to a hospital emergency department for intubation by physicians. The primary outcome measure was the median extended Glasgow Outcome Scale (GOSe) score at 6 months. Secondary endpoints were favorable versus unfavorable outcome at 6 months, length of intensive care and hospital stay, and survival to hospital discharge. A total of 312 patients with severe TBI were randomly assigned to paramedic rapid sequence intubation or hospital intubation. The success rate for paramedic intubation was 97%. At 6 months, the median GOSe score was 5 (interquartile range, 1-6) in patients intubated by paramedics compared with 3 (interquartile range, 1-6) in the patients intubated at hospital (P = 0.28). The proportion of patients with favorable outcome (GOSe, 5-8) was 80 of 157 patients (51%) in the paramedic intubation group compared with 56 of 142 patients (39%) in the hospital intubation group (risk ratio, 1.28; 95% confidence interval, 1.00-1.64; P = 0.046). There were no differences in intensive care or hospital length of stay, or in survival to hospital discharge. In adults with severe TBI, prehospital rapid sequence intubation by paramedics increases the rate of favorable neurologic outcome at 6 months compared with intubation in the hospital.
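
    For readers who want to see how a risk ratio and its confidence interval follow from the counts quoted above (80 of 157 versus 56 of 142), here is a minimal Python sketch using the standard unadjusted log-scale (Wald) interval. The abstract does not say which method the trial used, so this is an assumption; the crude values computed this way come out near, but not exactly at, the published 1.28 (1.00-1.64).

```python
import math

def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Unadjusted risk ratio of group A vs group B with a Wald CI on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Favorable outcome (GOSe 5-8): paramedic group vs hospital group.
rr, lower, upper = risk_ratio(80, 157, 56, 142)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

    The lower bound sits almost exactly at 1, which is consistent with the borderline P value reported in the abstract.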

  5. Comparative sensitivities of functional MRI sequences in detection of local recurrence of prostate carcinoma after radical prostatectomy or external-beam radiotherapy.

    Science.gov (United States)

    Roy, Catherine; Foudi, Fatah; Charton, Jeanne; Jung, Michel; Lang, Hervé; Saussine, Christian; Jacqmin, Didier

    2013-04-01

    The aim of this retrospective study was to determine the respective accuracies of three types of functional MRI sequences-diffusion-weighted imaging (DWI), dynamic contrast-enhanced (DCE) MRI, and 3D (1)H-MR spectroscopy (MRS)-in the depiction of local prostate cancer recurrence after two different initial therapy options. From a cohort of 83 patients with suspicion of local recurrence based on prostate-specific antigen (PSA) kinetics who were imaged on a 3-T MRI unit using an identical protocol including the three functional sequences with an endorectal coil, we selected 60 patients (group A, 28 patients who underwent radical prostatectomy; group B, 32 patients who underwent external-beam radiation) who had local recurrence ascertained on the basis of a transrectal ultrasound-guided biopsy results and a reduction in PSA level after salvage therapy. All patients presented with a local relapse. Sensitivity with T2-weighted MRI and 3D (1)H-MRS sequences was 57% and 53%, respectively, for group A and 71% and 78%, respectively, for group B. DCE-MRI alone showed a sensitivity of 100% and 96%, respectively, for groups A and B. DWI alone had a higher sensitivity for group B (96%) than for group A (71%). The combination of T2-weighted imaging plus DWI plus DCE-MRI provided a sensitivity as high as 100% in group B. The performance of functional imaging sequences for detecting recurrence is different after radical prostatectomy and external-beam radiotherapy. DCE-MRI is a valid and efficient tool to detect prostate cancer recurrence in radical prostatectomy as well as in external-beam radiotherapy. The combination of DCE-MRI and DWI is highly efficient after radiation therapy. Three-dimensional (1)H-MRS needs to be improved. Even though it is not accurate enough, T2-weighted imaging remains essential for the morphologic analysis of the area.

  6. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

    Science.gov (United States)

    Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

    2008-09-01

    A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities ≥ 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. 2008 Wiley-Liss, Inc.
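
    The area under the receiver-operating characteristic curve mentioned above can be read as the probability that a randomly chosen same-family domain pair scores higher than a randomly chosen different-family pair. The Python sketch below computes it that way (the pairwise, Mann-Whitney formulation). The function name and the scores are made up for illustration and are not taken from the benchmark in the paper.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical alignment scores for same-family and different-family pairs.
same_family = [0.9, 0.8, 0.4]
different_family = [0.7, 0.3, 0.2]
auc = roc_auc(same_family, different_family)
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is fine for small sets; a rank-based implementation would be preferable for large benchmarks.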

  7. IL-4 function can be transferred to the IL-2 receptor by tyrosine containing sequences found in the IL-4 receptor alpha chain.

    Science.gov (United States)

    Wang, H Y; Paul, W E; Keegan, A D

    1996-02-01

    IL-4 binds to a cell surface receptor complex that consists of the IL-4 binding protein (IL-4R alpha) and the gamma chain of the IL-2 receptor complex (gamma c). The receptors for IL-4 and IL-2 have several features in common; both use the gamma c as a receptor component, and both activate the Janus kinases JAK-1 and JAK-3. In spite of these similarities, IL-4 evokes specific responses, including the tyrosine phosphorylation of 4PS/IRS-2 and the induction of CD23. To determine whether sequences within the cytoplasmic domain of the IL-4R alpha specify these IL-4-specific responses, we transplanted the insulin IL-4 receptor motif (I4R motif) of the huIL-4R alpha to the cytoplasmic domain of a truncated IL-2R beta. In addition, we transplanted a region that contains peptide sequences shown to block Stat6 binding to DNA. We analyzed the ability of cells expressing these IL-2R-IL-4R chimeric constructs to respond to IL-2. We found that IL-4 function could be transplanted to the IL-2 receptor by these regions and that proliferative and differentiative functions can be induced by different receptor sequences.

  8. Left Ventricular Function Evaluation on a 3T MR Scanner with Parallel RF Transmission Technique: Prospective Comparison of Cine Sequences Acquired before and after Gadolinium Injection.

    Science.gov (United States)

    Caspar, Thibault; Schultz, Anthony; Schaeffer, Mickaël; Labani, Aïssam; Jeung, Mi-Young; Jurgens, Paul Thomas; El Ghannudi, Soraya; Roy, Catherine; Ohana, Mickaël

    To compare cine MR b-TFE sequences acquired before and after gadolinium injection, on a 3T scanner with a parallel RF transmission technique in order to potentially improve scanning time efficiency when evaluating LV function. Twenty-five consecutive patients scheduled for a cardiac MRI were prospectively included and had their b-TFE cine sequences acquired before and right after gadobutrol injection. Images were assessed qualitatively (overall image quality, LV edge sharpness, artifacts and LV wall motion) and quantitatively with measurement of LVEF, LV mass, and telediastolic volume and contrast-to-noise ratio (CNR) between the myocardium and the cardiac chamber. Statistical analysis was conducted using a Bayesian paradigm. No difference was found before or after injection for the LVEF, LV mass and telediastolic volume evaluations. Overall image quality and CNR were significantly lower after injection (estimated coefficient cine after > cine before gadolinium: -1.75, CI = [-3.78;-0.0305], prob(coef>0) = 0% and -0.23, CI = [-0.49;0.04], prob(coef>0) = 4%, respectively), but this decrease did not affect the visual assessment of LV wall motion (cine after > cine before gadolinium: -1.46, CI = [-4.72;1.13], prob(coef>0) = 15%). In 3T cardiac MRI acquired with parallel RF transmission technique, qualitative and quantitative assessment of LV function can reliably be performed with cine sequences acquired after gadolinium injection, despite a significant decrease in the CNR and the overall image quality.

  9. Comparative analysis of taxonomic, functional, and metabolic patterns of microbiomes from 14 full-scale biogas reactors by metagenomic sequencing and radioisotopic analysis.

    Science.gov (United States)

    Luo, Gang; Fotidis, Ioannis A; Angelidaki, Irini

    2016-01-01

    Biogas production is a very complex process due to the high complexity in diversity and interactions of the microorganisms mediating it, and only limited and diffuse knowledge exists about the variation of taxonomic and functional patterns of microbiomes across different biogas reactors, and their relationships with the metabolic patterns. The present study used metagenomic sequencing and radioisotopic analysis to assess the taxonomic, functional, and metabolic patterns of microbiomes from 14 full-scale biogas reactors operated under various conditions treating either sludge or manure. The results from metagenomic analysis showed that the dominant methanogenic pathway revealed by radioisotopic analysis was not always correlated with the taxonomic and functional compositions. It was found by radioisotopic experiments that the aceticlastic methanogenic pathway was dominant, while metagenomic analysis showed higher relative abundance of hydrogenotrophic methanogens. Principal coordinates analysis showed the sludge-based samples were clearly distinct from the manure-based samples for both taxonomic and functional patterns, and canonical correspondence analysis showed that both temperature and free ammonia were crucial environmental variables shaping the taxonomic and functional patterns. The study further showed that the overall patterns of functional genes were strongly correlated with overall patterns of taxonomic composition across different biogas reactors. A discrepancy was found between the metabolic patterns determined by metagenomic analysis and the metabolic pathways determined by radioisotopic analysis. In addition, a clear correlation between taxonomic and functional patterns was demonstrated for biogas reactors, and the environmental factors shaping both taxonomic and functional gene patterns were identified.

  10. Sequencing and functional analysis of the nifENXorf1orf2 gene cluster of Herbaspirillum seropedicae.

    Science.gov (United States)

    Klassen, G; Pedrosa, F O; Souza, E M; Yates, M G; Rigo, L U

    1999-12-01

    A 5.1-kb DNA fragment from the nifHDK region of H. seropedicae was isolated and sequenced. Sequence analysis showed the presence of nifENXorf1orf2 but nifTY were not present. No nif or consensus promoter was identified. Furthermore, orf1 expression occurred only under nitrogen-fixing conditions and no promoter activity was detected between nifK and nifE, suggesting that these genes are expressed from the upstream nifH promoter and are parts of a unique nif operon. Mutagenesis studies indicate that nifN was essential for nitrogenase activity whereas nifXorf1orf2 were not. High homology between the C-terminal region of the NifX and NifB proteins from H. seropedicae was observed. Since the NifX and NifY proteins are important for FeMo cofactor (FeMoco) synthesis, we propose that alternative proteins with similar activities exist in H. seropedicae.

  11. Analysis and functional annotation of expressed sequence tags (ESTs) from multiple tissues of oil palm (Elaeis guineensis Jacq.)

    Directory of Open Access Journals (Sweden)

    Lee Weng-Wah

    2007-10-01

    Background: Oil palm is the second largest source of edible oil, contributing approximately 20% of the world's production of oils and fats. In order to understand the molecular biology involved in in vitro propagation, flowering, efficient utilization of nitrogen sources and root diseases, we have initiated an expressed sequence tag (EST) analysis on oil palm. Results: In this study, six cDNA libraries from oil palm zygotic embryos, suspension cells, shoot apical meristems, young flowers, mature flowers and roots were constructed. We have generated a total of 14537 expressed sequence tags (ESTs) from these libraries, from which 6464 tentative unique contigs (TUCs) and 2129 singletons were obtained. Approximately 6008 of these tentative unique genes (TUGs) have significant matches to the non-redundant protein database, from which 2361 were assigned to one or more Gene Ontology categories. Predominant transcripts and differentially expressed genes were identified in multiple oil palm tissues. Homologues of genes involved in many aspects of flower development were also identified among the EST collection, such as CONSTANS-like, AGAMOUS-like (AGL)2, AGL20, LFY-like, SQUAMOSA, SQUAMOSA binding protein (SBP), etc. The majority of them are the first representatives in oil palm, providing opportunities to explore the cause of epigenetic homeotic flowering abnormality in oil palm, given the importance of flowering in fruit production. The transcript levels of two flowering-related genes, EgSBP and EgSEP, were analysed in the flower tissues of various developmental stages. Gene homologues for enzymes involved in oil biosynthesis, utilization of nitrogen sources, and scavenging of oxygen radicals were also uncovered among the oil palm ESTs. Conclusion: The EST sequences generated will allow comparative genomic studies between oil palm and other monocotyledonous and dicotyledonous plants, development of gene-targeted markers for the reference genetic map

  12. Renal sequence and functional series scintigraphy with o131I-hippuric acid in patients with diabetes mellitus

    International Nuclear Information System (INIS)

    Kempken, K.; Heidenreich, P.; Langhammer, H.; Bottermann, P.; Pabst, H.W.; Technische Univ. Muenchen

    1974-01-01

    The appearance of disturbances of renal function in diabetes mellitus depends on the quality of the therapy and control of the patient, the duration of the disease, and the age of manifestation, as well as other factors such as lipid metabolism and unspecific infections. Such infections often lead to pyelonephritis, which may be regarded as a real complication, particularly in connection with the late diabetic syndrome (Lundbaek), i.e. especially in diabetics of advanced age. Apart from the true diabetic nephropathy, glomerulosclerosis, which is more frequently found in younger patients, arterio-arteriolosclerosis of the kidneys and tubular atrophies due to interstitial deposits of proteins and glycogens should also be mentioned. An assessment of the renal function in all stages of diabetes mellitus and in hypertension was carried out with the aid of renal sequential and functional series scintigraphy. No similar investigations have been reported in the relevant literature. (orig./AK) [de]

  13. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences

    OpenAIRE

    Huerta-Cepas, J.; Szklarczyk, D.; Forslund, K.; Cook, H.; Heller, D.; Walter, M.C.; Rattei, T.; Mende, D.R.; Sunagawa, S.; Kuhn, M.; Jensen, L.J.; von Mering, C.; Bork, P.

    2016-01-01

    eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing...

  14. Functional comparison of the nematode Hox gene lin-39 in C. elegans and P. pacificus reveals evolutionary conservation of protein function despite divergence of primary sequences.

    Science.gov (United States)

    Grandien, K; Sommer, R J

    2001-08-15

    Hox transcription factors have been implicated in playing a central role in the evolution of animal morphology. Many studies indicate the evolutionary importance of regulatory changes in Hox genes, but little is known about the role of functional changes in Hox proteins. In the nematodes Pristionchus pacificus and Caenorhabditis elegans, developmental processes can be compared at the cellular, genetic, and molecular levels and differences in gene function can be identified. The Hox gene lin-39 is involved in the regulation of nematode vulva development. Comparison of known lin-39 mutations in P. pacificus and C. elegans revealed both conservation and changes of gene function. Here, we study evolutionary changes of lin-39 function using hybrid transgenes and site-directed mutagenesis in an in vivo assay using C. elegans lin-39 mutants. Our data show that despite the functional differences of LIN-39 between the two species, Ppa-LIN-39, when driven by Cel-lin-39 regulatory elements, can functionally replace Cel-lin-39. Furthermore, we show that the MAPK docking and phosphorylation motifs unique for Cel-LIN-39 are dispensable for Cel-lin-39 function. Therefore, the evolution of lin-39 function is driven by changes in regulatory elements rather than changes in the protein itself.

  15. [Topographic mapping of retinal function with a scanning laser ophthalmoscope and multifocal electroretinography using short M-sequences].

    Science.gov (United States)

    Rudolph, G; Bechmann, M; Berninger, T; Kutschbach, E; Held, U; Tornow, R P; Kalpadakis, P; Zol'nikova, I V; Shamshinova, A M

    2001-01-01

    A new method of multifocal electroretinography is proposed that uses a scanning laser ophthalmoscope with a wavelength of 630 nm (SLO-m-ERG) to project short spatial visual stimuli onto the retina. The algorithm for presenting the visual stimuli and analysing the distribution of local electroretinograms over the surface of the retina is based on short m-sequences. Mathematical cross-correlation analysis yields a three-dimensional distribution of the bioelectrical activity of the retina in the central visual field. In normal subjects the cone bioelectrical activity is maximal in the macular area (corresponding to the density of cone distribution) and absent in the blind spot. The method detects the slightest pathological changes in the retina while the site of stimulation and the ophthalmoscopic picture of the fundus oculi are monitored. The site of the pathological process correlates with the topography of changes in bioelectrical activity of the examined retinal area in diseases of the macular area and in retinitis pigmentosa detectable by ophthalmoscopy.
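
    The cross-correlation step works because a binary m-sequence is almost uncorrelated with any shifted copy of itself, so responses driven by different shifts can be separated. The following is a minimal Python sketch of that property only; the 4-bit register, the tap positions and all function names are illustrative assumptions and are not taken from the record above.

```python
def m_sequence(nbits=4, taps=(4, 3)):
    """Return one period of a binary m-sequence from a Fibonacci LFSR.

    The defaults (assumed for illustration) give a period of 2**4 - 1 = 15.
    """
    state = [1] * nbits
    bits = []
    for _ in range(2 ** nbits - 1):
        bits.append(state[-1])          # output the last register cell
        feedback = 0
        for tap in taps:                # XOR the tapped cells (1-indexed)
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]  # shift and insert the feedback bit
    return bits


def to_bipolar(bits):
    """Map 1 -> +1 and 0 -> -1, the usual form for correlation analysis."""
    return [1 if b else -1 for b in bits]


def circular_correlation(a, b, lag):
    """Circular cross-correlation of two equal-length sequences at one lag."""
    n = len(a)
    return sum(a[i] * b[(i + lag) % n] for i in range(n))


seq = to_bipolar(m_sequence())
autocorr = [circular_correlation(seq, seq, lag) for lag in range(len(seq))]
```

    In this sketch `autocorr` peaks at lag 0 and is flat elsewhere, which is the property a multifocal stimulation scheme relies on when each retinal location is driven by a differently shifted copy of the same sequence.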

  16. Molecular C dynamics downstream: the biochemical decomposition sequence and its impact on soil organic matter structure and function.

    Science.gov (United States)

    Grandy, A Stuart; Neff, Jason C

    2008-10-15

    Advances in spectroscopic and other chemical methods have greatly enhanced our ability to characterize soil organic matter chemistry. As a result, the molecular characteristics of soil C are now known for a range of ecosystems, soil types, and management intensities. Placing this knowledge into a broader ecological and management context is difficult, however, and remains one of the fundamental challenges of soil organic matter research. Here we present a conceptual model of molecular soil C dynamics to stimulate inter-disciplinary research into the ecological implications of molecular C turnover and its management- and process-level controls. Our model describes three properties of soil C dynamics: 1) soil size fractions have unique molecular patterns that reflect varying degrees of biological and physical control over decomposition; 2) there is a common decomposition sequence independent of plant inputs or other ecosystem properties; and 3) molecular decomposition sequences, although consistent, are not uniform and can be altered by processes that accelerate or slow the microbial transformation of specific molecules. The consequences of this model include several key points. First, lignin presents a constraint to decomposition of plant litter and particulate C (>53 µm) but exerts little influence on more stable mineral-associated soil fractions (<53 µm). Second, C stabilized onto mineral fractions has a distinct composition related more to microbially processed organic matter than to plant-related compounds. Third, disturbances, such as N fertilization and tillage, which alter decomposition rates, can have "downstream effects"; that is, a disturbance that directly alters the molecular dynamics of particulate C may have a series of indirect effects on C stabilization in silt and clay fractions.

  17. IG and TR single chain fragment variable (scFv) sequence analysis: a new advanced functionality of IMGT/V-QUEST and IMGT/HighV-QUEST.

    Science.gov (United States)

    Giudicelli, Véronique; Duroux, Patrice; Kossida, Sofia; Lefranc, Marie-Paule

    2017-06-26

    IMGT®, the international ImMunoGeneTics information system® ( http://www.imgt.org ), was created in 1989 in Montpellier, France (CNRS and Montpellier University) to manage the huge and complex diversity of the antigen receptors, and is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. Immunoglobulins (IG) or antibodies and T cell receptors (TR) are managed and described in the IMGT® databases and tools at the level of receptor, chain and domain. The analysis of the IG and TR variable (V) domain rearranged nucleotide sequences is performed by IMGT/V-QUEST (online since 1997, 50 sequences per batch) and, for next generation sequencing (NGS), by IMGT/HighV-QUEST, the high throughput version of IMGT/V-QUEST (portal begun in 2010, 500,000 sequences per batch). In vitro combinatorial libraries of engineered antibody single chain Fragment variable (scFv), which mimic the in vivo natural diversity of the immune adaptive responses, are extensively screened for the discovery of novel antigen binding specificities. However, the analysis of NGS full length scFv (~850 bp) represents a challenge as they contain two V domains connected by a linker and there is no tool for the analysis of two V domains in a single chain. The functionality "Analysis of single chain Fragment variable (scFv)" has been implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST for the analysis of the two V domains of IG and TR scFv. It proceeds in five steps: search for a first closest V-REGION, full characterization of the first V-(D)-J-REGION, then search for a second V-REGION and full characterization of the second V-(D)-J-REGION, and finally linker delimitation. For each sequence or NGS read, positions of the 5'V-DOMAIN, linker and 3'V-DOMAIN in the scFv are provided in the 'V-orientated' sense. Each V-DOMAIN is fully characterized (gene identification, sequence description, junction analysis, characterization of mutations and amino

  18. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    Directory of Open Access Journals (Sweden)

    Charles Richard Bradshaw

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in

  19. Functional comparison of the nematode Hox gene lin-39 in C. elegans and P. pacificus reveals evolutionary conservation of protein function despite divergence of primary sequences

    OpenAIRE

    Grandien, Kaj; Sommer, Ralf J.

    2001-01-01

    Hox transcription factors have been implicated in playing a central role in the evolution of animal morphology. Many studies indicate the evolutionary importance of regulatory changes in Hox genes, but little is known about the role of functional changes in Hox proteins. In the nematodes Pristionchus pacificus and Caenorhabditis elegans, developmental processes can be compared at the cellular, genetic, and molecular levels and differences in gene function can be identified. The Hox gene lin-3...

  20. Whole Exome Re-Sequencing Implicates CCDC38 and Cilia Structure and Function in Resistance to Smoking Related Airflow Obstruction

    NARCIS (Netherlands)

    Wain, Louise V.; Sayers, Ian; Artigas, Maria Soler; Portelli, Michael A.; Zeggini, Eleftheria; Obeidat, Ma'en; Sin, Don D.; Bosse, Yohan; Nickle, David; Brandsma, Corry-Anke; Malarstig, Anders; Vangjeli, Ciara; Jelinsky, Scott A.; John, Sally; Kilty, Iain; McKeever, Tricia; Shrine, Nick R. G.; Cook, James P.; Patel, Shrina; Spector, Tim D.; Hollox, Edward J.; Hall, Ian P.; Tobin, Martin D.

    Chronic obstructive pulmonary disease (COPD) is a leading cause of global morbidity and mortality and, whilst smoking remains the single most important risk factor, COPD risk is heritable. Of 26 independent genomic regions showing association with lung function in genome-wide association studies,

  1. Somatosensory neuron types identified by high-coverage single-cell RNA-sequencing and functional heterogeneity

    Science.gov (United States)

    Li, Chang-Lin; Li, Kai-Cheng; Wu, Dan; Chen, Yan; Luo, Hao; Zhao, Jing-Rong; Wang, Sa-Shuang; Sun, Ming-Ming; Lu, Ying-Jin; Zhong, Yan-Qing; Hu, Xu-Ye; Hou, Rui; Zhou, Bei-Bei; Bao, Lan; Xiao, Hua-Sheng; Zhang, Xu

    2016-01-01

    Sensory neurons are distinguished by distinct signaling networks and receptive characteristics. Thus, sensory neuron types can be defined by linking transcriptome-based neuron typing with the sensory phenotypes. Here we classify somatosensory neurons of the mouse dorsal root ganglion (DRG) by high-coverage single-cell RNA-sequencing (10 950 ± 1 218 genes per neuron) and neuron size-based hierarchical clustering. Moreover, single DRG neurons responding to cutaneous stimuli are recorded using an in vivo whole-cell patch clamp technique and classified by neuron-type genetic markers. Small diameter DRG neurons are classified into one type of low-threshold mechanoreceptor and five types of mechanoheat nociceptors (MHNs). Each of the MHN types is further categorized into two subtypes. Large DRG neurons are categorized into four types, including neurexophilin 1-expressing MHNs and mechanical nociceptors (MNs) expressing BAI1-associated protein 2-like 1 (Baiap2l1). Mechanoreceptors expressing trafficking protein particle complex 3-like and Baiap2l1-marked MNs are subdivided into two subtypes each. These results provide a new system for cataloging somatosensory neurons and their transcriptome databases. PMID:26691752

  2. Functional role of bacteriophage transfer RNAs: codon usage analysis of genomic sequences stored in the GENBANK/EMBL/DDBJ databases

    Directory of Open Access Journals (Sweden)

    T Kunisawa

    2006-01-01

    Complete genomic sequence data are stored in the public GenBank/EMBL/DDBJ databases so that any investigator can make use of the data. This report describes a comparative analysis of codon usage that would be impossible without such a public and open data system. A limited number of bacteriophages harbor their own transfer RNAs. Based on a comparison between T4 phage-encoded tRNA species and the relative cellular amounts of host Escherichia coli tRNAs, it is hypothesized that T4 tRNAs could serve to supplement host isoacceptor tRNA species that are present in minor amounts and thus enhance the translational efficiency of phage proteins. When compared to their respective host bacteria, the codon usage data of bacteriophages D3, φC31, HP1, D29 and 933W all show an increased frequency of synonymous codons or amino acids that correspond to phage tRNA species, suggesting their supplemental role in the efficient production of phage proteins. The data analysis presents an example in which the availability of an open and fully accessible database system allows one to obtain comprehensive insights into a fundamental problem in molecular biology.
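
    The comparison described above amounts to counting codons in phage and host coding sequences and asking which codons are relatively more frequent in the phage. A minimal Python sketch of that calculation follows; the function names and the two short toy sequences are assumptions made for illustration and are not data from the report.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (trailing bases ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds):
    """Relative frequency of every codon observed in the sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def enrichment(phage_cds, host_cds, codon):
    """Ratio of a codon's frequency in the phage to its frequency in the host."""
    phage = codon_frequencies(phage_cds).get(codon, 0.0)
    host = codon_frequencies(host_cds).get(codon, 0.0)
    return phage / host if host else float("inf")


# Toy sequences (hypothetical): AGA is used three times in the "phage"
# sequence and once in the "host" sequence.
toy_phage = "ATGAGAAGAAGACTGTAA"
toy_host = "ATGCGTCGTAGACTGTAA"
aga_ratio = enrichment(toy_phage, toy_host, "AGA")
```

    A ratio well above 1 for codons read by a phage-encoded tRNA would be consistent with the supplementation hypothesis; a real analysis would pool all annotated coding sequences of each genome rather than a single gene.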

  3. Cascade detection for the extraction of localized sequence features; specificity results for HIV-1 protease and structure-function results for the Schellman loop.

    Science.gov (United States)

    Newell, Nicholas E

    2011-12-15

    The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened. Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources. Availability: Windows XP/7 application and data files are available at https://sites.google.com/site/cascadedetect/home. Contact: nacnewell@comcast.net. Supplementary information is available at Bioinformatics online.

  4. Strand bias in complementary single-nucleotide polymorphisms of transcribed human sequences: evidence for functional effects of synonymous polymorphisms

    Directory of Open Access Journals (Sweden)

    Majewski Jacek

    2006-08-01

    Background: Complementary single-nucleotide polymorphisms (SNPs) may not be distributed equally between two DNA strands if the strands are functionally distinct, such as in transcribed genes. In introns, an excess of A↔G over the complementary C↔T substitutions had previously been found and attributed to transcription-coupled repair (TCR), demonstrating the valuable functional clues that can be obtained by studying such asymmetry. Here we studied asymmetry of human synonymous SNPs (sSNPs) in the fourfold degenerate (FFD) sites as compared to intronic SNPs (iSNPs). Results: The identities of the ancestral bases and the direction of mutations were inferred from human-chimpanzee genomic alignment. After correction for background nucleotide composition, excess of A→G over the complementary T→C polymorphisms, which was observed previously and can be explained by TCR, was confirmed in FFD SNPs and iSNPs. However, when SNPs were separately examined according to whether they mapped to a CpG dinucleotide or not, an excess of C→T over G→A polymorphisms was found in non-CpG site FFD SNPs but was absent from iSNPs and CpG site FFD SNPs. Conclusion: The genome-wide discrepancy of human FFD SNPs provides novel evidence for widespread selective pressure due to functional effects of sSNPs. The similar asymmetry pattern of FFD SNPs and iSNPs that map to a CpG can be explained by transcription-coupled mechanisms, including TCR and transcription-coupled mutation. Because of the hypermutability of CpG sites, more CpG site FFD SNPs are relatively younger and have confronted less selection effect than non-CpG FFD SNPs, which can explain the asymmetric discrepancy of CpG site FFD SNPs vs. non-CpG site FFD SNPs.
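
    The asymmetry test described above compares a substitution with its complement on the opposite strand after correcting for how often each ancestral base occurs. A minimal Python sketch of one way to express that comparison is given below; the normalization by ancestral base counts, the function names and the toy numbers are assumptions for illustration and do not reproduce the paper's statistics.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complementary(substitution):
    """Return the same change as read on the opposite strand, e.g. A>G -> T>C."""
    ancestral, derived = substitution
    return (COMPLEMENT[ancestral], COMPLEMENT[derived])


def normalized_rate(sub_counts, base_counts, substitution):
    """Substitution count divided by the number of sites with that ancestral base."""
    return sub_counts.get(substitution, 0) / base_counts[substitution[0]]


def strand_asymmetry(sub_counts, base_counts, substitution):
    """Ratio of a substitution's normalized rate to that of its complement.

    Values above 1 indicate an excess on the strand being analysed.
    Raises ZeroDivisionError if the complementary change was never observed.
    """
    forward = normalized_rate(sub_counts, base_counts, substitution)
    reverse = normalized_rate(sub_counts, base_counts, complementary(substitution))
    return forward / reverse


# Hypothetical counts on the coding strand, for illustration only.
toy_subs = {("A", "G"): 120, ("T", "C"): 100}
toy_bases = {"A": 1000, "C": 1000, "G": 1000, "T": 1000}
toy_ratio = strand_asymmetry(toy_subs, toy_bases, ("A", "G"))
```

    With equal base counts the toy ratio reduces to 120/100; with unequal composition the normalization is what keeps a base-rich strand from appearing biased.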

  5. Abundance and diversity of bacterial nitrifiers and denitrifiers and their functional genes in tannery wastewater treatment plants revealed by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Zhu Wang

    Biological nitrification/denitrification is frequently used to remove nitrogen from tannery wastewater containing high concentrations of ammonia. However, information is limited about the bacterial nitrifiers and denitrifiers and their functional genes in tannery wastewater treatment plants (WWTPs) due to the low throughput of the previously used methods. In this study, 454 pyrosequencing and Illumina high-throughput sequencing, combined with molecular methods, were used to comprehensively characterize structures and functions of nitrification and denitrification bacterial communities in aerobic and anaerobic sludge of two full-scale tannery WWTPs. Pyrosequencing of 16S rRNA genes showed that Proteobacteria and Synergistetes dominated in the aerobic and anaerobic sludge, respectively. Ammonia-oxidizing bacteria (AOB) amoA gene cloning revealed that Nitrosomonas europaea dominated the ammonia-oxidizing community in the WWTPs. Metagenomic analysis showed that the denitrifiers mainly included the genera Thauera, Paracoccus, Hyphomicrobium, Comamonas and Azoarcus, which may greatly contribute to the nitrogen removal in the two WWTPs. It is interesting that AOB and ammonia-oxidizing archaea had low abundance although both WWTPs demonstrated high ammonium removal efficiency. Good correlation between the qPCR and metagenomic analysis was observed for the quantification of the functional genes amoA, nirK, nirS and nosZ, indicating that the metagenomic approach may be a promising method to comprehensively investigate the abundance of functional genes of nitrifiers and denitrifiers in the environment.

  6. Somatic loss of function mutations in neurofibromin 1 and MYC associated factor X genes identified by exome-wide sequencing in a wild-type GIST case

    International Nuclear Information System (INIS)

    Belinsky, Martin G.; Rink, Lori; Cai, Kathy Q.; Capuzzi, Stephen J.; Hoang, Yen; Chien, Jeremy; Godwin, Andrew K.; Mehren, Margaret von

    2015-01-01

    Approximately 10–15 % of gastrointestinal stromal tumors (GISTs) lack gain of function mutations in the KIT and platelet-derived growth factor receptor alpha (PDGFRA) genes. An alternate mechanism of oncogenesis through loss of function of the succinate-dehydrogenase (SDH) enzyme complex has been identified for a subset of these “wild type” GISTs. Paired tumor and normal DNA from an SDH-intact wild-type GIST case was subjected to whole exome sequencing to identify the pathogenic mechanism(s) in this tumor. Selected findings were further investigated in panels of GIST tumors through Sanger DNA sequencing, quantitative real-time PCR, and immunohistochemical approaches. A hemizygous frameshift mutation (p.His2261Leufs*4), in the neurofibromin 1 (NF1) gene was identified in the patient’s GIST; however, no germline NF1 mutation was found. A somatic frameshift mutation (p.Lys54Argfs*31) in the MYC associated factor X (MAX) gene was also identified. Immunohistochemical analysis for MAX on a large panel of GISTs identified loss of MAX expression in the MAX-mutated GIST and in a subset of mainly KIT-mutated tumors. This study suggests that inactivating NF1 mutations outside the context of neurofibromatosis may be the oncogenic mechanism for a subset of sporadic GIST. In addition, loss of function mutation of the MAX gene was identified for the first time in GIST, and a broader role for MAX in GIST progression was suggested. The online version of this article (doi:10.1186/s12885-015-1872-y) contains supplementary material, which is available to authorized users

  7. Harnessing Omics Big Data in Nine Vertebrate Species by Genome-Wide Prioritization of Sequence Variants with the Highest Predicted Deleterious Effect on Protein Function.

    Science.gov (United States)

    Rozman, Vita; Kunej, Tanja

    2018-05-10

    Harnessing the genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. We report in this study, to the best of our knowledge, the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes comprising pDelVars in the highest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as the "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven out of nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) comprise pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models and thus provide a future basis for optimization of protocols for whole genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and multiomics study of health interventions (iatromics).

  8. Application of zero eigenvalue for solving the potential, heat, and wave equations using a sequence of special functions

    Directory of Open Access Journals (Sweden)

    2006-01-01

    In the solution of boundary value problems, the zero eigenvalue is usually ignored. The same happens when calculating the eigenvalues of matrices, where one often looks only for the nonzero solutions of the linear system AX = λX with λ ≠ 0. But λ = 0 implies that det A = 0 for X ≠ 0, and then the rank of the matrix A is reduced by at least one. A similar remark applies to boundary value problems. In other words, if at least one of the eigenvalues of the equations related to the main problem is taken to be zero, then one of the solutions is specified in advance. Using this observation, we first study a class of special functions and then apply it to the potential, heat, and wave equations in spherical coordinates. Some practical examples are also given.
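    The remark about λ = 0 can be checked on a small matrix. The sketch below is only an illustration: the 2×2 matrix, the vector X and the helper names `det2` and `eigenvalues2` are assumptions chosen for this example and do not come from the paper.

```python
import math

def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def eigenvalues2(a):
    """Eigenvalues of a real 2x2 matrix from its characteristic polynomial.

    Assumes the discriminant is non-negative (real eigenvalues).
    """
    trace = a[0][0] + a[1][1]
    root = math.sqrt(trace * trace - 4.0 * det2(a))
    return ((trace - root) / 2.0, (trace + root) / 2.0)

# Hypothetical example: the second row is twice the first, so the rank is 1.
A = [[1.0, 2.0], [2.0, 4.0]]
X = [2.0, -1.0]  # a nonzero vector in the null space of A

det_A = det2(A)          # zero, because A is singular
lams = eigenvalues2(A)   # one of the two eigenvalues is zero
AX = [A[0][0] * X[0] + A[0][1] * X[1],
      A[1][0] * X[0] + A[1][1] * X[1]]  # A X = 0 * X

print(det_A, lams, AX)
```

    Here the nonzero X with AX = 0 is exactly the eigenvector belonging to λ = 0, which is the solution that is "specified in advance" once the zero eigenvalue is admitted.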

  9. A novel Drosophila model of TDP-43 proteinopathies: N-terminal sequences combined with the Q/N domain induce protein functional loss and locomotion defects

    Directory of Open Access Journals (Sweden)

    Simona Langellotti

    2016-06-01

    Transactive response DNA-binding protein 43 kDa (TDP-43, also known as TBPH in Drosophila melanogaster and TARDBP in mammals) is the main protein component of the pathological inclusions observed in neurons of patients affected by different neurodegenerative disorders, including amyotrophic lateral sclerosis (ALS) and fronto-temporal lobar degeneration (FTLD). The number of studies investigating the molecular mechanisms underlying neurodegeneration is constantly growing; however, the role played by TDP-43 in disease onset and progression is still unclear. A fundamental shortcoming that hampers progress is the lack of animal models showing aggregation of TDP-43 without overexpression. In this manuscript, we have extended our cellular model of aggregation to a transgenic Drosophila line. Our fly model is not based on the overexpression of a wild-type TDP-43 transgene. By contrast, we engineered a construct that includes only the specific TDP-43 amino acid sequences necessary to trigger aggregate formation and capable of trapping endogenous Drosophila TDP-43 into a non-functional insoluble form. Importantly, the resulting recombinant product lacks functional RNA recognition motifs (RRMs) and, thus, does not have specific TDP-43-physiological functions (i.e. splicing regulation ability) that might affect the animal phenotype per se. This novel Drosophila model exhibits an evident degenerative phenotype with reduced lifespan and early locomotion defects. Additionally, we show that important proteins involved in neuromuscular junction function, such as syntaxin (SYX), decrease their levels as a consequence of TDP-43 loss of function, implying that the degenerative phenotype is a consequence of TDP-43 sequestration into the aggregates. Our data lend further support to the role of TDP-43 loss-of-function in the pathogenesis of neurodegenerative disorders. The novel transgenic Drosophila model presented in this study will help to gain further insight into the

  10. Sequence variation of functional HTLV-II tax alleles among isolates from an endemic population: lack of evidence for oncogenic determinant in tax.

    Science.gov (United States)

    Hjelle, B; Chaney, R

    1992-02-01

    Human T-cell leukemia-lymphoma virus type II (HTLV-II) has been isolated from patients with hairy cell leukemia (HCL). We previously described a population with longstanding endemic HTLV-II infection, and showed that there is no increased risk for HCL in the affected groups. We thus have direct evidence that the endemic form(s) of HTLV-II cause HCL infrequently, if at all. By comparison, there is reason to suspect that the viruses isolated from patients with HCL had an etiologic role in the disease in those patients. One way to reconcile these conflicting observations is to consider that isolates of HTLV-II might differ in oncogenic potential. To determine whether the structure of the putative oncogenic determinant of HTLV-II, tax2, might differ in the new isolates compared to the tax of the prototype HCL isolate, MO, four new functional tax cDNAs were cloned from new isolates. Sequence analysis showed only minor (0.9-2.0%) amino acid variation compared to the published sequence of MO tax2. Some codons were consistently different from published sequences of the MO virus, but in most cases, such variations were also found in each of two tax2 clones we isolated from the MO T-cell line. These variations rendered the new clones more similar to the tax1 of the pathogenic virus HTLV-I. Thus we find no evidence that pathologic determinants of HTLV-II can be assigned to the tax gene.

  11. Niemann-Pick C1 (NPC1)/NPC1-like1 Chimeras Define Sequences Critical for NPC1’s Function as a Filovirus Entry Receptor

    Directory of Open Access Journals (Sweden)

    Esther Ndungo

    2012-10-01

    We recently demonstrated that Niemann-Pick C1 (NPC1), a ubiquitous 13-pass cellular membrane protein involved in lysosomal cholesterol transport, is a critical entry receptor for filoviruses. Here we show that Niemann-Pick C1-like1 (NPC1L1), an NPC1 paralog and hepatitis C virus entry factor, lacks filovirus receptor activity. We exploited the structural similarity between NPC1 and NPC1L1 to construct and analyze a panel of chimeras in which NPC1L1 sequences were replaced with cognate sequences from NPC1. Only one chimera, NPC1L1 containing the second luminal domain (C) of NPC1 in place of its own, bound to the viral glycoprotein, GP. This engineered protein mediated authentic filovirus infection nearly as well as wild-type NPC1, and more efficiently than did a minimal NPC1 domain C-based receptor recently described by us. A reciprocal chimera, NPC1 containing NPC1L1’s domain C, was completely inactive. Remarkably, an intra-domain NPC1L1-NPC1 chimera bearing only a ~130-amino acid N-terminal region of NPC1 domain C could confer substantial viral receptor activity on NPC1L1. Taken together, these findings account for the failure of NPC1L1 to serve as a filovirus receptor, highlight the central role of the luminal domain C of NPC1 in filovirus entry, and reveal the direct involvement of N-terminal domain C sequences in NPC1’s function as a filovirus receptor.

  12. Prediction of non-canonical polyadenylation signals in human genomic sequences based on a novel algorithm using a fuzzy membership function.

    Science.gov (United States)

    Kamasawa, Masami; Horiuchi, Jun-Ichi

    2009-05-01

    Computational prediction of polyadenylation signals (PASes) is essential for the analysis of alternative polyadenylation, which plays crucial roles in gene regulation by generating heterogeneity in the 3'-UTRs of mRNAs. To date, several algorithms, mostly based on machine learning methods, have been developed to predict PASes. The accuracy of predictions by those algorithms has improved significantly over the last decade. However, they are designed primarily for prediction of the most canonical AAUAAA and its common variant AUUAAA, whereas other variants have been ignored in their predictions despite recent studies indicating that non-canonical variants of AAUAAA are more important in the polyadenylation process than commonly recognized. Here we present a new algorithm, "PolyF", that employs fuzzy logic to advance computational PAS prediction: it enables prediction of the non-canonical variants and improves the accuracy of canonical A(A/U)UAAA prediction. PolyF is a simple computational algorithm composed of membership functions defining sequence features of the downstream sequence element (DSE) and upstream sequence element (USE), together with an inference engine. As a result, PolyF successfully identified the 10 single-nucleotide variants with approximately the same or higher accuracies compared to those for A(A/U)UAAA. PolyF also achieved higher accuracies for A(A/U)UAAA prediction than the commonly known PAS finder programs Polyadq and Erpin. Incorporating the USE into the PolyF algorithm was found to enhance prediction accuracies for all 12 PAS hexamers compared to those using only the DSE, suggesting an important contribution of the USE to the polyadenylation process.
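    To make the idea of combining membership functions for the USE and DSE more concrete, here is a minimal sketch of a fuzzy score. It is not the PolyF implementation: the function names (`ramp`, `base_fraction`, `pas_score`), the choice of U-richness and G/U-richness as features, the thresholds, and the use of the minimum as the fuzzy AND are all assumptions made for illustration only.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 below lo, 1 above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def base_fraction(seq, bases):
    """Fraction of positions in seq whose base is in `bases` (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        return 0.0
    return sum(1 for b in seq if b in bases) / len(seq)

def pas_score(upstream, downstream):
    """Hypothetical fuzzy support for a PAS candidate from its flanking windows."""
    use = ramp(base_fraction(upstream, "U"), 0.2, 0.5)     # assumed thresholds
    dse = ramp(base_fraction(downstream, "GU"), 0.4, 0.8)  # assumed thresholds
    return min(use, dse)  # minimum as fuzzy AND

# Made-up flanking sequences, not taken from any real transcript.
score = pas_score("UUUAUUUGUA", "GUGUCUGAAC")
print(round(score, 3))
```

    With the minimum as the conjunction, a candidate hexamer scores well only when both flanking windows look plausible, which mirrors the abstract's observation that adding the USE to the DSE improves accuracy.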

  13. Structural and functional analysis of mouse Msx1 gene promoter: sequence conservation with human MSX1 promoter points at potential regulatory elements.

    Science.gov (United States)

    Gonzalez, S M; Ferland, L H; Robert, B; Abdelhay, E

    1998-06-01

    Vertebrate Msx genes are related to one of the most divergent homeobox genes of Drosophila, the muscle segment homeobox (msh) gene, and are expressed in a well-defined pattern at sites of tissue interactions. This pattern of expression is conserved in vertebrates as diverse as quail, zebrafish, and mouse in a range of sites including neural crest, appendages, and craniofacial structures. In the present work, we performed structural and functional analyses in order to identify potential cis-acting elements that may be regulating Msx1 gene expression. To this end, a 4.9-kb segment of the 5'-flanking region was sequenced and analyzed for transcription-factor binding sites. Four regions showing a high concentration of these sites were identified. Transfection assays with fragments of regulatory sequences driving the expression of the bacterial lacZ reporter gene showed that a region of 4 kb upstream of the transcription start site contains positive and negative elements responsible for controlling gene expression. Interestingly, a fragment of 130 bp seems to contain the minimal elements necessary for gene expression, as its removal completely abolishes gene expression in cultured cells. These results are reinforced by comparison of this region with the human MSX1 gene promoter, which shows extensive conservation, including many consensus binding sites, suggesting a regulatory role for them.

  14. Effect of alteplase thrombolysis sequenced by low molecular heparin calcium antithrombosis on the neurological function and serum cytokines in patients with cerebral infarction

    Directory of Open Access Journals (Sweden)

    Yi-Ping Dan

    2017-04-01

    Objective: To study the effect of alteplase thrombolysis sequenced by low molecular heparin calcium antithrombosis on the neurological function and serum cytokines in patients with cerebral infarction. Methods: Patients with acute cerebral infarction who received alteplase thrombolysis in Zigong Fourth People's Hospital between June 2014 and October 2016 were retrospectively analyzed and divided into the intervention group, who received low molecular heparin calcium treatment, and the control group, who did not. Serum was collected before and after treatment to determine the contents of platelet activation factors, nerve injury molecules, soluble apoptotic molecules and growth factors. Results: Serum CD62p, CD63, PAF, GMP-140, NSE, S100B, GFAP, sFas, sFasL, sTRAIL, IGF-1, VEGF, BDNF and bFGF levels of both groups of patients after treatment were lower than those before treatment; serum CD62p, CD63, PAF, GMP-140, NSE, S100B, GFAP, sFas, sFasL and sTRAIL levels of the intervention group after treatment were lower than those of the control group, while IGF-1, VEGF, BDNF and bFGF levels were higher than those of the control group. Conclusion: Alteplase thrombolysis sequenced by low molecular heparin calcium antithrombosis for acute cerebral infarction can inhibit platelet activation and cell apoptosis, alleviate nerve injury and improve neurotrophy status.

  15. Multiple signalling systems controlling expression of luminescence in Vibrio harveyi: sequence and function of genes encoding a second sensory pathway.

    Science.gov (United States)

    Bassler, B L; Wright, M; Silverman, M R

    1994-07-01

    Density-dependent expression of luminescence in Vibrio harveyi is regulated by the concentration of extracellular signal molecules (autoinducers) in the culture medium. One signal-response system is encoded by the luxL,M,N locus. The luxL and luxM genes are required for the production of an autoinducer (probably beta-hydroxybutyl homoserine lactone), and the luxN gene is required for the response to that autoinducer. Analysis of the phenotypes of LuxL,M and N mutants indicated that an additional signal-response system also controls density sensing. We report here the identification, cloning and analysis of luxP and luxQ, which encode functions required for a second density-sensing system. Mutants with defects in luxP and luxQ are defective in response to a second autoinducer substance. LuxQ, like LuxN, is similar to members of the family of two-component, signal transduction proteins and contains both a histidine protein kinase and a response regulator domain. Analysis of signalling mutant phenotypes indicates that there are at least two separate signal-response pathways which converge to regulate expression of luminescence in V. harveyi.

  16. Comparative sequence analysis revealed altered chromosomal organization and a novel insertion sequence encoding DNA modification and potentially stress-related functions in an Escherichia coli O157:H7 foodborne isolate

    Science.gov (United States)

    We recently described the complete genome of enterohemorrhagic Escherichia coli (EHEC) O157:H7 strain NADC 6564, an isolate of strain 86-24 linked to the 1986 disease outbreak. In the current study, we compared the chromosomal sequence of NADC 6564 to the well-characterized chromosomal sequences of ...

  17. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery.

    Directory of Open Access Journals (Sweden)

    Bharat Bhusan Patnaik

    The freshwater mussel Cristaria plicata (Bivalvia: Eulamellibranchia: Unionidae) is an economically important species in molluscan aquaculture due to its use in pearl farming. The species has been listed as endangered in South Korea due to the loss of natural habitats caused by anthropogenic activities. The decreasing population and a lack of genomic information on the species are concerning for environmentalists and conservationists. In this study, we conducted a de novo transcriptome sequencing and annotation analysis of C. plicata using Illumina HiSeq 2500 next-generation sequencing (NGS) technology, the Trinity assembler, and bioinformatics databases to prepare a sustainable resource for the identification of candidate genes involved in immunity, defense, and reproduction. The C. plicata transcriptome analysis included a total of 286,152,584 raw reads and 281,322,837 clean reads. The de novo assembly identified a total of 453,931 contigs and 374,794 non-redundant unigenes with average lengths of 731.2 and 737.1 bp, respectively. Furthermore, 100% coverage of C. plicata mitochondrial genes within two unigenes supported the quality of the assembler. In total, 84,274 unigenes showed homology to entries in at least one database, and 23,246 unigenes were allocated to one or more Gene Ontology (GO) terms. The most prominent GO biological process, cellular component, and molecular function categories (level 2) were cellular process, membrane, and binding, respectively. A total of 4,776 unigenes were mapped to 123 biological pathways in the KEGG database. Based on the GO terms and KEGG annotation, the unigenes were suggested to be involved in immunity, stress responses, sex-determination, and reproduction. A total of 17,251 cDNA simple sequence repeats (cSSRs) were identified from 61,141 unigenes (size of >1 kb), with the most abundant being dinucleotide repeats. This dataset represents the first transcriptome analysis of the endangered mollusc, C. plicata

  18. Push it to the limit: Characterizing the convergence of common sequences of basis sets for intermolecular interactions as described by density functional theory

    Energy Technology Data Exchange (ETDEWEB)

    Witte, Jonathon [Department of Chemistry, University of California, Berkeley, California 94720 (United States); Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Neaton, Jeffrey B. [Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Physics, University of California, Berkeley, California 94720 (United States); Kavli Energy Nanosciences Institute at Berkeley, Berkeley, California 94720 (United States); Head-Gordon, Martin, E-mail: mhg@cchem.berkeley.edu [Department of Chemistry, University of California, Berkeley, California 94720 (United States); Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States)

    2016-05-21

    With the aim of systematically characterizing the convergence of common families of basis sets such that general recommendations for basis sets can be made, we have tested a wide variety of basis sets against complete-basis binding energies across the S22 set of intermolecular interactions—noncovalent interactions of small and medium-sized molecules consisting of first- and second-row atoms—with three distinct density functional approximations: SPW92, a form of local-density approximation; B3LYP, a global hybrid generalized gradient approximation; and B97M-V, a meta-generalized gradient approximation with nonlocal correlation. We have found that it is remarkably difficult to reach the basis set limit; for the methods and systems examined, the most complete basis is Jensen’s pc-4. The Dunning correlation-consistent sequence of basis sets converges slowly relative to the Jensen sequence. The Karlsruhe basis sets are quite cost effective, particularly when a correction for basis set superposition error is applied: counterpoise-corrected def2-SVPD binding energies are better than corresponding energies computed in comparably sized Dunning and Jensen bases, and on par with uncorrected results in basis sets 3-4 times larger. These trends are exhibited regardless of the level of density functional approximation employed. A sense of the magnitude of the intrinsic incompleteness error of each basis set not only provides a foundation for guiding basis set choice in future studies but also facilitates quantitative comparison of existing studies on similar types of systems.
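    For readers unfamiliar with the counterpoise correction mentioned above, the standard Boys-Bernardi scheme can be written as follows. This is a textbook sketch rather than a formula quoted from the paper; the notation is an assumption made here, with the superscript naming the basis in which an energy is evaluated and all fragment energies taken at the geometry the fragments have in the complex.

```latex
\begin{align}
E_{\mathrm{bind}} &= E_{AB}^{AB} - E_{A}^{A} - E_{B}^{B}, \\
E_{\mathrm{bind}}^{\mathrm{CP}} &= E_{AB}^{AB} - E_{A}^{AB} - E_{B}^{AB}, \\
\Delta E_{\mathrm{BSSE}} &= E_{\mathrm{bind}}^{\mathrm{CP}} - E_{\mathrm{bind}}
  = \left(E_{A}^{A} - E_{A}^{AB}\right) + \left(E_{B}^{B} - E_{B}^{AB}\right).
\end{align}
```

    Because each monomer energy is normally lowered when the partner's basis functions are available, the correction is typically non-negative, so the uncorrected binding energy tends to overbind in small basis sets.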

  19. Time Sequence Spectroscopy of AW UMa. The 518 nm Mg i Triplet Region Analyzed With Broadening Functions

    Science.gov (United States)

    Rucinski, Slavek M.

    2015-02-01

    High-resolution spectroscopic observations of AW UMa, obtained on three consecutive nights with a median time resolution of 2.1 minutes, have been analyzed using the broadening function method in the spectral window of 22.75 nm around the 518 nm Mg I triplet region. Doppler images of the system reveal the presence of vigorous mass motions within the binary system; their presence puts into question the solid-body rotation assumption of the contact binary model. AW UMa appears to be a very tight, semi-detached binary; the mass transfer takes place from the more massive to the less massive component. The primary, a fast-rotating star with $V \sin i = 181.4 \pm 2.5$ km s$^{-1}$, is covered with inhomogeneities: very slowly drifting spots and a dense network of ripples more closely participating in its rotation. The spectral lines of the primary show an additional broadening component (called the “pedestal”) that originates either in the equatorial regions, which rotate faster than the rest of the star by about 50 km s$^{-1}$, or in an external disk-like structure. The secondary component appears to be smaller than predicted by the contact model. The radial velocity field around the secondary is dominated by accretion of matter transferred from (and possibly partly returned to) the primary component. The parameters of the binary are $A \sin i = 2.73 \pm 0.11\,R_\odot$, $M_1 \sin^3 i = 1.29 \pm 0.15\,M_\odot$ and $M_2 \sin^3 i = 0.128 \pm 0.016\,M_\odot$. The mass ratio, $q_{\rm sp} = M_2/M_1 = 0.099 \pm 0.003$, while still the most uncertain among the spectroscopic elements, is substantially different from the previous numerous and mutually consistent photometric investigations which were based on the contact model. It should be studied why photometry and spectroscopy give such discrepant results and whether AW UMa is an unusual object or if only very high-quality spectroscopy can reveal the true nature of W UMa-type binaries. Based on observations obtained at the Canada

  20. Time sequence spectroscopy of AW UMa. The 518 nm Mg I triplet region analyzed with broadening functions

    International Nuclear Information System (INIS)

    Rucinski, Slavek M.

    2015-01-01

    High-resolution spectroscopic observations of AW UMa, obtained on three consecutive nights with a median time resolution of 2.1 minutes, have been analyzed using the broadening function method in the spectral window of 22.75 nm around the 518 nm Mg I triplet region. Doppler images of the system reveal the presence of vigorous mass motions within the binary system; their presence puts into question the solid-body rotation assumption of the contact binary model. AW UMa appears to be a very tight, semi-detached binary; the mass transfer takes place from the more massive to the less massive component. The primary, a fast-rotating star with $V \sin i = 181.4 \pm 2.5$ km s$^{-1}$, is covered with inhomogeneities: very slowly drifting spots and a dense network of ripples more closely participating in its rotation. The spectral lines of the primary show an additional broadening component (called the “pedestal”) that originates either in the equatorial regions, which rotate faster than the rest of the star by about 50 km s$^{-1}$, or in an external disk-like structure. The secondary component appears to be smaller than predicted by the contact model. The radial velocity field around the secondary is dominated by accretion of matter transferred from (and possibly partly returned to) the primary component. The parameters of the binary are $A \sin i = 2.73 \pm 0.11\,R_\odot$, $M_1 \sin^3 i = 1.29 \pm 0.15\,M_\odot$ and $M_2 \sin^3 i = 0.128 \pm 0.016\,M_\odot$. The mass ratio, $q_{\rm sp} = M_2/M_1 = 0.099 \pm 0.003$, while still the most uncertain among the spectroscopic elements, is substantially different from the previous numerous and mutually consistent photometric investigations which were based on the contact model. It should be studied why photometry and spectroscopy give such discrepant results and whether AW UMa is an unusual object or if only very high-quality spectroscopy can reveal the true nature of W UMa-type binaries.
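    The quoted spectroscopic elements can be cross-checked with a few lines of arithmetic. The sketch below uses only the central values from the abstract and ignores the uncertainties; the solar constants are assumed IAU nominal values, and the circular Keplerian orbit is an assumption of this example, not a statement from the paper. The factors of sin i cancel in both the mass ratio and Kepler's third law.

```python
import math

# Central values quoted in the abstract (uncertainties ignored here).
A_SIN_I_RSUN = 2.73      # A sin i in solar radii
M1_SIN3_I_MSUN = 1.29    # M1 sin^3 i in solar masses
M2_SIN3_I_MSUN = 0.128   # M2 sin^3 i in solar masses

# Assumed constants (IAU nominal solar values), not taken from the abstract.
R_SUN_M = 6.957e8        # solar radius in metres
GM_SUN = 1.3271244e20    # G * M_sun in m^3 s^-2

# Mass ratio: the sin^3 i factors cancel.
q = M2_SIN3_I_MSUN / M1_SIN3_I_MSUN

# Kepler's third law, P = 2*pi*sqrt(a^3 / (G*M)); (a sin i)^3 / (M sin^3 i) = a^3 / M.
a_m = A_SIN_I_RSUN * R_SUN_M
gm_total = GM_SUN * (M1_SIN3_I_MSUN + M2_SIN3_I_MSUN)
period_days = 2.0 * math.pi * math.sqrt(a_m ** 3 / gm_total) / 86400.0

print(f"q = {q:.3f}, implied orbital period = {period_days:.3f} d")
```

    The ratio reproduces the quoted $q_{\rm sp}$ to within its stated uncertainty, and the implied period is a quick sanity check that the separation and masses are mutually consistent.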

  1. Avian reovirus L2 genome segment sequences and predicted structure/function of the encoded RNA-dependent RNA polymerase protein

    Directory of Open Access Journals (Sweden)

    Xu Wanhong

    2008-12-01

    Background: The orthoreoviruses are infectious agents that possess a genome comprised of 10 double-stranded RNA segments encased in two concentric protein capsids. As with virtually all RNA viruses, an RNA-dependent RNA polymerase (RdRp) enzyme is required for viral propagation. RdRp sequences have been determined for the prototype mammalian orthoreoviruses and for several other closely-related reoviruses, including aquareoviruses, but have not yet been reported for any avian orthoreoviruses. Results: We determined the L2 genome segment nucleotide sequences, which encode the RdRp proteins, of two different avian reoviruses, strains ARV138 and ARV176, in order to define conserved and variable regions within reovirus RdRp proteins and to better delineate structure/function of this important enzyme. The ARV138 L2 genome segment was 3829 base pairs long, whereas the ARV176 L2 segment was 3830 nucleotides long. Both segments were predicted to encode λB RdRp proteins 1259 amino acids in length. Alignments of these newly-determined ARV genome segments, and their corresponding proteins, were performed with all currently available homologous mammalian reovirus (MRV) and aquareovirus (AqRV) genome segment and protein sequences. There was ~55% amino acid identity between ARV λB and MRV λ3 proteins, making the RdRp protein the most highly conserved of currently known orthoreovirus proteins, and there was ~28% identity between ARV λB and homologous MRV and AqRV RdRp proteins. Predictive structure/function mapping of identical and conserved residues within the known MRV λ3 atomic structure indicated most identical amino acids and conservative substitutions were located near and within predicted catalytic domains and lining RdRp channels, whereas non-identical amino acids were generally located on the molecule's surfaces. Conclusion: The ARV λB and MRV λ3 proteins showed the highest ARV:MRV identity values (~55%) amongst all currently known ARV and MRV

  2. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....

  3. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......

  4. Converging evidence that sequence variations in the novel candidate gene MAP2K7 (MKK7) are functionally associated with schizophrenia.

    Science.gov (United States)

    Winchester, Catherine L; Ohzeki, Hiromitsu; Vouyiouklis, Demetrius A; Thompson, Rhiannon; Penninger, Josef M; Yamagami, Keiji; Norrie, John D; Hunter, Robert; Pratt, Judith A; Morris, Brian J

    2012-11-15

    Schizophrenia is a debilitating psychiatric disease with a strong genetic contribution, potentially linked to altered glutamatergic function in brain regions such as the prefrontal cortex (PFC). Here, we report converging evidence to support a functional candidate gene for schizophrenia. In post-mortem PFC from patients with schizophrenia, we detected decreased expression of MKK7/MAP2K7, a kinase activated by glutamatergic activity. While mice lacking one copy of the Map2k7 gene were overtly normal in a variety of behavioural tests, these mice showed a schizophrenia-like cognitive phenotype of impaired working memory. Additional support for MAP2K7 as a candidate gene came from a genetic association study. A substantial effect size (odds ratios: ~1.9) was observed for a common variant in a cohort of case and control samples collected in the Glasgow area and also in a replication cohort of samples of Northern European descent (most significant P-value: $3 \times 10^{-4}$). While some caution is warranted until these association data are further replicated, these results are the first to implicate the candidate gene MAP2K7 in genetic risk for schizophrenia. Complete sequencing of all MAP2K7 exons did not reveal any non-synonymous mutations. However, the MAP2K7 haplotype appeared to have functional effects, in that it influenced the level of expression of MAP2K7 mRNA in human PFC. Taken together, the results imply that reduced function of the MAP2K7-c-Jun N-terminal kinase (JNK) signalling cascade may underlie some of the neurochemical changes and core symptoms in schizophrenia.

  5. Complete genome sequence provides insights into the biodrying-related microbial function of Bacillus thermoamylovorans isolated from sewage sludge biodrying material.

    Science.gov (United States)

    Cai, Lu; Zheng, Sheng-Wei; Shen, Yu-Jun; Zheng, Guo-Di; Liu, Hong-Tao; Wu, Zhi-Ying

    2018-07-01

    To enable the development of microbial agents and identify suitable candidates for biodrying, the existence and function of Bacillus thermoamylovorans during sewage sludge biodrying merit investigation. This study isolated a strain of B. thermoamylovorans during sludge biodrying, submitted it for complete genome sequencing and analyzed its potential microbial functions. After biodrying, the moisture content of the biodrying material decreased from 66.33% to 50.18%, and B. thermoamylovorans was the ecologically dominant Bacillus, with the primary annotations associated with amino acid transport and metabolism (9.53%) and carbohydrate transport and metabolism (8.14%). Its genome contains 96 carbohydrate-active enzyme-encoding genes, mainly distributed among glycoside hydrolases (33.3%) and glycosyl transferases (27.1%). The virulence factors are mainly associated with the biosynthesis of capsule and polysaccharide capsule. This work indicates that, among the biodrying microorganisms, B. thermoamylovorans has good potential for degrading both recalcitrant and readily degradable components, making it a potential microbial agent for improving biodrying. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. Identification and functional characterization of effectors in expressed sequence tags from various life cycle stages of the potato cyst nematode Globodera pallida.

    Science.gov (United States)

    Jones, John T; Kumar, Amar; Pylypenko, Liliya A; Thirugnanasambandam, Amarnath; Castelli, Lydia; Chapman, Sean; Cock, Peter J A; Grenier, Eric; Lilley, Catherine J; Phillips, Mark S; Blok, Vivian C

    2009-11-01

    In this article, we describe the analysis of over 9000 expressed sequence tags (ESTs) from cDNA libraries obtained from various life cycle stages of Globodera pallida. We have identified over 50 G. pallida effectors from this dataset using bioinformatics analysis, by screening clones in order to identify secreted proteins up-regulated after the onset of parasitism and using in situ hybridization to confirm the expression in pharyngeal gland cells. A substantial gene family encoding G. pallida SPRYSEC proteins has been identified. The expression of these genes is restricted to the dorsal pharyngeal gland cell. Different members of the SPRYSEC family of proteins from G. pallida show different subcellular localization patterns in plants, with some localized to the cytoplasm and others to the nucleus and nucleolus. Differences in subcellular localization may reflect diverse functional roles for each individual protein or, more likely, variety in the compartmentalization of plant proteins targeted by the nematode. Our data are therefore consistent with the suggestion that the SPRYSEC proteins suppress host defences, as suggested previously, and that they achieve this through interaction with a range of host targets.

  7. Open questions in origin of life: experimental studies on the origin of nucleic acids and proteins with specific and functional sequences by a chemical synthetic biology approach

    DEFF Research Database (Denmark)

    Adamala, K.; Anella, F.; Wieczorek, R.

    2014-01-01

    sequences among a vast array of possible ones, the huge "sequence space", leading to the question "why these macromolecules, and not the others?" We have recently addressed these questions by using a chemical synthetic biology approach. In particular, we have tested the catalytic activity of small peptides...

  8. Hydrophobic cluster analysis of G protein-coupled receptors: a powerful tool to derive structural and functional information from 2D-representation of protein sequences

    NARCIS (Netherlands)

    Lentes, K.U.; Mathieu, E.; Bischoff, Rainer; Rasmussen, U.B.; Pavirani, A.

    1993-01-01

    Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This method has a major drawback once the amino acid identity drops below 20-25%,

  9. Mapping sequences by parts

    Directory of Open Access Journals (Sweden)

    Guziolowski Carito

    2007-09-01

    Full Text Available Abstract Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O(|s| × |t| × N) using O(|s| × |t| × N) memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with a reasonable accuracy. We test the functionality of the approach over random sequences on which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.
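
    The N-map definition above (partition s into N parts, place each part on t in either orientation, and maximize the summed gapless alignment scores) can be illustrated with a small brute-force sketch. This is not the authors' O(|s| × |t| × N) algorithm: the +1/-1 scoring, the requirement that each part fits entirely inside t, and all function names are assumptions made for illustration, and the search is much slower than the published bound.

```python
from functools import lru_cache

# Assumed DNA alphabet; used to build the reverse complement of a part.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gapless_score(part, target, offset, match=1, mismatch=-1):
    """Score `part` laid over `target` at `offset` without gaps (assumed +1/-1 scheme)."""
    window = target[offset:offset + len(part)]
    return sum(match if a == b else mismatch for a, b in zip(part, window))


def best_placement(part, target):
    """Best gapless score of `part` over `target`, forward or reverse-complemented."""
    if len(part) > len(target):
        return float("-inf")
    return max(
        gapless_score(candidate, target, offset)
        for candidate in (part, reverse_complement(part))
        for offset in range(len(target) - len(part) + 1)
    )


def optimal_n_map_score(s, t, n_parts):
    """Maximum summed score over all ways of cutting `s` into `n_parts` non-empty parts."""

    @lru_cache(maxsize=None)
    def best(i, n):
        # Best score for covering the prefix s[:i] with exactly n parts.
        if n == 0:
            return 0 if i == 0 else float("-inf")
        if i < n:
            return float("-inf")
        return max(best(j, n - 1) + best_placement(s[j:i], t) for j in range(n - 1, i))

    return best(len(s), n_parts)
```

    With s = "AAAACCCC" and t = "CCCCAAAA", a single part cannot match in either orientation, whereas two parts recover the shuffled blocks, which is the kind of event the N-map is meant to capture.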

  10. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

    Directory of Open Access Journals (Sweden)

    Haznedaroglu Berat Z

    2012-07-01

    Full Text Available Abstract Background The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63). For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual k-mers (k-19 to k-63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. Conclusions This study demonstrated that different k-mer choices result in various quantities
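
    Two ideas in this abstract lend themselves to a short sketch: how the choice of k changes the de Bruijn graph built from the same reads, and how annotations present in a single-k assembly but absent from the clustered assembly can be listed. The code below is a toy illustration only; the function names and the identifiers in the usage note are hypothetical, and it does not reproduce the assemblers or the recovery workflow used in the study.

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    edges = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            edges[kmer[:-1]].add(kmer[1:])
    return dict(edges)


def annotations_missing_from_ca(per_k_annotations, ca_annotations):
    """For each k, the annotation identifiers found in that assembly but not in the CA."""
    ca = set(ca_annotations)
    return {k: set(found) - ca for k, found in per_k_annotations.items()}
```

    For example, calling annotations_missing_from_ca({19: {"K00001", "K00002"}, 31: {"K00002"}}, {"K00002"}) on made-up identifiers reports that only the k = 19 assembly holds an annotation the clustered assembly lost.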

  11. Functional analysis of the interdependence between DNA uptake sequence and its cognate ComP receptor during natural transformation in Neisseria species.

    Directory of Open Access Journals (Sweden)

    Jamie-Lee Berry

    Full Text Available Natural transformation is the widespread biological process by which "competent" bacteria take up free DNA, incorporate it into their genomes, and become genetically altered or "transformed". To curb often deleterious transformation by foreign DNA, several competent species preferentially take up their own DNA that contains specific DUS (DNA uptake sequence) watermarks. Our recent finding that ComP is the long sought DUS receptor in Neisseria species paves the way for the functional analysis of the DUS-ComP interdependence which is reported here. By abolishing/modulating ComP levels in Neisseria meningitidis, we show that the enhancement of transformation seen in the presence of DUS is entirely dependent on ComP, which also controls transformation in the absence of DUS. While peripheral bases in the DUS were found to be less important, inner bases are essential since single base mutations led to dramatically impaired interaction with ComP and transformation. Strikingly, naturally occurring DUS variants in the genomes of human Neisseria commensals differing from DUS by only one or two bases were found to be similarly impaired for transformation of N. meningitidis. By showing that ComPsub from the N. subflava commensal specifically binds its cognate DUS variant and mediates DUS-enhanced transformation when expressed in a comP mutant of N. meningitidis, we confirm that a similar mechanism is used by all Neisseria species to promote transformation by their own, or closely related DNA. Together, these findings shed new light on the molecular events involved in the earliest step in natural transformation, and reveal an elegant mechanism for modulating horizontal gene transfer between competent species sharing the same niche.

  12. The regulatory network of cluster-root function and development in phosphate-deficient white lupin (Lupinus albus) identified by transcriptome sequencing.

    Science.gov (United States)

    Wang, Zhengrui; Straub, Daniel; Yang, Huaiyu; Kania, Angelika; Shen, Jianbo; Ludewig, Uwe; Neumann, Günter

    2014-07-01

    Lupinus albus serves as model plant for root-induced mobilization of sparingly soluble soil phosphates via the formation of cluster-roots (CRs) that mediate secretion of protons, citrate, phenolics and acid phosphatases (APases). This study employed next-generation sequencing to investigate the molecular mechanisms behind these complex adaptive responses at the transcriptome level. We compared different stages of CR development, including pre-emergent (PE), juvenile (JU) and the mature (MA) stages. The results confirmed that the primary metabolism underwent significant modifications during CR maturation, promoting the biosynthesis of organic acids, as had been deduced from physiological studies. Citrate catabolism was downregulated, associated with citrate accumulation in MA clusters. Upregulation of the phenylpropanoid pathway reflected the accumulation of phenolics. Specific transcript expression of ALMT and MATE transporter genes correlated with the exudation of citrate and flavonoids. The expression of transcripts related to nucleotide degradation and APases in MA clusters coincided with the re-mobilization and hydrolysis of organic phosphate resources. Most interestingly, hormone-related gene expression suggested a central role of ethylene during CR maturation. This was associated with the upregulation of the iron (Fe)-deficiency regulated network that mediates ethylene-induced expression of Fe-deficiency responses in other species. Finally, transcripts related to abscisic acid and jasmonic acid were upregulated in MA clusters, while auxin- and brassinosteroid-related genes and cytokinin receptors were most strongly expressed during CR initiation. Key regulations proposed by the RNA-seq data were confirmed by quantitative real-time polymerase chain reaction (RT-qPCR) and some physiological analyses. A model for the gene network regulating CR development and function is presented. © 2014 Scandinavian Plant Physiology Society.

  13. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  14. Intergenic sequence between Arabidopsis caseinolytic protease B-cytoplasmic/heat shock protein100 and choline kinase genes functions as a heat-inducible bidirectional promoter.

    Science.gov (United States)

    Mishra, Ratnesh Chandra; Grover, Anil

    2014-11-01

    In Arabidopsis (Arabidopsis thaliana), the At1g74310 locus encodes for caseinolytic protease B-cytoplasmic (ClpB-C)/heat shock protein100 protein (AtClpB-C), which is critical for the acquisition of thermotolerance, and At1g74320 encodes for choline kinase (AtCK2) that catalyzes the first reaction in the Kennedy pathway for phosphatidylcholine biosynthesis. Previous work has established that the knockout mutants of these genes display heat-sensitive phenotypes. While analyzing the AtClpB-C promoter and upstream genomic regions in this study, we noted that AtClpB-C and AtCK2 genes are head-to-head oriented on chromosome 1 of the Arabidopsis genome. Expression analysis showed that transcripts of these genes are rapidly induced in response to heat stress treatment. In stably transformed Arabidopsis plants harboring this intergenic sequence between head-to-head oriented green fluorescent protein and β-glucuronidase reporter genes, both transcripts and proteins of the two reporters were up-regulated upon heat stress. Four heat shock elements were noted in the intergenic region by in silico analysis. In the homozygous transfer DNA insertion mutant Salk_014505, 4,393-bp transfer DNA is inserted at position -517 upstream of ATG of the AtClpB-C gene. As a result, AtCK2 loses proximity to three of the four heat shock elements in the mutant line. Heat-inducible expression of the AtCK2 transcript was completely lost, whereas the expression of AtClpB-C was not affected in the mutant plants. Our results suggest that the 1,329-bp intergenic fragment functions as a heat-inducible bidirectional promoter and the region governing the heat inducibility is possibly shared between the two genes. We propose a model in which AtClpB-C shares its regulatory region with heat-induced choline kinase, which has a possible role in heat signaling. © 2014 American Society of Plant Biologists. All Rights Reserved.

  15. Simulation of the Basin Effects in the Po Plain During the Emilia-Romagna Seismic Sequence (2012) Using Empirical Green's Functions

    Science.gov (United States)

    Dujardin, Alain; Causse, Mathieu; Courboulex, Françoise; Traversa, Paola

    2016-06-01

    The two main earthquakes that occurred in 2012 (May 20 and 29) in the Reggio-Emiliano region (Northern Italy) were relatively small (Mw 6.1 and Mw 5.9) but they generated unexpected damage in a large area around the epicenter. On some stations, the observed seismic levels exceeded design levels recommended by the EC8 seismic code for buildings and civil engineering works. The ground motions generated by the two mainshocks have specific characteristics: the waveforms are mainly controlled by surface waves generated by the deep sedimentary Po plain, by local site effects and also, on some stations, by non-linear behaviors. In this particular context, we test the ability of an empirical Green's function (EGF) simulation approach to reproduce the recorded seismograms in a large frequency band without any knowledge of the underground medium. We focus on the possibility to reproduce the strong surface waves generated by the basin at distances between 25 and 90 km. We choose to work on the second mainshock of the sequence (Mw 5.9), which occurred on May 29, 2012, because it is better recorded by the seismological networks than the May 20th first mainshock. We use a k^-2 kinematic source model to generate a set of 100 slip distributions on the fault plane and choose the recordings of a close-by Mw 3.9 event as EGF. We then generate a set of broad-band seismograms (from 0.2 to 35 Hz) and compare them to the mainshock signals at 15 stations (seismograms, Fourier spectra, PGA, PGV, duration, Stockwell transforms) at epicentral distances from 5 to 160 km. We find that the main specific features of the signals are very well reproduced for all the stations within and beyond the basin. Nevertheless, at nearby stations, the PGA values are over-evaluated, which could be explained by the fact that non-linear effects are not taken into account in the simulation process. A better fit was found for a position of the nucleation point to the bottom west of the fault, which suggests a

  16. Farey sequences and resistor networks

    Indian Academy of Sciences (India)

    Green's function, while the perturbation of a network is investigated in [3]. ... In Theorem 1 below, we employ the Farey sequence to establish a strict .... We next show that the Farey sequence method is applicable for circuits with n or fewer.
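
    The excerpt above relies on the Farey sequence without defining it. As a reminder, the Farey sequence of order n lists, in increasing order, all reduced fractions between 0 and 1 whose denominators do not exceed n. The sketch below uses the standard next-term recurrence; the function name is an assumption, and the code does not attempt to reproduce the paper's theorem about resistor networks.

```python
from fractions import Fraction


def farey_sequence(n):
    """Farey sequence of order n as a list of Fractions from 0/1 to 1/1."""
    a, b, c, d = 0, 1, 1, n  # a/b is the current term, c/d the next one
    terms = [Fraction(a, b)]
    while c <= n:
        k = (n + b) // d
        # Standard recurrence: the term after c/d is (k*c - a)/(k*d - b).
        a, b, c, d = c, d, k * c - a, k * d - b
        terms.append(Fraction(a, b))
    return terms
```

    Consecutive terms a/b and c/d of the result satisfy bc - ad = 1, the neighbour property that makes these sequences convenient for bounding rational quantities.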

  17. Identification of Functional Variants for Cleft Lip with or without Cleft Palate in or near PAX7, FGFR2, and NOG by Targeted Sequencing of GWAS Loci

    DEFF Research Database (Denmark)

    Leslie, Elizabeth J; Taub, Margaret A; Liu, Huan

    2015-01-01

    Although genome-wide association studies (GWASs) for nonsyndromic orofacial clefts have identified multiple strongly associated regions, the causal variants are unknown. To address this, we selected 13 regions from GWASs and other studies, performed targeted sequencing in 1,409 Asian and European...

  18. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  19. Identification of DNA-binding protein target sequences by physical effective energy functions: free energy analysis of lambda repressor-DNA complexes.

    Directory of Open Access Journals (Sweden)

    Caselle Michele

    2007-09-01

    Full Text Available Abstract Background Specific binding of proteins to DNA is one of the most common ways gene expression is controlled. Although general rules for the DNA-protein recognition can be derived, the ambiguous and complex nature of this mechanism precludes a simple recognition code, therefore the prediction of DNA target sequences is not straightforward. DNA-protein interactions can be studied using computational methods which can complement the current experimental methods and offer some advantages. In the present work we use physical effective potentials to evaluate the DNA-protein binding affinities for the λ repressor-DNA complex for which structural and thermodynamic experimental data are available. Results The binding free energy of two molecules can be expressed as the sum of an intermolecular energy (evaluated using a molecular mechanics forcefield), a solvation free energy term and an entropic term. Different solvation models are used including distance dependent dielectric constants, solvent accessible surface tension models and the Generalized Born model. The effect of conformational sampling by Molecular Dynamics simulations on the computed binding energy is assessed; results show that this effect is in general negative and the reproducibility of the experimental values decreases with the increase of simulation time considered. The free energy of binding for non-specific complexes, estimated using the best energetic model, agrees with earlier theoretical suggestions. As a result of these analyses, we propose a protocol for the prediction of DNA-binding target sequences. The possibility of searching regulatory elements within the bacteriophage λ genome using this protocol is explored. Our analysis shows good prediction capabilities, even in absence of any thermodynamic data and information on the naturally recognized sequence. Conclusion This study supports the conclusion that physics-based methods can offer a completely complementary
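
    The decomposition of the binding free energy described in the Results can be written compactly as follows. The symbol names and the sign convention for the entropic term are assumptions chosen for illustration; the abstract itself only names the three contributions.

```latex
% Assumed notation: E_MM is the intermolecular molecular-mechanics energy,
% G_solv the solvation free energy, and -T \Delta S the entropic term.
\Delta G_{\mathrm{bind}}
  = \Delta E_{\mathrm{MM}}
  + \Delta G_{\mathrm{solv}}
  - T\,\Delta S
```

    Each difference is taken between the complex and the separated protein and DNA, so a more negative total corresponds to tighter predicted binding.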

  20. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  1. Development of a functional cell-based assay that probes the specific interaction between influenza A virus NP and its packaging signal sequence RNA.

    Science.gov (United States)

    Woo, Jiwon; Yu, Kyung Lee; Lee, Sun Hee; You, Ji Chang

    2015-02-06

    Although cis-acting packaging signal RNA sequences for the influenza virus NP encoding vRNA have been identified recently through genetic studies, little is known about the interaction between NP and the vRNA packaging signals either in vivo or in vitro. Here, we provide evidence that NP is able to interact specifically with the vRNA packaging sequence RNA within living cells and that the specific RNA binding activity of NP in vivo requires both the N-terminal and central region of the protein. The assay established here would be a valuable tool for further detailed studies of the NP-packaging signal RNA interaction in living cells. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. A Role for the Fifth G-Track in G-Quadruplex Forming Oncogene Promoter Sequences during Oxidative Stress: Do These "Spare Tires" Have an Evolved Function?

    Science.gov (United States)

    Fleming, Aaron M; Zhou, Jia; Wallace, Susan S; Burrows, Cynthia J

    2015-08-26

    Uncontrolled inflammation or oxidative stress generates electron-deficient species that oxidize the genome increasing its instability in cancer. The G-quadruplex (G4) sequences regulating the c-MYC, KRAS, VEGF, BCL-2, HIF-1α, and RET oncogenes, as examples, are targets for oxidation at loop and 5'-core guanines (G) as showcased in this study by CO3•- oxidation of the VEGF G4. Products observed include 8-oxo-7,8-dihydroguanine (OG), spiroiminodihydantoin (Sp), and 5-guanidinohydantoin (Gh). Our previous studies found that OG and Gh, when present in the four G-tracks of the solved structure for VEGF and c-MYC, were not substrates for the base excision repair (BER) DNA glycosylases in biologically relevant KCl solutions. We now hypothesize that a fifth G-track found a few nucleotides distant from the G4 tracks involved in folding can act as a "spare tire," facilitating extrusion of a damaged G-run into a large loop that then becomes a substrate for BER. Thermodynamic, spectroscopic, and DMS footprinting studies verified the fifth domain replacing a damaged G-track with OG or Gh at a loop or core position in the VEGF G4. These new "spare tire"-containing strands with Gh in loops are now found to be substrates for initiation of BER with the NEIL1, NEIL2, and NEIL3 DNA glycosylases. The results support a hypothesis in which regulatory G4s carry a "spare-tire" fifth G-track for aiding in the repair process when these sequences are damaged by radical oxygen species, a feature observed in a large number of these sequences. Furthermore, formation and repair of oxidized bases in promoter regions may constitute an additional example of epigenetic modification, in this case of guanine bases, to regulate gene expression in which the G4 sequences act as sensors of oxidative stress.

  3. Analysis and functional annotation of expressed sequence tags from in vitro cell lines of elasmobranchs: Spiny dogfish shark (Squalus acanthias) and little skate (Leucoraja erinacea).

    Science.gov (United States)

    Parton, Angela; Bayne, Christopher J; Barnes, David W

    2010-09-01

    Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories "envelope" and "oxidoreductase activity" but the SAE transcripts did not. GO analysis of SAE transcripts identified the category "anatomical structure formation" that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes. Copyright 2010 Elsevier Inc. All rights reserved.

  4. Some Generalized Lacunary statistically difference double semi-normed sequence spaces defined by Orlicz function - doi: 10.4025/actascitechnol.v35i1.15523

    Directory of Open Access Journals (Sweden)

    Ayhan Esi

    2013-01-01

    Full Text Available In this article, we introduce the idea of statistically convergent generalized difference lacunary double sequence spaces [¯w2 (M, Δn, p, q)]θ, [¯w2 (M, Δn, p, q)]θ and defined over a semi-normed space (X, q). We also study some basic properties and obtain some inclusion relations between them.

  5. Universal sequence map (USM) of arbitrary discrete sequences

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2002-02-01

    Full Text Available Abstract Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order-free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
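
    The Chaos Game Representation on which USM builds can be sketched in a few lines: each symbol moves the current point halfway towards the corner assigned to that symbol, so recent history is encoded in the leading binary digits of the coordinates. The corner assignment, the starting point and the function names below are assumptions made for illustration, and the sketch covers only the forward 2D DNA case rather than the full USM procedure.

```python
import math

# Assumed corner assignment on the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(sequence, start=(0.5, 0.5)):
    """Iterated map: move halfway from the current point to the symbol's corner."""
    x, y = start
    points = []
    for symbol in sequence.upper():
        cx, cy = CORNERS[symbol]
        x, y = 0.5 * (x + cx), 0.5 * (y + cy)
        points.append((x, y))
    return points


def map_similarity(p, q):
    """Rough count of shared preceding symbols, from the largest coordinate gap."""
    d = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    return math.inf if d == 0 else -math.log2(d)
```

    Comparing the last points of "ACGT" and "TCGT", which share the three trailing symbols "CGT", gives a small map distance and hence a high similarity value; the estimate is approximate and tends to run somewhat above the true shared length.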

  6. Final Report for Grant No. DE-FG02-98ER62583 ''Functional Analysis of the Genome Sequence of Deinococcus radiodurans''

    International Nuclear Information System (INIS)

    Daly, Michael J.

    2003-01-01

    Extremophiles are nearly always defined with singular characteristics that allow existence within a singular extreme environment. The bacterium Deinococcus radiodurans qualifies as a polyextremophile, showing remarkable resistance to a range of damage caused by ionizing radiation, desiccation, ultraviolet radiation, oxidizing agents, and electrophilic mutagens. D. radiodurans is most famous for its extreme resistance to ionizing radiation; it not only can grow continuously in the presence of chronic radiation (6,000 rad per hour), but it can survive acute exposures to gamma radiation that exceed 1,500,000 rads without lethality or induced mutation. These characteristics were the impetus for sequencing its genome. We completed an extensive comparative sequence analysis of the Deinococcus radiodurans (strain R1) genome. Deinococcus is the first representative with a completely sequenced genome from a bacterial branch of extremophiles - the Thermus/Deinococcus group. Phylogenetic tree analysis, combined with the identification of several synapomorphies between Thermus and Deinococcus, support that it is a very ancient branch localized in the vicinity of the bacterial tree root. Distinctive features of the Deinococcus genome as well as features shared with other free-living bacteria were revealed by comparison of its proteome to a collection of Clusters of Orthologous Groups of proteins (COGs). Analysis of paralogs in Deinococcus has revealed some unique protein families. In addition, specific expansions of several protein families including phosphatases, proteases, acyl transferases and MutT pyrophosphohydrolases, were detected. Genes that potentially affect DNA repair and recombination were investigated in detail. Some proteins appear to have been horizontally transferred from eukaryotes, and are not present in other bacteria. For example, three proteins homologous to plant desiccation-resistance proteins were identified and these are particularly interesting

  7. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias.

    Science.gov (United States)

    Kjær, Jonas; Belsham, Graham J

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long), which induces a nonproteolytic, cotranslational "cleavage" at its own C terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19, where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15, and N16 within the 2A sequence of infectious FMDVs, but no variants at residues P17, G18, or P19 have been identified. In this study, using highly degenerate primers, we analyzed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after two, three, or four passages. However, surprisingly, a clear codon preference for the wt nucleotide sequence encoding the NPGP motif within these viruses was observed. Indeed, the codons selected to code for P17 and P19 within this motif were distinct; thus the synonymous codons are not equivalent. © 2018 Kjær and Belsham; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  8. DNA sequence modeling based on context trees

    NARCIS (Netherlands)

    Kusters, C.J.; Ignatenko, T.; Roland, J.; Horlin, F.

    2015-01-01

    Genomic sequences contain instructions for protein and cell production. Therefore understanding and identification of biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling of DNA sequences in its turn can help to better understand and identify such

  9. Multineuronal Spike Sequences Repeat with Millisecond Precision

    Directory of Open Access Journals (Sweden)

    Koki Matsumoto

    2013-06-01

    Full Text Available Cortical microcircuits are nonrandomly wired by neurons. As a natural consequence, spikes emitted by microcircuits are also nonrandomly patterned in time and space. One of the prominent spike organizations is a repetition of fixed patterns of spike series across multiple neurons. However, several questions remain unsolved, including how precisely spike sequences repeat, how the sequences are spatially organized, how many neurons participate in sequences, and how different sequences are functionally linked. To address these questions, we monitored spontaneous spikes of hippocampal CA3 neurons ex vivo using a high-speed functional multineuron calcium imaging technique that allowed us to monitor spikes with millisecond resolution and to record the location of spiking and nonspiking neurons. Multineuronal spike sequences were overrepresented in spontaneous activity compared to the statistical chance level. Approximately 75% of neurons participated in at least one sequence during our observation period. The participants were sparsely dispersed and did not show specific spatial organization. The number of sequences relative to the chance level decreased when larger time frames were used to detect sequences. Thus, sequences were precise at the millisecond level. Sequences often shared common spikes with other sequences; parts of sequences were subsequently relayed by following sequences, generating complex chains of multiple sequences.

  10. MR-sialography: optimisation and evaluation of an ultra-fast sequence in parallel acquisition technique and different functional conditions of salivary glands

    International Nuclear Information System (INIS)

    Habermann, C.R.; Cramer, M.C.; Aldefeld, D.; Weiss, F.; Kaul, M.G.; Adam, G.; Graessner, J.; Reitmeier, F.; Jaehne, M.; Petersen, K.U.

    2005-01-01

    Purpose: To optimise a fast sequence for MR-sialography and to compare a parallel and non-parallel acquisition technique. Additionally, the effect of oral stimulation regarding the image quality was evaluated. Material and Methods: All examinations were performed by using a 1.5-T superconducting system. After developing a sufficient sequence for MR-sialography, a single-shot turbo-spin-echo sequence (ss-TSE) with an acquisition time of 2.8 sec was used in transverse and oblique sagittal orientation in 27 healthy volunteers. All images were acquired with and without parallel imaging technique. The assessment of the ductal system of the submandibular and parotid gland was performed using a 1 to 5 visual scale for each side separately. Images were evaluated by four independent experienced radiologists. For statistical evaluation, an ANOVA with post-hoc comparisons was used with an overall two-tailed significance level of P=0.05. For evaluation of interobserver variability, an intraclass correlation was computed and correlation >0.8 was determined to indicate a high correlation. Results: All parts of salivary excretory ducts could be visualised in all volunteers, with an overall rating for all ducts of 2.26 (SD±1.09). Between the four observers a high correlation could be obtained with an intraclass correlation of 0.9475. A significant influence regarding the slice angulations could not be obtained (p=0.74). In all healthy volunteers the visibility of excretory ducts improved significantly after oral application of a Sialogogum (p 2 =0.049). The use of a parallel imaging technique did not lead to an improvement of visualisation, showing a significant loss of image quality compared to an acquisition technique without parallel imaging (p 2 =0.013). Conclusion: The optimised ss-TSE MR-sialography seems to be a fast and sufficient technique for visualisation of excretory ducts of the main salivary glands, with no elaborate post-processing needed. To improve results of MR

  11. MR sialography: evaluation of an ultra-fast sequence in consideration of a parallel acquisition technique and different functional conditions in patients with salivary gland diseases

    International Nuclear Information System (INIS)

    Petridis, C.; Ries, T.; Cramer, M.C.; Graessner, J.; Petersen, K.U.; Reitmeier, F.; Jaehne, M.; Weiss, F.; Adam, G.; Habermann, C.R.

    2007-01-01

    Purpose: To evaluate an ultra-fast sequence for MR sialography requiring no post-processing and to compare the acquisition technique regarding the effect of oral stimulation with a parallel acquisition technique in patients with salivary gland diseases. Materials and Methods: 128 patients with salivary gland disease were prospectively examined using a 1.5-T superconducting system with a 30 mT/m maximum gradient capability and a maximum slew rate of 125 mT/m/sec. A single-shot turbo-spin-echo sequence (ss-TSE) with an acquisition time of 2.8 sec was used in transverse and oblique sagittal orientation. All images were obtained with and without a parallel imaging technique. The evaluation of the ductal system of the parotid and submandibular gland was performed using a visual scale of 1-5 for each side. The images were assessed by two independent experienced radiologists. An ANOVA with post hoc comparisons and an overall two-tailed significance level of p=0.05 was used for the statistical evaluation. An intraclass correlation was computed to evaluate interobserver variability and a correlation of >0.8 was determined, thereby indicating a high correlation. Results: Depending on the diagnosed diseases and the absence of abruption of the ducts, all parts of excretory ducts could be visualized in all patients using the developed technique with an overall rating for all ducts of 2.70 (SD±0.89). A high correlation was achieved between the two observers with an intraclass correlation of 0.73. Oral application of a sialogogum improved the visibility of excretory ducts significantly (p<0.001). In contrast, the use of a parallel imaging technique led to a significant decrease in image quality (p=0.011). (orig.)

  12. Case Report: Exome sequencing reveals recurrent RETSAT mutations and a loss-of-function POLDIP2 mutation in a rare undifferentiated tongue sarcoma [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Jason Y. K. Chan

    2018-04-01

    Full Text Available Soft tissue sarcoma of the tongue represents a very rare head and neck cancer with connective tissue features, and the genetics underlying this rare cancer are largely unknown. There are fewer than 20 cases reported in the literature thus far. Here, we reported the first whole-exome characterization (>×200 depth) of an undifferentiated sarcoma of the tongue in a 31-year-old male. Even with a very good sequencing depth, only 19 nonsynonymous mutations were found, indicating a relatively low mutation rate of this rare cancer (lower than that of human papillomavirus (HPV)-positive head and neck cancer). Yet, among the few genes that are somatically mutated in this HPV-negative undifferentiated tongue sarcoma, a noticeable deleterious frameshift mutation (with a very high allele frequency of >93%) of a gene for DNA replication and repair, namely POLDIP2 (DNA polymerase delta interacting protein 2), and two recurrent mutations of the adipogenesis and adipocyte differentiation gene RETSAT (retinol saturase), were identified. Thus, somatic events likely affecting adipogenesis and differentiation, as well as potential stem mutations to POLDIP2, may be implicated in the formation of this rare cancer. This identified somatic whole-exome sequencing profile appears to be distinct from that of other reported adult sarcomas from The Cancer Genome Atlas, suggesting a potential unique genetic profile for this rare sarcoma of the tongue. Interestingly, this low somatic mutation rate is unexpectedly found to be accompanied by multiple tumor protein p53 and NOTCH1 germline mutations of the patient’s blood DNA. This may explain the very early age of onset of head and neck cancer, with likely hereditary predisposition. Our findings are, to our knowledge, the first to reveal a unique genetic profile of this very rare undifferentiated sarcoma of the tongue.

  13. LPTAU, Quasi Random Sequence Generator

    International Nuclear Information System (INIS)

    Sobol, Ilya M.

    1993-01-01

    1 - Description of program or function: LPTAU generates quasi random sequences. These are uniformly distributed sets of L=M N points in the N-dimensional unit cube: I^N = [0,1]x...x[0,1]. These sequences are used as nodes for multidimensional integration; as searching points in global optimization; as trial points in multi-criteria decision making; as quasi-random points for quasi Monte Carlo algorithms. 2 - Method of solution: Uses LP-TAU sequence generation (see references). 3 - Restrictions on the complexity of the problem: The number of points that can be generated is L 30 . The dimension of the space cannot exceed 51
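
    The idea of using quasi-random points as integration nodes can be sketched in a few lines. The example below is not LPTAU and does not implement the LP-TAU (Sobol) construction; it swaps in the simpler Halton sequence, another low-discrepancy sequence, purely for illustration. The function names and the prime table are assumptions made for this sketch.

```python
# Illustrative sketch only: a Halton sequence, NOT the LP-tau (Sobol)
# construction used by LPTAU. All names here are hypothetical.

PRIMES = (2, 3, 5, 7, 11, 13)  # one base per dimension (assumed limit: 6)


def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(count, dim):
    """Return `count` quasi-random points in the `dim`-dimensional unit cube."""
    if not 1 <= dim <= len(PRIMES):
        raise ValueError("dim must be between 1 and %d" % len(PRIMES))
    return [
        tuple(radical_inverse(i, PRIMES[d]) for d in range(dim))
        for i in range(1, count + 1)
    ]


def quasi_monte_carlo(func, count, dim):
    """Estimate the integral of `func` over the unit cube as a plain average."""
    points = halton_points(count, dim)
    return sum(func(p) for p in points) / count


if __name__ == "__main__":
    # The integral of x*y over the unit square is exactly 1/4.
    print(quasi_monte_carlo(lambda p: p[0] * p[1], 4096, 2))
```

    For production work, an established Sobol generator is usually a better choice than a hand-written sequence.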

  14. Perfect sequences over the real quaternions

    OpenAIRE

    Kuznetsov, Oleg

    2017-01-01

    In this Thesis, perfect sequences over the real quaternions are first considered. Definitions for the right and left periodic autocorrelation functions are given, and right and left perfect sequences introduced. It is shown that the right (left) perfection of any sequence implies the left (right) perfection, so concepts of right and left perfect sequences over the real quaternions are equivalent. Unitary transformations of the quaternion space ℍ are then considered. Using the equivalence of t...
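
    A small sketch may help make the right and left autocorrelation concrete. The abstract does not give the formulas, so the definitions below are an assumption: the right periodic autocorrelation at shift t is taken as the sum of x[l] * conj(x[l+t]) and the left one as the sum of conj(x[l]) * x[l+t], with indices modulo the sequence length. A sequence is treated as perfect when every off-peak value is zero. Quaternions are stored as integer 4-tuples (a, b, c, d) for a + bi + cj + dk; all names are hypothetical.

```python
# Illustrative sketch; the autocorrelation conventions are assumptions,
# not taken from the thesis. Quaternions are tuples (a, b, c, d).

ZERO = (0, 0, 0, 0)
ONE = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)


def qmul(p, q):
    """Hamilton product of two quaternions (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qconj(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    return (q[0], -q[1], -q[2], -q[3])


def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))


def right_autocorrelation(seq, shift):
    """Sum over l of seq[l] * conj(seq[l + shift]), indices taken modulo n."""
    n = len(seq)
    total = ZERO
    for l in range(n):
        total = qadd(total, qmul(seq[l], qconj(seq[(l + shift) % n])))
    return total


def left_autocorrelation(seq, shift):
    """Sum over l of conj(seq[l]) * seq[l + shift], indices taken modulo n."""
    n = len(seq)
    total = ZERO
    for l in range(n):
        total = qadd(total, qmul(qconj(seq[l]), seq[(l + shift) % n]))
    return total


def is_perfect(seq, autocorrelation):
    """True when every off-peak autocorrelation value is zero."""
    return all(autocorrelation(seq, t) == ZERO for t in range(1, len(seq)))


if __name__ == "__main__":
    example = [ONE, J, ONE, qconj(J)]  # the sequence 1, j, 1, -j
    print(is_perfect(example, right_autocorrelation))
    print(is_perfect(example, left_autocorrelation))
```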

  15. deepBase v2.0: identification, expression, evolution and function of small RNAs, LncRNAs and circular RNAs from deep-sequencing data.

    Science.gov (United States)

    Zheng, Ling-Ling; Li, Jun-Hao; Wu, Jie; Sun, Wen-Ju; Liu, Shun; Wang, Ze-Lin; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2016-01-04

    Small non-coding RNAs (e.g. miRNAs) and long non-coding RNAs (e.g. lincRNAs and circRNAs) are emerging as key regulators of various cellular processes. However, only a very small fraction of these enigmatic RNAs have been well functionally characterized. In this study, we describe deepBase v2.0 (http://biocenter.sysu.edu.cn/deepBase/), an updated platform, to decode evolution, expression patterns and functions of diverse ncRNAs across 19 species. deepBase v2.0 has been updated to provide the most comprehensive collection of ncRNA-derived small RNAs generated from 588 sRNA-Seq datasets. Moreover, we developed a pipeline named lncSeeker to identify 176 680 high-confidence lncRNAs from 14 species. Temporal and spatial expression patterns of various ncRNAs were profiled. We identified approximately 24 280 primate-specific, 5193 rodent-specific lncRNAs, and 55 highly conserved lncRNA orthologs between human and zebrafish. We annotated 14 867 human circRNAs, 1260 of which are orthologous to mouse circRNAs. By combining expression profiles and functional genomic annotations, we developed lncFunction web-server to predict the function of lncRNAs based on protein-lncRNA co-expression networks. This study is expected to provide considerable resources to facilitate future experimental studies and to uncover ncRNA functions. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Functional analysis of sequences adjacent to dapE of Corynebacterium glutamicum reveals the presence of aroP, which encodes the aromatic amino acid transporter.

    Science.gov (United States)

    Wehrmann, A; Morakkabati, S; Krämer, R; Sahm, H; Eggeling, L

    1995-10-01

    An initially nonclonable DNA locus close to a gene of L-lysine biosynthesis in Corynebacterium glutamicum was analyzed in detail. Its stepwise cloning and its functional identification by monitoring the amino acid uptakes of defined mutants, together with mechanistic studies, identified the corresponding structure as aroP, the general aromatic amino acid uptake system.

  17. Deep sequencing of RNA from immune cell-derived vesicles uncovers the selective incorporation of small non-coding RNA biotypes with potential regulatory functions.

    NARCIS (Netherlands)

    Nolte-'t Hoen, E.N.M.; Buermans, H.P.; Waasdorp, M.; Stoorvogel, W.; Wauben, M.H.M.; 't Hoen, P.A.C.

    2012-01-01

    Cells release RNA-carrying vesicles and membrane-free RNA/protein complexes into the extracellular milieu. Horizontal vesicle-mediated transfer of such shuttle RNA between cells allows dissemination of genetically encoded messages, which may modify the function of target cells. Other studies used

  18. Microbial Population Dynamics and Ecosystem Functions of Anoxic/Aerobic Granular Sludge in Sequencing Batch Reactors Operated at Different Organic Loading Rates

    Directory of Open Access Journals (Sweden)

    Enikö Szabó

    2017-05-01

    Full Text Available The granular sludge process is an effective, low-footprint alternative to conventional activated sludge wastewater treatment. The architecture of the microbial granules allows the co-existence of different functional groups, e.g., nitrifying and denitrifying communities, which permits compact reactor design. However, little is known about the factors influencing community assembly in granular sludge, such as the effects of reactor operation strategies and influent wastewater composition. Here, we analyze the development of the microbiomes in parallel laboratory-scale anoxic/aerobic granular sludge reactors operated at low (0.9 kg m-3 d-1), moderate (1.9 kg m-3 d-1) and high (3.7 kg m-3 d-1) organic loading rates (OLRs) and the same ammonium loading rate (0.2 kg NH4-N m-3 d-1) for 84 days. Complete removal of organic carbon and ammonium was achieved in all three reactors after start-up, while the nitrogen removal (denitrification) efficiency increased with the OLR: 0% at low, 38% at moderate, and 66% at high loading rate. The bacterial communities at different loading rates diverged rapidly after start-up and showed less than 50% similarity after 6 days, and below 40% similarity after 84 days. The three reactor microbiomes were dominated by different genera (mainly Meganema, Thauera, Paracoccus, and Zoogloea), but these genera have similar ecosystem functions of EPS production, denitrification and polyhydroxyalkanoate (PHA) storage. Many less abundant but persistent taxa were also detected within these functional groups. The bacterial communities were functionally redundant irrespective of the loading rate applied. At steady-state reactor operation, the identity of the core community members was rather stable, but their relative abundances changed considerably over time. Furthermore, nitrifying bacteria were low in relative abundance and diversity in all reactors, despite their large contribution to nitrogen turnover. The results suggest that the OLR has

  19. Structural and functional characterization of the exonuclease I (sbcB) gene and gene product from Escherichia coli and a Markov chain analysis of DNA sequences

    International Nuclear Information System (INIS)

    Phillips, G.J.

    1987-01-01

    The nucleotide sequence for the structural gene for exonuclease I (sbcB) from Escherichia coli was determined. Two putative promoters for this gene were identified and were predicted to have weak transcription initiation activity. In addition, the sbcB coding region contains many non-optimal codons. These observations are consistent with the suggestion that sbcB is a poorly expressed gene. Several mutant exonuclease I genes were cloned onto pBR322 plasmids. These genes represented both sbcB and xonA mutations. One of the xonA mutations (xonA6) was associated with a 1.2-kb insertion of an IS-30 related mobile genetic element in the 3'-region of the gene. Two of the mutations (xonA2 and xonA6) encode unstable polypeptides. Determination of exonucleolytic activity on single-stranded DNA from cell extracts containing each of the cloned mutant genes revealed no correlation between residual exonucleolytic activity and the phenotypes of sbcB and xonA mutants. A proposal that the exonuclease I protein contains an additional activity besides its ability to degrade single-stranded DNA is presented. Characterization of E. coli strains which overproduce exonuclease I showed increased sensitivity to UV irradiation

  20. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias

    DEFF Research Database (Denmark)

    Kjær, Jonas; Belsham, Graham J.

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long) which induces a non-proteolytic, co-translational, "cleavage" at its own C......-terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19 where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15 and N16 within the 2A sequence of infectious FMDVs but no variants at residues P17, G18...... or P19 have been identified. In this study, using highly degenerate primers, we analysed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after 2, 3 or 4 passages. However...

  1. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
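
    The reconstruction step described above, rebuilding a sequence from overlapping peptides, can be illustrated with a toy greedy overlap merge. This is a minimal sketch under stated assumptions, not the method of the report: peptides are error-free strings, overlaps of at least `min_overlap` residues are trusted, and the peptide strings and function names are invented for the example.

```python
# Toy greedy overlap assembly; hypothetical names and data, not the
# algorithm from the SAND report.


def overlap(left, right, min_len=1):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(peptides, min_overlap=3):
    """Greedily merge the pair with the largest overlap until none remains."""
    unique = set(peptides)
    # Drop peptides that are fully contained in another peptide.
    contigs = sorted(
        p for p in unique if not any(p != q and p in q for q in unique)
    )
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ]
        contigs.append(merged)


if __name__ == "__main__":
    # Two hypothetical digests of the same invented protein.
    digest_a = ["MKTAYIAK", "QRQISFVK", "SHFSRQ"]
    digest_b = ["MKTAY", "IAKQRQIS", "FVKSHFSRQ"]
    print(assemble(digest_a + digest_b))
```

    Greedy merging can misassemble repeated regions; it is shown here only because it is the shortest way to make the overlap idea concrete.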

  2. Functional Analyses of a Novel Splice Variant in the CHD7 Gene, Found by Next Generation Sequencing, Confirm Its Pathogenicity in a Spanish Patient and Diagnose Him with CHARGE Syndrome

    Directory of Open Access Journals (Sweden)

    Olatz Villate

    2018-01-01

    Full Text Available Mutations in CHD7 have been shown to be a major cause of CHARGE syndrome, which presents many symptoms and features common to other syndromes, making its diagnosis difficult. Next generation sequencing (NGS) of a panel of intellectual disability related genes was performed in an adult patient without molecular diagnosis. A splice donor variant in CHD7 (c.5665 + 1G > T) was identified. To study its potential pathogenicity, exons and flanking intronic sequences were amplified from patient DNA and cloned into the pSAD® splicing vector. HeLa cells were transfected with this construct and a wild-type minigene, and functional analyses were performed. The construct with the c.5665 + 1G > T variant produced an aberrant transcript with an insert of 63 nucleotides of intron 28 creating a premature termination codon (TAG) 25 nucleotides downstream. This would lead to the insertion of 8 new amino acids and therefore a truncated 1896 amino acid protein. As a result of this, the patient was diagnosed with CHARGE syndrome. Functional analyses underline their usefulness for studying the pathogenicity of variants found by NGS and therefore its application to accurately diagnose patients.

  3. Functional Analyses of a Novel Splice Variant in the CHD7 Gene, Found by Next Generation Sequencing, Confirm Its Pathogenicity in a Spanish Patient and Diagnose Him with CHARGE Syndrome.

    Science.gov (United States)

    Villate, Olatz; Ibarluzea, Nekane; Fraile-Bethencourt, Eugenia; Valenzuela, Alberto; Velasco, Eladio A; Grozeva, Detelina; Raymond, F L; Botella, María P; Tejada, María-Isabel

    2018-01-01

    Mutations in CHD7 have been shown to be a major cause of CHARGE syndrome, which presents many symptoms and features common to other syndromes, making its diagnosis difficult. Next generation sequencing (NGS) of a panel of intellectual disability related genes was performed in an adult patient without molecular diagnosis. A splice donor variant in CHD7 (c.5665 + 1G > T) was identified. To study its potential pathogenicity, exons and flanking intronic sequences were amplified from patient DNA and cloned into the pSAD® splicing vector. HeLa cells were transfected with this construct and a wild-type minigene, and functional analyses were performed. The construct with the c.5665 + 1G > T variant produced an aberrant transcript with an insert of 63 nucleotides of intron 28 creating a premature termination codon (TAG) 25 nucleotides downstream. This would lead to the insertion of 8 new amino acids and therefore a truncated 1896 amino acid protein. As a result of this, the patient was diagnosed with CHARGE syndrome. Functional analyses underline their usefulness for studying the pathogenicity of variants found by NGS and therefore its application to accurately diagnose patients.
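
    The arithmetic linking a stop codon 25 nucleotides into the insert to 8 new amino acids can be checked with a short in-frame scan: 24 nucleotides make 8 complete codons, and the stop codon then begins at nucleotide 25. The sketch below assumes the retained intronic sequence starts in frame (codon phase 0), which the abstract does not state. The 63-nucleotide sequence is invented for illustration and is not the real CHD7 intron 28 sequence.

```python
# Hypothetical sketch: scan an inserted sequence for the first in-frame stop.
# The frame assumption and the toy sequence are NOT from the paper.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(sequence, frame_offset=0):
    """Return the 0-based start of the first in-frame stop codon, or None."""
    seq = sequence.upper()
    for start in range(frame_offset, len(seq) - 2, 3):
        if seq[start:start + 3] in STOP_CODONS:
            return start
    return None


def residues_before_stop(sequence, frame_offset=0):
    """Number of complete codons translated before the first in-frame stop."""
    stop = first_in_frame_stop(sequence, frame_offset)
    if stop is None:
        return None
    return (stop - frame_offset) // 3


if __name__ == "__main__":
    # Invented 63-nt insert: 8 sense codons, then TAG, then filler.
    toy_insert = "GTAAGTGCTCCAGGATTCACCTGG" + "TAG" + "C" * 36
    stop_index = first_in_frame_stop(toy_insert)
    print("stop codon starts at nucleotide", stop_index + 1)
    print("new amino acids before stop:", residues_before_stop(toy_insert))
```

    Note that the toy sequence contains an out-of-frame TAA near its start, which the scan correctly ignores.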

  4. Functional diversification upon leader protease domain duplication in the Citrus tristeza virus genome: Role of RNA sequences and the encoded proteins.

    Science.gov (United States)

    Kang, Sung-Hwan; Atallah, Osama O; Sun, Yong-Duo; Folimonova, Svetlana Y

    2018-01-15

    Viruses from the family Closteroviridae show an example of intra-genome duplications of more than one gene. In addition to the hallmark coat protein gene duplication, several members possess a tandem duplication of papain-like leader proteases. In this study, we demonstrate that domains encoding the L1 and L2 proteases in the Citrus tristeza virus genome underwent a significant functional divergence at the RNA and protein levels. We show that the L1 protease is crucial for viral accumulation and establishment of initial infection, whereas its coding region is vital for virus transport. On the other hand, the second protease is indispensable for virus infection of its natural citrus host, suggesting that L2 has evolved an important adaptive function that mediates virus interaction with the woody host. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Multimodal sequence learning.

    Science.gov (United States)

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  7. T2* mapping from multi-echo dixon sequence on gadoxetic acid-enhanced magnetic resonance imaging for the hepatic fat quantification: Can it be used for hepatic function assessment?

    Energy Technology Data Exchange (ETDEWEB)

    Yoo, Hyun Suk; Lee, Jeong Min; Yoon, Jeong Hee; Kang, Hyo Jin; Lee, Sang Min; Yang, Hyun Kyung; Han, Joon Koo [Dept. of Radiology, Seoul National University Hospital, Seoul (Korea, Republic of)

    2017-08-01

    To evaluate the diagnostic value of T2* mapping using 3D multi-echo Dixon gradient echo acquisition on gadoxetic acid-enhanced liver magnetic resonance imaging (MRI) as a tool to evaluate hepatic function. This retrospective study was approved by the IRB and the requirement of informed consent was waived. 242 patients who underwent liver MRIs, including 3D multi-echo Dixon fast gradient-recalled echo (GRE) sequence at 3T, before and after administration of gadoxetic acid, were included. Based on clinico-laboratory manifestation, the patients were classified as having normal liver function (NLF, n = 50), mild liver damage (MLD, n = 143), or severe liver damage (SLD, n = 30). The 3D multi-echo Dixon GRE sequence was obtained before, and 10 minutes after, gadoxetic acid administration. Pre- and post-contrast T2* values, as well as T2* reduction rates, were measured from T2* maps, and compared among the three groups. There was a significant difference in T2* reduction rates between the NLF and SLD groups (−0.2 ± 4.9% vs. 5.0 ± 6.9%, p = 0.002), and between the MLD and SLD groups (3.2 ± 6.0% vs. 5.0 ± 6.9%, p = 0.003). However, there was no significant difference in both the pre- and post-contrast T2* values among different liver function groups (p = 0.735 and 0.131, respectively). A receiver operating characteristic (ROC) curve analysis showed that the area under the ROC curve for using T2* reduction rates to differentiate the SLD group from the NLF group was 0.74 (95% confidence interval: 0.63–0.83). Incorporation of T2* mapping using 3D multi-echo Dixon GRE sequence in gadoxetic acid-enhanced liver MRI protocol may provide supplemental information for liver function deterioration in patients with SLD.

  8. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  9. Analysis and functional characterization of sequence variations in ligand binding domain of thyroid hormone receptors in autism spectrum disorder (ASD) patients.

    Science.gov (United States)

    Kalikiri, Mahesh Kumar; Mamidala, Madhu Poornima; Rao, Ananth N; Rajesh, Vidya

    2017-12-01

    Autism spectrum disorder (ASD) is a neurodevelopmental disorder reported to be on the rise over the past two decades. Thyroid hormone T3 plays an important role in early embryonic and central nervous system development. T3 mediates its function by binding to thyroid hormone receptors, TRα and TRβ. Alterations in T3 levels and thyroid receptor mutations have previously been implicated in neuropsychiatric disorders and have been linked to environmental toxins. Limited reports from earlier studies have shown promising results for T3 treatment in children with ASD, and that the thyroid hormone levels in these children were normal. This makes it necessary to explore the genetic variations in the components of the thyroid hormone pathway in ASD children. To achieve this objective, we performed genetic analysis of the ligand binding domain of the THRA and THRB receptor genes in 30 ASD subjects and in age-matched controls from India. Our study reports, for the first time, novel single nucleotide polymorphisms in the THRA and THRB receptor genes of ASD individuals. Autism Res 2017, 10: 1919-1928. ©2017 International Society for Autism Research, Wiley Periodicals, Inc. Thyroid hormone (T3) and thyroid receptors (TRα and TRβ) are the major components of the thyroid hormone pathway. The link between the thyroid pathway and neuronal development is proven in clinical medicine. Since the thyroid hormone levels in autistic children are normal, variations in their receptors need to be explored. To achieve this objective, changes in the THRA and THRB receptor genes were studied in 30 ASD and normal children from India. The impact of some of these mutations on receptor function was also studied.

  10. Sequencing and characterization of mixed function monooxygenase genes CYP1A1 and CYP1A2 of Mink (Mustela vison) to facilitate study of dioxin-like compounds

    International Nuclear Information System (INIS)

    Zhang Xiaowei; Moore, Jeremy N.; Newsted, John L.; Hecker, Markus; Zwiernik, Matthew J.; Jones, Paul D.; Bursian, Steven J.

    2009-01-01

    As part of an ongoing effort to understand aryl hydrocarbon receptor (AhR) mediated toxicity in mink, cDNAs encoding the CYP1A1 and CYP1A2 mixed function monooxygenases were cloned and characterized. In addition, the effects of selected dibenzofurans on the expression of these genes and the presence of their respective proteins (P4501A) were investigated, and then correlated with the catalytic activities of these proteins as measured by ethoxyresorufin O-deethylase (EROD) and methoxyresorufin O-deethylase (MROD) activities. The predicted protein sequences for CYP1A1 and CYP1A2 comprise 517 and 512 amino acid residues, respectively. The phylogenetic analysis of the mink CYP1As with protein sequences of other mammals revealed high sequence homology with sea otter, seals and the dog, with amino acid identities ranging from 89 to 95% for CYP1A1 and 81 to 93% for CYP1A2. Since exposure to both 2,3,7,8-Tetrachlorodibenzofuran (TCDF) and 2,3,4,7,8-Pentachlorodibenzofuran (PeCDF) resulted in dose-dependent increases of CYP1A1 mRNA, CYP1A2 mRNA and CYP1A protein levels, an underlying AhR-mediated mechanism is suggested. The up-regulation of CYP1A mRNA in liver was more consistent with the sum adipose TEQ concentration than with the liver TEQ concentration in mink treated with TCDF or PeCDF. This result suggests that the hepatic-sequestered fraction of PeCDF was biologically inactive with respect to the induction of CYP1A1 and CYP1A2

  11. Non-inductive components of electromagnetic signals associated with L'Aquila earthquake sequences estimated by means of inter-station impulse response functions

    Directory of Open Access Journals (Sweden)

    C. Di Lorenzo

    2011-04-01

    Full Text Available On 6 April 2009 at 01:32:39 UT a strong earthquake occurred west of L'Aquila at the very shallow depth of 9 km. The main shock local magnitude was Ml = 5.8 (Mw = 6.3). Several powerful aftershocks occurred in the following days. The epicentre of the main shock was located 6 km away from the Geomagnetic Observatory of L'Aquila, on a fault 15 km long having a NW-SE strike, about 140°, and a SW dip of about 42°. For this reason, L'Aquila seismic events offered very favourable conditions to detect possible electromagnetic emissions related to the earthquake. The data used in this work come from the permanent geomagnetic Observatories of L'Aquila and Duronia. Here the results concerning the analysis of the residual magnetic field estimated by means of the inter-station impulse response functions in the frequency band from 0.3 Hz to 3 Hz are shown.

  12. The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction.

    Science.gov (United States)

    Li, Hongjian; Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J

    2018-03-14

    It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.

  13. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate with their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  14. The C-terminal tail of the gp41 transmembrane envelope glycoprotein of HIV-1 clades A, B, C, and D may exist in two conformations: an analysis of sequence, structure, and function

    International Nuclear Information System (INIS)

    Hollier, Mark J.; Dimmock, Nigel J.

    2005-01-01

    In addition to the major ectodomain, the gp41 transmembrane glycoprotein of HIV-1 is now known to have a minor ectodomain that is part of the long C-terminal tail. Both ectodomains are highly antigenic, carry neutralizing and non-neutralizing epitopes, and are involved in virus-mediated fusion activity. However, data have so far been biologically based, and derived solely from T cell line-adapted (TCLA), B clade viruses. Here we have carried out sequence and theoretically based structural analyses of 357 gp41 C-terminal sequences of mainly primary isolates of HIV-1 clades A, B, C, and D. Data show that all these viruses have the potential to form a tail loop structure (the minor ectodomain) supported by three, β-sheet, membrane-spanning domains (MSDs). This means that the first (N-terminal) tyrosine-based sorting signal of the gp41 tail is situated outside the cell membrane and is non-functional, and that gp41 that reaches the cell surface may be recycled back into the cytoplasm through the activity of the second tyrosine-sorting signal. However, we suggest that only a minority of cell-associated gp41 molecules - those destined for incorporation into virions - has 3 MSDs and the minor ectodomain. Most intracellular gp41 has the conventional single MSD, no minor ectodomain, a functional first tyrosine-based sorting signal, and in line with current thinking is degraded intracellularly. The gp41 structural diversity suggested here can be viewed as an evolutionary strategy to minimize HIV-1 envelope glycoprotein expression on the cell surface, and hence possible cytotoxicity and immune attack on the infected cell

  15. Nonparametric combinatorial sequence models.

    Science.gov (United States)

    Wauthier, Fabian L; Jordan, Michael I; Jojic, Nebojsa

    2011-11-01

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This article presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three biological sequence families which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution over sequence representations induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.

  16. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Although automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  17. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  18. Copper-zinc-superoxide dismutase (CuZnSOD), an antioxidant gene from seahorse (Hippocampus abdominalis); molecular cloning, sequence characterization, antioxidant activity and potential peroxidation function of its recombinant protein.

    Science.gov (United States)

    Perera, N C N; Godahewa, G I; Lee, Jehee

    2016-10-01

    Copper-zinc-superoxide dismutase (CuZnSOD) from Hippocampus abdominalis (HaCuZnSOD) is a metalloenzyme which belongs to the ubiquitous family of SODs. Here, we determined the characteristic structural features of HaCuZnSOD, analyzed its evolutionary relationships, and identified its potential immune responses and biological functions in relation to antioxidant defense mechanisms in the seahorse. The gene had a 5' untranslated region (UTR) of 67 bp, a coding sequence of 465 bp and a 3' UTR of 313 bp. The putative peptide consists of 154 amino acids. HaCuZnSOD had a predicted molecular mass of 15.94 kDa and a theoretical pI value of 5.73, which is favorable for copper binding activity. In silico analysis revealed that HaCuZnSOD had a prominent Cu-Zn_superoxide_dismutase domain, two Cu/Zn signature sequences, a putative N-glycosylation site, and several active sites including Cu²⁺ and Zn²⁺ binding sites. The three-dimensional structure indicated a β-sheet barrel with 8 β-sheets and two short α-helical regions. Multiple alignment analyses revealed many conserved regions and active sites among its orthologs. The highest amino acid identity to HaCuZnSOD was found in Siniperca chuatsi (87.4%), while Maylandia zebra shared a close relationship in the phylogenetic analysis. Functional assays were performed to assess the antioxidant, biophysical and biochemical properties of overexpressed recombinant (r) HaCuZnSOD. A xanthine/XOD assay gave optimum results at pH 9 and 25 °C, indicating these may be the best conditions for its antioxidant action in the seahorse. An MTT assay and flow cytometry confirmed that rHaCuZnSOD showed peroxidase activity in the presence of HCO₃⁻. In all the functional assays, the level of antioxidant activity of rHaCuZnSOD was concentration dependent; metal ion supplementation also increased its activity. The highest mRNA expression level of HaCuZnSOD was found in blood. Temporal assessment under pathological stress showed a delay

  19. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  20. Long sequence correlation coprocessor

    Science.gov (United States)

    Gage, Douglas W.

    1994-09-01

    A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.
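
    The counting scheme described in this record can be illustrated in software. The following is only a minimal sketch, not the LSCC design itself: the function name `correlation_scores`, the list-of-bits representation and the parameter `n_alignments` are assumptions made for illustration, and the inner loop does sequentially what the hardware counters are described as doing in parallel.

```python
def correlation_scores(first, second, n_alignments=16):
    """Tally matching bits between two bit sequences for adjacent alignments.

    scores[k] counts the positions i where first[i] == second[i + k],
    i.e. one counter per alignment offset k (hypothetical software model).
    """
    scores = [0] * n_alignments                  # one counter per alignment
    usable = len(second) - n_alignments + 1      # keep i + k inside `second`
    for i in range(min(len(first), usable)):     # one "clock cycle" per bit of `first`
        bit = first[i]
        for k in range(n_alignments):            # done in parallel in hardware
            if bit == second[i + k]:             # increment when the bits are the same
                scores[k] += 1
    return scores


if __name__ == "__main__":
    print(correlation_scores([1, 0, 1, 1], [0, 1, 0, 1, 1, 0], n_alignments=3))
```

    In this toy call the middle alignment matches at every position, so its counter ends highest; the same bit of the first sequence is compared against several shifted bits of the second, as the shift-register description suggests.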

  1. Anomaly Detection in Sequences

    Data.gov (United States)

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  2. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  3. sequenceMiner algorithm

    Data.gov (United States)

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  4. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist, and their significance can be checked by means of a Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences with a user-set length and percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it possible to easily set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of the sequences generated by the software by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
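
    The basic idea of drawing a random sequence with a chosen length and nucleotide composition can be sketched as follows. This is not RANDNA's code: the function name `random_sequence` and its parameters are hypothetical, and the sketch uses Python's built-in generator (`random.Random`) rather than the Borland Delphi 6 generator mentioned in the abstract.

```python
import random


def random_sequence(length, composition, seed=None):
    """Draw a random sequence whose letters follow the given composition.

    `composition` maps each letter to a non-negative weight, for example
    percentages such as {"A": 30, "C": 20, "G": 20, "T": 30}.
    """
    if length < 0 or sum(composition.values()) <= 0:
        raise ValueError("length must be >= 0 and weights must sum to > 0")
    rng = random.Random(seed)                    # seeded for reproducible draws
    letters = list(composition)
    weights = [composition[letter] for letter in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


if __name__ == "__main__":
    # Hypothetical composition chosen only for illustration.
    print(random_sequence(60, {"A": 30, "C": 20, "G": 20, "T": 30}, seed=1))
```

    For an RNA sequence the same call works with "U" in place of "T". Each position is drawn independently, so the realized composition only approaches the requested percentages as the length grows.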

  5. Bunches of random cross-correlated sequences

    International Nuclear Information System (INIS)

    Maystrenko, A A; Melnik, S S; Pritula, G M; Usatenko, O V

    2013-01-01

    The statistical properties of random cross-correlated sequences constructed by the convolution method (likewise referred to as the Rice or the inverse Fourier transformation) are examined. We clarify the meaning of the filtering function—the kernel of the convolution operator—and show that it is the value of the cross-correlation function which describes correlations between the initial white noise and constructed correlated sequences. The matrix generalization of this method for constructing a bunch of N cross-correlated sequences is presented. Algorithms for their generation are reduced to solving the problem of decomposition of the Fourier transform of the correlation matrix into a product of two mutually conjugate matrices. Different decompositions are considered. The limits of weak and strong correlations for the one-point probability and pair correlation functions of sequences generated by the method under consideration are studied. Special cases of heavy-tailed distributions of the generated sequences are analyzed. We show that, if the filtering function is rather smooth, the distribution function of generated variables has the Gaussian or Lévy form depending on the analytical properties of the distribution (or characteristic) functions of the initial white noise. Anisotropic properties of statistically homogeneous random sequences related to the asymmetry of a filtering function are revealed and studied. These asymmetry properties are expressed in terms of the third- or fourth-order correlation functions. Several examples of the construction of correlated chains with a predefined correlation matrix are given. (paper)
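
    For a single sequence, the convolution construction can be sketched as x_n = sum_k F(k) ξ_(n-k), where ξ is white noise and F is the filtering function. The code below is an illustrative scalar example only, not the matrix method of the paper; the function names, the finite kernel and the unit-variance Gaussian noise are assumptions. With this noise, the sample cross-correlation between ξ_n and x_(n+r) should come out close to F(r), in line with the interpretation of the filtering function given in the abstract.

```python
import random


def correlated_sequence(kernel, length, seed=None):
    """Filter white noise with `kernel`: x_n = sum_k kernel[k] * xi_(n-k).

    Returns (noise, seq), both of the given length and aligned so that
    noise[n] is xi_n and seq[n] is x_n.
    """
    rng = random.Random(seed)
    m = len(kernel)
    # m - 1 extra noise samples provide the history needed for x_0.
    raw = [rng.gauss(0.0, 1.0) for _ in range(length + m - 1)]
    seq = [sum(kernel[k] * raw[n + m - 1 - k] for k in range(m))
           for n in range(length)]
    return raw[m - 1:], seq


def cross_correlation(noise, seq, r):
    """Sample average of noise[n] * seq[n + r]."""
    count = len(seq) - r
    return sum(noise[n] * seq[n + r] for n in range(count)) / count


if __name__ == "__main__":
    F = [1.0, 0.5, 0.25]                         # hypothetical filtering function
    noise, x = correlated_sequence(F, 20000, seed=3)
    for r, value in enumerate(F):
        print(r, value, round(cross_correlation(noise, x, r), 3))
```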

  6. Structural and functional analysis of an enhancer GPEI having a phorbol 12-O-tetradecanoate 13-acetate responsive element-like sequence found in the rat glutathione transferase P gene.

    Science.gov (United States)

    Okuda, A; Imagawa, M; Maeda, Y; Sakai, M; Muramatsu, M

    1989-10-05

    We have recently identified a typical enhancer, termed GPEI, located about 2.5 kilobases upstream from the transcription initiation site of the rat glutathione transferase P gene. Analyses of 5' and 3' deletion mutants revealed that the cis-acting sequence of GPEI contained the phorbol 12-O-tetradecanoate 13-acetate responsive element (TRE)-like sequence in it. For the maximal activity, however, GPEI required an adjacent upstream sequence of about 19 base pairs in addition to the TRE-like sequence. With the DNA binding gel-shift assay, we could detect protein(s) that specifically binds to the TRE-like sequence of GPEI fragment, which was possibly c-jun.c-fos complex or a similar protein complex. The sequence immediately upstream of the TRE-like sequence did not have any activity by itself, but augmented the latter activity by about 5-fold.

  7. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories is the most common method for associating functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
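
    Two of the sequence patterns named in this record, G+C content and trinucleotide usage, are simple to compute. The sketch below is a generic illustration and not the authors' pipeline: the function names are hypothetical, and counting trinucleotides in overlapping windows is an assumption (a codon-usage count would instead step through an ORF in frame, three bases at a time).

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C letters in the sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def trinucleotide_usage(seq):
    """Relative frequency of each trinucleotide over overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


if __name__ == "__main__":
    example = "CAGCAGCAGTTT"                     # toy sequence, not real data
    print(gc_content(example))
    print(trinucleotide_usage(example))
```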

  8. Genome Sequence of Lactobacillus plantarum Strain UCMA 3037

    OpenAIRE

    Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias; Vernoux, Jean-Paul

    2013-01-01

    Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem.

  9. Genome Sequence of Lactobacillus plantarum Strain UCMA 3037.

    Science.gov (United States)

    Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias; Vernoux, Jean-Paul

    2013-05-23

    Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem.

  10. Optimization of a sequence of reactors

    DEFF Research Database (Denmark)

    Vidal, Rene Victor Valqui

    1991-01-01

    Concerns the optimal production of sulphuric acid in a sequence of reactors. Using a suitable approximation to the objective function, this problem can easily be solved using the maximum principle. A numerical example documents the applicability of the suggested approach...

  11. Polynomial sequences generated by infinite Hessenberg matrices

    Directory of Open Access Journals (Sweden)

    Verde-Star Luis

    2017-01-01

    We show that an infinite lower Hessenberg matrix generates polynomial sequences that correspond to the rows of infinite lower triangular invertible matrices. Orthogonal polynomial sequences are obtained when the Hessenberg matrix is tridiagonal. We study properties of the polynomial sequences and their corresponding matrices which are related to recurrence relations, companion matrices, matrix similarity, construction algorithms, and generating functions. When the Hessenberg matrix is also Toeplitz the polynomial sequences turn out to be of interpolatory type and we obtain additional results. For example, we show that every nonderogatory finite square matrix is similar to a unique Toeplitz-Hessenberg matrix.
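
    One common way to read "a Hessenberg matrix generates a polynomial sequence" is through the recurrence x p_k(x) = sum over j from 0 to k+1 of h(k, j) p_j(x), with p_0 = 1 and h(k, k+1) nonzero. The sketch below assumes that convention, which may differ in detail from the paper's; the function name and the representation of polynomials as ascending coefficient lists are hypothetical. The coefficient lists form the rows of a lower triangular matrix with nonzero diagonal.

```python
from fractions import Fraction


def hessenberg_polynomials(h, count):
    """Return p_0 .. p_(count-1) as coefficient lists (lowest degree first).

    `h(k, j)` gives the entry in row k, column j of a lower Hessenberg
    matrix (zero for j > k + 1, nonzero for j == k + 1). Assumed recurrence:
        x * p_k = sum_{j=0}^{k+1} h(k, j) * p_j
    """
    if count <= 0:
        return []
    polys = [[Fraction(1)]]                      # p_0 = 1
    for k in range(count - 1):
        nxt = [Fraction(0)] + polys[k]           # coefficients of x * p_k
        for j in range(k + 1):                   # subtract h(k, j) * p_j for j <= k
            c = Fraction(h(k, j))
            for i, a in enumerate(polys[j]):
                nxt[i] -= c * a
        lead = Fraction(h(k, k + 1))
        if lead == 0:
            raise ValueError("superdiagonal entries must be nonzero")
        polys.append([a / lead for a in nxt])
    return polys


if __name__ == "__main__":
    # Tridiagonal example: ones on the super- and subdiagonal, zeros elsewhere,
    # which gives the three-term recurrence p_(k+1) = x * p_k - p_(k-1).
    def h(k, j):
        return 1 if abs(k - j) == 1 else 0

    for p in hessenberg_polynomials(h, 4):
        print([str(c) for c in p])
```

    In the tridiagonal example the recurrence has only three terms, which is the setting in which the abstract says orthogonal polynomial sequences arise.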

  12. Functional and RNA-sequencing analysis revealed expression of a novel stay-green gene from Zoysia japonica (ZjSGR) caused chlorophyll degradation and accelerated senescence in Arabidopsis

    Directory of Open Access Journals (Sweden)

    Ke Teng

    2016-12-01

    Senescence is not only an important developmental process, but also a regulatory response to abiotic and biotic stress in plants. Stay-green proteins play crucial roles in plant senescence and chlorophyll degradation. However, the underlying mechanisms have not been well studied, particularly in non-model plants. In this study, a novel stay-green gene, ZjSGR, was isolated from Zoysia japonica. Subcellular localization demonstrated that ZjSGR was localized in the chloroplasts. Quantitative real-time PCR results together with promoter activity determination using transgenic Arabidopsis confirmed that ZjSGR could be induced by darkness, ABA and MeJA. Its expression levels could also be up-regulated by natural senescence, but suppressed by SA treatments. Overexpression of ZjSGR in Arabidopsis resulted in a rapid yellowing phenotype; complementation experiments proved that ZjSGR was a functional homologue of AtNYE1 from Arabidopsis thaliana. Overexpression of ZjSGR accelerated chlorophyll degradation and impaired photosynthesis in Arabidopsis. Transmission electron microscopy observation revealed that overexpression of ZjSGR disrupted the chloroplast structure. RNA sequencing analysis showed that ZjSGR could play multiple roles in senescence and chlorophyll degradation by regulating hormone signal transduction and the expression of a large number of senescence and environmental stress related genes. Our study provides a better understanding of the roles of SGRs, and new insight into the senescence and chlorophyll degradation mechanisms in plants.

  13. Sequencing Information Management System (SIMS). Final report

    Energy Technology Data Exchange (ETDEWEB)

    Fields, C.

    1996-02-15

    A feasibility study to develop a requirements analysis and functional specification for a data management system for large-scale DNA sequencing laboratories resulted in a functional specification for a Sequencing Information Management System (SIMS). This document reports the results of this feasibility study, and includes a functional specification for a SIMS relational schema. The SIMS is an integrated information management system that supports data acquisition, management, analysis, and distribution for DNA sequencing laboratories. The SIMS provides ad hoc query access to information on the sequencing process and its results, and partially automates the transfer of data between laboratory instruments, analysis programs, technical personnel, and managers. The SIMS user interfaces are designed for use by laboratory technicians, laboratory managers, and scientists. The SIMS is designed to run in a heterogeneous, multiplatform environment in a client/server mode. The SIMS communicates with external computational and data resources via the internet.

  14. Sequences for Student Investigation

    Science.gov (United States)

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  15. Sequence History Update Tool

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.

  16. Building the sequence map of the human pan-genome

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Zheng, Hancheng

    2010-01-01

    analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain approximately 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing...

  17. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  18. SVX Sequencer Board

    International Nuclear Information System (INIS)

    Utes, M.

    1997-01-01

    The SVX Sequencer boards are 9U by 280mm circuit boards that reside in slots 2 through 21 of each of eight Eurocard crates in the D0 Detector Platform. The basic purpose is to control the SVX chips for data acquisition and when a trigger occurs, to gather the SVX data and relay the data to the VRB boards in the Movable Counting House. Functions and features are as follows: (1) Initialization of eight SVX chip strings using the MIL-STD-1553 data bus; (2) Real time manipulation of the SVX control lines to effect data acquisition, digitization, and readout based on the NRZ/Clock signals from the Controller; (3) Conversion of 8-bit electrical SVX readout data to an optical signal operating at 1.062 Gbit/sec, sent to the VRB. Eight HDIs will be serviced per board; (4) Built-in logic analyzer which can record the most important control and data lines during a data acquisition cycle and put this recorded information onto the 1553 bus; (5) Identification header and end of data trailer tacked onto data stream; (6) 1553 register which can read the current values of the control and data lines; (7) 1553 register which can test the optical link; (8) 1553 registers for crossing pulse width, calibration pulse voltage, and calibration pipeline select; (9) 1553 register for reading the optical drivers status link; (10) 1553 register for power control of SVX chips and ignoring bad SVX strings; (11) Front panel displays and LEDs show the board status at a glance; (12) In-system programmable EPLDs are programmed via 1553 or Altera's 'Bitblaster'; (13) Automatic readout abort after 45us; (14) Supplies BUSY signal back to Trigger Framework; (15) Supports a heartbeat system to prevent excessive SVX current draw; and (16) Supports a SVX power trip feature if heartbeat failure occurs.

  19. Amino-terminal domain of the v-fms oncogene product includes a functional signal peptide that directs synthesis of a transforming glycoprotein in the absence of feline leukemia virus gag sequences

    International Nuclear Information System (INIS)

    Wheeler, E.F.; Roussel, M.F.; Hampe, A.; Walker, M.H.; Fried, V.A.; Look, A.T.; Rettenmier, C.W.; Sherr, C.J.

    1986-01-01

    The nucleotide sequence of a 5' segment of the human genomic c-fms proto-oncogene suggested that recombination between feline leukemia virus and feline c-fms sequences might have occurred in a region encoding the 5' untranslated portion of c-fms mRNA. The polyprotein precursor gP180/sup gag-fms/ encoded by the McDonough strain of feline sarcoma virus was therefore predicted to contain 34 v-fms-coded amino acids derived from sequences of the c-fms gene that are not ordinarily translated from the proto-oncogene mRNA. The (gP180/sup gag-fms/) polyprotein was cotranslationally cleaved near the gag-fms junction to remove its gag gene-coded portion. Determination of the amino-terminal sequence of the resulting v-fms-coded glycoprotein, gp120/sup v-fms/, showed that the site of proteolysis corresponded to a predicted signal peptidase cleavage site within the c-fms gene product. Together, these analyses suggested that the linked gag sequences may not be necessary for expression of a biologically active v-fms gene product. The gag-fms sequences of feline sarcoma virus strain McDonough and the v-fms sequences alone were inserted into a murine retroviral vector containing a neomycin resistance gene. The authors conclude that a cryptic hydrophobic signal peptide sequence in v-fms was unmasked by gag deletion, thereby allowing the correct orientation and transport of the v-fms gene product within membranous organelles. It seems likely that the proteolytic cleavage of gP180/gag-fms/ is mediated by signal peptidase and that the amino termini of gp140/sup v-fms/ and the c-fms gene product are identical.

  20. Amino-terminal domain of the v-fms oncogene product includes a functional signal peptide that directs synthesis of a transforming glycoprotein in the absence of feline leukemia virus gag sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wheeler, E.F.; Roussel, M.F.; Hampe, A.; Walker, M.H.; Fried, V.A.; Look, A.T.; Rettenmier, C.W.; Sherr, C.J.

    1986-08-01

    The nucleotide sequence of a 5' segment of the human genomic c-fms proto-oncogene suggested that recombination between feline leukemia virus and feline c-fms sequences might have occurred in a region encoding the 5' untranslated portion of c-fms mRNA. The polyprotein precursor gP180/sup gag-fms/ encoded by the McDonough strain of feline sarcoma virus was therefore predicted to contain 34 v-fms-coded amino acids derived from sequences of the c-fms gene that are not ordinarily translated from the proto-oncogene mRNA. The (gP180/sup gag-fms/) polyprotein was cotranslationally cleaved near the gag-fms junction to remove its gag gene-coded portion. Determination of the amino-terminal sequence of the resulting v-fms-coded glycoprotein, gp120/sup v-fms/, showed that the site of proteolysis corresponded to a predicted signal peptidase cleavage site within the c-fms gene product. Together, these analyses suggested that the linked gag sequences may not be necessary for expression of a biologically active v-fms gene product. The gag-fms sequences of feline sarcoma virus strain McDonough and the v-fms sequences alone were inserted into a murine retroviral vector containing a neomycin resistance gene. The authors conclude that a cryptic hydrophobic signal peptide sequence in v-fms was unmasked by gag deletion, thereby allowing the correct orientation and transport of the v-fms gene product within membranous organelles. It seems likely that the proteolytic cleavage of gP180/gag-fms/ is mediated by signal peptidase and that the amino termini of gp140/sup v-fms/ and the c-fms gene product are identical.

  1. The Colliding Beams Sequencer

    International Nuclear Information System (INIS)

    Johnson, D.E.; Johnson, R.P.

    1989-01-01

    The Colliding Beam Sequencer (CBS) is a computer program used to operate the pbar-p Collider by synchronizing the applications programs and simulating the activities of the accelerator operators during filling and storage. The Sequencer acts as a meta-program, running otherwise stand-alone applications programs, to do the set-up, beam transfers, acceleration, low beta turn on, and diagnostics for the transfers and storage. The Sequencer and its operational performance will be described along with its special features, which include a periodic scheduler and command logger. 14 refs., 3 figs.

  2. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.

  3. A comparative evaluation of sequence classification programs

    Directory of Open Access Journals (Sweden)

    Bazinet Adam L

    2012-05-01

    Full Text Available Abstract Background A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for ’barcoding genes’ like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. Results We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. Conclusions We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.

  4. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  5. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  6. Dynamic Sequence Assignment.

    Science.gov (United States)

    1983-12-01

    D-136 548 DYNAMIC SEQUENCE ASSIGNMENT (U) ADVANCED INFORMATION AND DECISION SYSTEMS, MOUNTAIN VIEW, CA. C. A. O'REILLY ET AL. UNCLASSIFIED, DEC 83, AI/DS... ADVANCED INFORMATION & DECISION SYSTEMS, Mountain View, CA 94040. Unclassified. SECURITY CLASSIFICATION OF THIS PAGE. REPORT...reviews some important heuristic algorithms developed for faster solution of the sequence assignment problem. 3.1. DYNAMIC PROGRAMMING FORMULATION FOR

  7. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  8. General LTE Sequence

    OpenAIRE

    Billal, Masum

    2015-01-01

    In this paper, we have characterized sequences which maintain the same property described in Lifting the Exponent Lemma. Lifting the Exponent Lemma is a very powerful tool in olympiad number theory and recently it has become very popular. We generalize it to all sequences that maintain a property like it, i.e. if $p^{\alpha} \,\|\, a_k$ and $p^{\beta} \,\|\, n$, then $p^{\alpha+\beta} \,\|\, a_{nk}$.
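
    As a numerical illustration of the stated property, the sketch below checks it for one classical sequence, a_k = x^k - y^k, where the ordinary Lifting the Exponent Lemma applies (odd prime p dividing x - y but neither x nor y). The choice x = 4, y = 1, p = 3 and the helper names `vp`, `a`, and `lte_property_holds` are assumptions made for illustration; they are not taken from the paper.

```python
def vp(n: int, p: int) -> int:
    """Return the exponent of the prime p in the nonzero integer n."""
    if n == 0:
        raise ValueError("v_p(0) is undefined")
    n = abs(n)
    exponent = 0
    while n % p == 0:
        n //= p
        exponent += 1
    return exponent


def a(k: int, x: int = 4, y: int = 1) -> int:
    """Illustrative sequence a_k = x^k - y^k (x and y are assumed values)."""
    return x**k - y**k


def lte_property_holds(p: int, k: int, n: int) -> bool:
    """Check that v_p(a_{nk}) equals v_p(a_k) + v_p(n) for the sequence above."""
    return vp(a(n * k), p) == vp(a(k), p) + vp(n, p)


if __name__ == "__main__":
    # 3 divides a_1 = 3 exactly once, and n = 9 contributes 3^2,
    # so 3^3 should divide a_9 = 4^9 - 1 exactly.
    print(vp(a(1), 3), vp(9, 3), vp(a(9), 3))  # prints: 1 2 3
```

    The check only samples finitely many (k, n) pairs, so it illustrates the property rather than proving it.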

  9. Pairwise Sequence Alignment Library

    Energy Technology Data Exchange (ETDEWEB)

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Pairwise Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets including AVX2 and KNC; and language interfaces for C/C++ and Python.
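
    To make the data dependencies mentioned above concrete, here is a minimal scalar sketch of the Smith Waterman local alignment score. It is not parasail's vectorized code and does not use the parasail API; the function name `smith_waterman_score`, the linear gap penalty, and the scoring values are assumptions chosen for brevity.

```python
def smith_waterman_score(s: str, t: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score of s and t (linear gap penalty)."""
    prev = [0] * (len(t) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            # Each cell depends on its upper-left, upper, and left neighbours.
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # prints: 6
```

    The dependence of each cell on its left neighbour in the same row is what makes straightforward vectorization along a row difficult, which is the problem that striped and scan-based layouts address.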

  10. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows the extraction of meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
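
    A block-wise entropy of this kind can be sketched as follows. The sketch assumes non-overlapping blocks and an entropy computed from single-base frequencies within each block; the abstract does not specify these details, so the paper's exact definition (for example, one based on longer words) may differ. The names `block_entropy` and `local_shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def block_entropy(block: str) -> float:
    """Shannon entropy (in bits) of the symbol frequencies in one block."""
    counts = Counter(block)
    total = len(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_shannon_entropy(seq: str, block_size: int) -> list[float]:
    """Entropy of each consecutive, non-overlapping block of the sequence."""
    return [
        block_entropy(seq[i:i + block_size])
        for i in range(0, len(seq) - block_size + 1, block_size)
    ]


if __name__ == "__main__":
    # A homopolymer block has zero entropy; a block with all four bases
    # in equal proportion reaches the maximum of 2 bits.
    print(local_shannon_entropy("AAAAACGT", 4))
```

    Under these assumptions, a long tandem-repeat region yields nearly identical values from block to block, which is consistent with the low LSE fluctuations described in the abstract.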

  11. Interpolating and sampling sequences in finite Riemann surfaces

    OpenAIRE

    Ortega-Cerda, Joaquim

    2007-01-01

    We provide a description of the interpolating and sampling sequences on a space of holomorphic functions on a finite Riemann surface, where a uniform growth restriction is imposed on the holomorphic functions.

  12. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  13. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
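
    The distribution step described above can be sketched as follows. This is only an illustration under stated assumptions: the splitting ratio is interpreted here as the fraction of database sequences (by count) sent to the CPU, and the function name `split_by_length` is hypothetical. The disclosed embodiments may define the ratio differently.

```python
def split_by_length(db_seqs: list[str], cpu_ratio: float) -> tuple[list[str], list[str]]:
    """Split database sequences so the shorter ones go to the CPU portion.

    cpu_ratio is the assumed fraction (by count) of sequences given to the CPU;
    the remaining, longer sequences form the GPU portion.
    """
    if not 0.0 <= cpu_ratio <= 1.0:
        raise ValueError("cpu_ratio must lie between 0 and 1")
    ordered = sorted(db_seqs, key=len)      # shortest sequences first
    cut = int(cpu_ratio * len(ordered))     # number of sequences for the CPU
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    cpu_part, gpu_part = split_by_length(["ACGTACGT", "AC", "ACGT", "A"], 0.5)
    print(cpu_part)  # prints: ['A', 'AC']
    print(gpu_part)  # prints: ['ACGT', 'ACGTACGT']
```

    Each portion would then be scored against the query sequence on its own device, giving the first and second alignment scores mentioned in the abstract.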

  14. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  15. Main sequence mass loss

    International Nuclear Information System (INIS)

    Brunish, W.M.; Guzik, J.A.; Willson, L.A.; Bowen, G.

    1987-01-01

    It has been hypothesized that variable stars may experience mass loss, driven, at least in part, by oscillations. The class of stars we are discussing here is the δ Scuti variables. These are variable stars with masses between about 1.2 and 2.25 solar masses (M/sub ⊙/), lying on or very near the main sequence. According to this theory, high rotation rates enhance the rate of mass loss, so main sequence stars born in this mass range would have a range of mass loss rates, depending on their initial rotation velocity and the amplitude of the oscillations. The stars would evolve rapidly down the main sequence until (at about 1.25 M/sub ⊙/) a surface convection zone began to form. The presence of this convective region would slow the rotation, perhaps allowing magnetic braking to occur, and thus sharply reduce the mass loss rate. 7 refs.

  16. Electricity sequence control

    International Nuclear Information System (INIS)

    Shin, Heung Ryeol

    2010-03-01

    This book covers: an introduction to control systems, including their classification and control signals; electrical switching devices, such as push-buttons, detection switches, inductive and capacitive sensors, and solenoid valves; the representation of sequences and circuit types using diagrams, time charts, markings, and terminology; logic circuits such as YES, NOT, AND, OR, and equivalence logic; basic electrical circuits; electrical sequence control; added conditions; special program control, including program selection and jumps; motor control; supplementary circuits such as repeat circuits and pause circuits in a conveyor; and safety regulations, including the classification of electrical accidents and protective devices for insulation.

  17. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars

    2013-01-01

    , Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...... information obtained allows well for statistical analysis of the data. This general approach can be integrated into current laboratory practice and has numerous applications. Besides DNA-based predictions of blood group phenotypes, platelet phenotypes, or sickle cell anemia, and the determination of zygosity...

  18. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  19. THE RHIC SEQUENCER

    International Nuclear Information System (INIS)

    VAN ZEIJTS, J.; DOTTAVIO, T.; FRAK, B.; MICHNOFF, R.

    2001-01-01

    The Relativistic Heavy Ion Collider (RHIC) has a high-level asynchronous time-line driven by a controlling program called the "Sequencer". Most high-level magnet and beam related issues are orchestrated by this system. The system also plays an important role in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience.

  20. Twin anemia polycythemia sequence

    NARCIS (Netherlands)

    Slaghekke, Femke

    2014-01-01

    In this thesis we describe that Twin Anemia Polycythemia Sequence (TAPS) is a form of chronic feto-fetal transfusion in monochorionic (identical) twins based on a small amount of blood transfusion through very small anastomoses. For the antenatal diagnosis of TAPS, Middle Cerebral Artery – Peak

  1. simple sequence repeat (SSR)

    African Journals Online (AJOL)

    In the present study, 78 mapped simple sequence repeat (SSR) markers representing 11 linkage groups of adzuki bean were evaluated for transferability to mungbean and related Vigna spp. 41 markers amplified characteristic bands in at least one Vigna species. The transferability percentage across the genotypes ranged ...

  2. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
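
    For readers unfamiliar with the term, the sketch below shows what a Pareto optimal frontier is for a set of candidate alignments scored by several objectives. It is a brute-force filter over score vectors, assuming every objective is to be maximized; it is not the dynamic-programming algorithm developed in the paper, and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Keep only the score vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Each tuple holds the scores of one candidate alignment under two
    # scoring functions (made-up numbers for illustration).
    candidates = [(1, 5), (2, 4), (2, 2), (3, 1), (0, 0)]
    print(pareto_front(candidates))  # prints: [(1, 5), (2, 4), (3, 1)]
```

    The filter compares every pair of candidates, so its cost grows quadratically with the number of alignments. It also leaves open the difficulty noted in the abstract: the frontier can contain many alignments, and choosing a single one among them is a separate problem.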

  3. Targeted sequencing of plant genomes

    Science.gov (United States)

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, whole-genome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  4. Almost convergence of triple sequences

    OpenAIRE

    Ayhan Esi; M.Necdet Catalbas

    2013-01-01

    In this paper we introduce and study the concepts of almost convergence and almost Cauchy for triple sequences. We show that the set of almost convergent triple sequences of 0's and 1's is of the first category and also almost every triple sequence of 0's and 1's is not almost convergent. Keywords: almost convergence, P-convergent, triple sequence.

  5. A few Smarandache Integer Sequences

    OpenAIRE

    Ibstedt, Henry

    2010-01-01

    This paper deals with the analysis of a few Smarandache Integer Sequences which first appeared in Properties of the Numbers, F. Smarandache, University of Craiova Archives, 1975. The first four sequences are recurrence generated sequences while the last three are concatenation sequences.

  6. Automatic Sequences and Zip-Specifications

    NARCIS (Netherlands)

    Grabmayer, C.A.; Endrullis, J.; Hendriks, D.; Klop, J.W.; Moss, L.S.

    2012-01-01

    We consider infinite sequences of symbols, also known as streams, and the decidability question for equality of streams defined in a restricted format. This restricted format consists of prefixing a symbol at the head of a stream, of the stream function `zip', and recursion variables. Here `zip'
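
    The stream function `zip' referred to above is commonly defined by zip(x:s, t) = x : zip(t, s), i.e. it interleaves two streams by alternating between them. Since the abstract is truncated before giving its definition, the sketch below assumes this standard one and models streams as Python iterators that never end; the name `zip_streams` is hypothetical.

```python
from itertools import count, islice
from typing import Iterator, TypeVar

T = TypeVar("T")


def zip_streams(s: Iterator[T], t: Iterator[T]) -> Iterator[T]:
    """Interleave two infinite streams: zip(x:s', t) = x : zip(t, s')."""
    while True:
        yield next(s)   # emit the head of the first stream
        s, t = t, s     # then continue with the arguments swapped


if __name__ == "__main__":
    evens = count(0, 2)  # 0, 2, 4, ...
    odds = count(1, 2)   # 1, 3, 5, ...
    print(list(islice(zip_streams(evens, odds), 6)))  # prints: [0, 1, 2, 3, 4, 5]
```

    The generator assumes both inputs are infinite, as streams are in the abstract; passing a finite iterator would end in an error once it is exhausted.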

  7. Implicitly Defined Neural Networks for Sequence Labeling

    Science.gov (United States)

    2017-02-13

    assumption - that a hidden variable changes its state based only on its current state and observables. In finding maximum likelihood state sequences...this setup, we have the following variables: data X, labels Y, parameters θ; and functions: implicit hidden layer definition H = F(θ, ξ, H), loss function L...tagging task. In future work, we intend to consider implicit variations of other architectures, such as the LSTM, as well as additional, more challenging

  8. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large...... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information....

  9. Variable depth recursion algorithm for leaf sequencing

    International Nuclear Information System (INIS)

    Siochi, R. Alfredo C.

    2007-01-01

    The processes of extraction and sweep are basic segmentation steps that are used in leaf sequencing algorithms. A modified version of a commercial leaf sequencer changed the way that the extracts are selected and expanded the search space, but the modification maintained the basic search paradigm of evaluating multiple solutions, each one consisting of up to 12 extracts and a sweep sequence. While it generated the best solutions compared to other published algorithms, it used more computation time. A new, faster algorithm selects one extract at a time but calls itself as an evaluation function a user-specified number of times, after which it uses the bidirectional sweeping window algorithm as the final evaluation function. To achieve a performance comparable to that of the modified commercial leaf sequencer, 2-3 calls were needed, and in all test cases, there were only slight improvements beyond two calls. For the 13 clinical test maps, computation speeds improved by a factor between 12 and 43, depending on the constraints, namely the ability to interdigitate and the avoidance of the tongue-and-groove underdose. The new algorithm was compared to the original and modified versions of the commercial leaf sequencer. It was also compared to other published algorithms for 1400 random 15x15 test maps with 3-16 intensity levels. In every single case the new algorithm provided the best solution.

  10. Multilocus Sequence Typing

    OpenAIRE

    Belén, Ana; Pavón, Ibarz; Maiden, Martin C.J.

    2009-01-01

    Multilocus sequence typing (MLST) was first proposed in 1998 as a typing approach that enables the unambiguous characterization of bacterial isolates in a standardized, reproducible, and portable manner using the human pathogen Neisseria meningitidis as the exemplar organism. Since then, the approach has been applied to a large and growing number of organisms by public health laboratories and research institutions. MLST data, shared by investigators over the world via the Internet, have been ...

  11. Achalasia Carcinoma Sequence

    OpenAIRE

    Makmun, Dadang

    2001-01-01

    We report a case of carcinoma of the esophagus in a 58-year-old woman with achalasia. The achalasia had been diagnosed 30 years earlier and was initially treated surgically (myotomy); the symptoms recurred 3 years ago. Given the progression of the disease, malignancy was strongly suspected as a result of the prolonged stasis and mucosal irritation caused by achalasia (achalasia carcinoma sequence). Because of these contributing factors for the development of serious complications such as Malignan...

  12. Sequencing BPS spectra

    Energy Technology Data Exchange (ETDEWEB)

    Gukov, Sergei [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Max-Planck-Institut für Mathematik,Vivatsgasse 7, D-53111 Bonn (Germany); Nawata, Satoshi [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Centre for Quantum Geometry of Moduli Spaces, University of Aarhus,Nordre Ringgade 1, DK-8000 (Denmark); Saberi, Ingmar [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Stošić, Marko [CAMGSD, Departamento de Matemática, Instituto Superior Técnico,Av. Rovisco Pais, 1049-001 Lisbon (Portugal); Mathematical Institute SANU,Knez Mihajlova 36, 11000 Belgrade (Serbia); Sułkowski, Piotr [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Faculty of Physics, University of Warsaw,ul. Pasteura 5, 02-093 Warsaw (Poland)

    2016-03-02

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  13. Sequencing BPS spectra

    International Nuclear Information System (INIS)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-01-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  14. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  15. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer. Copyright is held by the owner/author(s).
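
    For orientation only, the notion of an exact-length motif query can be illustrated with a brute-force counter. This is not ACME and has none of its cache-aware or parallel design; the function name `frequent_kmers` and its parameters are hypothetical, and only exact (mismatch-free) occurrences are counted.

```python
from collections import Counter


def frequent_kmers(sequence, k, min_count):
    """Return every exact substring of length k that occurs at least min_count times.

    Brute-force baseline: it slides a window over the sequence and counts
    each length-k substring, which is only practical for short inputs.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy input; a real query would run on a sequence of gigabytes.
    print(frequent_kmers("ACGTACGTGACG", k=3, min_count=3))
```

    Memory grows with the number of distinct substrings, which is one reason a naive counter like this does not scale to the sequence lengths the abstract targets.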

  16. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)
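
    The search for hidden periodicities mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a binary indicator sequence per nucleotide, wave numbers q_n = 2πn/M, and a 1/M normalization of the squared Fourier amplitude; the names `structure_factor` and `dominant_period` are hypothetical.

```python
import cmath


def structure_factor(sequence, nucleotide):
    """Squared Fourier amplitudes of one nucleotide's indicator sequence.

    Entry n-1 of the result corresponds to wave number q_n = 2*pi*n/M
    for n = 1 .. M//2 (the n = 0 term only reflects composition).
    """
    M = len(sequence)
    indicator = [1.0 if base == nucleotide else 0.0 for base in sequence]
    spectrum = []
    for n in range(1, M // 2 + 1):
        q = 2 * cmath.pi * n / M
        harmonic = sum(x * cmath.exp(-1j * q * m) for m, x in enumerate(indicator))
        spectrum.append(abs(harmonic) ** 2 / M)  # assumed normalization
    return spectrum


def dominant_period(sequence, nucleotide):
    """Period (in bases) of the strongest harmonic for the given nucleotide."""
    spectrum = structure_factor(sequence, nucleotide)
    n_peak = max(range(len(spectrum)), key=spectrum.__getitem__) + 1
    return len(sequence) / n_peak


if __name__ == "__main__":
    # A toy sequence with an exact 3-base repeat.
    print(dominant_period("ACG" * 10, "A"))
```

    Judging whether a peak is significant would, as the abstract notes, require comparison with random sequences of the same nucleotide composition, which this sketch omits.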

  17. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    OpenAIRE

    Bolton, Michael J; Garry, Robert F

    2011-01-01

    Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infectio...

  18. A Role for the Fifth G-Track in G-Quadruplex Forming Oncogene Promoter Sequences during Oxidative Stress: Do These “Spare Tires” Have an Evolved Function?

    Science.gov (United States)

    2015-01-01

    Uncontrolled inflammation or oxidative stress generates electron-deficient species that oxidize the genome increasing its instability in cancer. The G-quadruplex (G4) sequences regulating the c-MYC, KRAS, VEGF, BCL-2, HIF-1α, and RET oncogenes, as examples, are targets for oxidation at loop and 5′-core guanines (G) as showcased in this study by CO3•– oxidation of the VEGF G4. Products observed include 8-oxo-7,8-dihydroguanine (OG), spiroiminodihydantoin (Sp), and 5-guanidinohydantoin (Gh). Our previous studies found that OG and Gh, when present in the four G-tracks of the solved structure for VEGF and c-MYC, were not substrates for the base excision repair (BER) DNA glycosylases in biologically relevant KCl solutions. We now hypothesize that a fifth G-track found a few nucleotides distant from the G4 tracks involved in folding can act as a “spare tire,” facilitating extrusion of a damaged G-run into a large loop that then becomes a substrate for BER. Thermodynamic, spectroscopic, and DMS footprinting studies verified the fifth domain replacing a damaged G-track with OG or Gh at a loop or core position in the VEGF G4. These new “spare tire”-containing strands with Gh in loops are now found to be substrates for initiation of BER with the NEIL1, NEIL2, and NEIL3 DNA glycosylases. The results support a hypothesis in which regulatory G4s carry a “spare-tire” fifth G-track for aiding in the repair process when these sequences are damaged by radical oxygen species, a feature observed in a large number of these sequences. Furthermore, formation and repair of oxidized bases in promoter regions may constitute an additional example of epigenetic modification, in this case of guanine bases, to regulate gene expression in which the G4 sequences act as sensors of oxidative stress. PMID:26405692

  19. Sequencing genes in silico using single nucleotide polymorphisms

    Directory of Open Access Journals (Sweden)

    Zhang Xinyi

    2012-01-01

    Abstract Background The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive. Results To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (ranges 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes. Conclusions Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease associated SNPs and regions, and facilitating the prioritization of candidate

  20. Applying Next Generation Sequencing to Skeletal Development and Disease

    OpenAIRE

    Bowen, Margot Elizabeth

    2013-01-01

    Next Generation Sequencing (NGS) technologies have dramatically increased the throughput and lowered the cost of DNA sequencing. In this thesis, I apply these technologies to unresolved questions in skeletal development and disease. Firstly, I use targeted re-sequencing of genomic DNA to identify the genetic cause of the cartilage tumor syndrome, metachondromatosis (MC). I show that the majority of MC patients carry heterozygous loss-of-function mutations in the PTPN11 gene, which encodes a p...

  1. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Abstract Background Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1 – 1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and

  2. Foundations of Sequence-to-Sequence Modeling for Time Series

    OpenAIRE

    Kuznetsov, Vitaly; Mariet, Zelda

    2018-01-01

    The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practiti...

  3. Revealing the functional structure of a new PLA2 K49 from Bothriopsis taeniata snake venom employing automatic "de novo" sequencing using CID/HCD/ETD MS/MS analyses.

    Science.gov (United States)

    Carregari, Victor Corasolla; Dai, Jie; Verano-Braga, Thiago; Rocha, Thalita; Ponce-Soto, Luis Alberto; Marangoni, Sergio; Roepstorff, Peter

    2016-01-10

    Snake venoms are composed of approximately 90% proteins with several pharmacological activities having high potential in research as biological tools. Among the most abundant compounds are phospholipases A2 (PLA2), which are the most studied venom proteins due to their wide pharmacological activity. Using a combination of chromatographic steps, a new PLA2 K49 was isolated and purified from the whole venom of Bothriopsis taeniata and submitted to mass spectrometry analyses. An automatic “de novo” sequencing of this new PLA2 K49, denominated Btt-TX, was performed using Peaks Studio 6 for analysis of the spectra. Additionally, a triplex approach CID/HCD/ETD has been performed to generate higher coverage of the sequence of the protein. Structural studies correlating biological activities were made associating specific Btt-TX regions and myotoxic activity. Lysine acetylation was performed to better understand the mechanism of membrane interaction, identifying the extreme importance of the highly hydrophobic amino acids L, P and F for disruption of the membrane. Our myotoxicity studies show a possible membrane disruption mechanism by creatine kinase release without noticeable muscle damage, which probably occurred without phospholipid hydrolysis, but with a probable penetration of the hydrophobic amino acids present in the C-terminal region of the protein.

  4. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...

  5. Infinite sequences and series

    CERN Document Server

    Knopp, Konrad

    1956-01-01

    One of the finest expositors in the field of modern mathematics, Dr. Konrad Knopp here concentrates on a topic that is of particular interest to 20th-century mathematicians and students. He develops the theory of infinite sequences and series from its beginnings to a point where the reader will be in a position to investigate more advanced stages on his own. The foundations of the theory are therefore presented with special care, while the developmental aspects are limited by the scope and purpose of the book. All definitions are clearly stated; all theorems are proved with enough detail to ma

  6. Sequencing of a Cultivated Diploid Cotton Genome-Gossypium arboreum

    Institute of Scientific and Technical Information of China (English)

    Wilkins, Thea A.

    2008-01-01

    Sequencing the genomes of crop species and model systems contributes significantly to our understanding of the organization, structure and function of plant genomes. In a 'white paper' published in 2007, the cotton community set forth a strategic plan for sequencing the AD genome of cultivated upland cotton that initially targets less complex diploid genomes. This strategy banks on the high degree

  7. Invariant and Absolute Invariant Means of Double Sequences

    Directory of Open Access Journals (Sweden)

    Abdullah Alotaibi

    2012-01-01

    We examine some properties of the invariant mean, define the concepts of strong σ-convergence and absolute σ-convergence for double sequences, and determine the associated sublinear functionals. We also define the absolute invariant mean through which the space of absolutely σ-convergent double sequences is characterized.

  8. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis

    DEFF Research Database (Denmark)

    Carlton, Jane M.; Hirt, Robert P.; Silva, Joana C.

    2007-01-01

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion...... environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria....

  9. Complete Genome Sequence of the Human Gut Symbiont Roseburia hominis

    DEFF Research Database (Denmark)

    Travis, Anthony J.; Kelly, Denise; Flint, Harry J

    2015-01-01

    We report here the complete genome sequence of the human gut symbiont Roseburia hominis A2-183(T) (= DSM 16839(T) = NCIMB 14029(T)), isolated from human feces. The genome is represented by a 3,592,125-bp chromosome with 3,405 coding sequences. A number of potential functions contributing to host...

  10. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms, the 'GAP' (which is heuristic) and the 'Needleman
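
    The abstract is truncated and does not describe how ABS works, so the sketch below is not ABS. It shows the classic dynamic-programming score for a global alignment of two sequences, the kind of exact baseline against which approximate methods are usually compared. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b under a linear gap penalty.

    Uses the standard dynamic-programming recurrence and keeps only two
    rows of the table, so memory is O(len(b)) while time is O(len(a) * len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against prefixes of b
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a prefix of a against ""
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATACA"))
```

    The quadratic running time of this exact recurrence is what motivates approximate, faster alignment techniques for long DNA sequences.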

  11. Next-Generation Sequencing Platforms

    Science.gov (United States)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  12. MR-sialography: optimisation and evaluation of an ultra-fast sequence in parallel acquisition technique and different functional conditions of salivary glands; MR-Sialographie: Optimierung und Bewertung ultraschneller Sequenzen mit paralleler Bildgebung und oraler Stimulation

    Energy Technology Data Exchange (ETDEWEB)

    Habermann, C.R.; Cramer, M.C.; Aldefeld, D.; Weiss, F.; Kaul, M.G.; Adam, G. [Radiologisches Zentrum, Klinik und Poliklinik fuer Diagnostische und Interventionelle Radiologie, Universitaetsklinikum Hamburg-Eppendorf (Germany); Graessner, J. [Siemens Medical Systems, Hamburg (Germany); Reitmeier, F.; Jaehne, M. [Kopf- und Hautzentrum, Klinik und Poliklinik fuer Hals-, Nasen- und Ohrenheilkunde, Universitaetsklinikum Hamburg-Eppendorf (Germany); Petersen, K.U. [Zentrum fuer Psychosoziale Medizin, Klinik und Poliklinik fuer Psychiatrie und Psychotherapie, Universitaetsklinikum Hamburg-Eppendorf (Germany)

    2005-04-01

    Purpose: To optimise a fast sequence for MR-sialography and to compare a parallel and non-parallel acquisition technique. Additionally, the effect of oral stimulation on image quality was evaluated. Material and Methods: All examinations were performed using a 1.5-T superconducting system. After developing a sufficient sequence for MR-sialography, a single-shot turbo-spin-echo sequence (ss-TSE) with an acquisition time of 2.8 sec was used in transverse and oblique sagittal orientation in 27 healthy volunteers. All images were acquired with and without parallel imaging technique. The assessment of the ductal system of the submandibular and parotid gland was performed using a 1 to 5 visual scale for each side separately. Images were evaluated by four independent experienced radiologists. For statistical evaluation, an ANOVA with post-hoc comparisons was used with an overall two-tailed significance level of P=.05. For evaluation of interobserver variability, an intraclass correlation was computed and correlation >.08 was determined to indicate a high correlation. Results: All parts of salivary excretory ducts could be visualised in all volunteers, with an overall rating for all ducts of 2.26 (SD ±1.09). Between the four observers a high correlation could be obtained with an intraclass correlation of 0.9475. A significant influence of the slice angulations could not be found (p=0.74). In all healthy volunteers the visibility of excretory ducts improved significantly after oral application of a sialogogue (p<0.001; η²=0.049). The use of a parallel imaging technique did not lead to an improvement of visualisation, showing a significant loss of image quality compared to an acquisition technique without parallel imaging (p<0.001; η²=0.013). Conclusion: The optimised ss-TSE MR-sialography seems to be a fast and sufficient technique for visualisation of excretory ducts of the main salivary glands, with no elaborate post

  13. Sequence Classification: 890855 [

    Lifescience Database Archive (English)

    Full Text Available embrane space protein that functions in mitochondrial copper homeostasis, essential for functional cytochrome oxidase express...Non-TMB Non-TMH TMB Non-TMB TMB Non-TMB >gi|6321908|ref|NP_011984.1| Mitochondrial interm

  14. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  15. The advantages of SMRT sequencing

    OpenAIRE

    Roberts, Richard J; Carneiro, Mauricio O; Schatz, Michael C

    2013-01-01

    Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.

  16. Putting instruction sequences into effect

    NARCIS (Netherlands)

    Bergstra, J.A.

    2011-01-01

    An attempt is made to define the concept of execution of an instruction sequence. It is found to be a special case of directly putting into effect of an instruction sequence. Directly putting into effect of an instruction sequence comprises interpretation as well as execution. Directly putting into

  17. Region segmentation along image sequence

    International Nuclear Information System (INIS)

    Monchal, L.; Aubry, P.

    1995-01-01

    A method to extract regions in a sequence of images is proposed. Regions are not matched from one image to the following one. Instead, the result of a region segmentation is used as an initialization to segment the following image and to track the region along the sequence. The image sequence is exploited as a spatio-temporal event. (authors). 12 refs., 8 figs

  18. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    Directory of Open Access Journals (Sweden)

    Bolton Michael J

    2011-11-01

    Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infection of erythrocytes and DBP binding to the Duffy Antigen Receptor for Chemokines (DARC). A peptide including the HBM of PvDBP had similar affinity for heparin as RANTES and V3 loop peptides, and could be specifically inhibited from heparin binding by the same polyanions that inhibit DBP binding to DARC. However, some V3 peptides can competitively inhibit RANTES binding to heparin, but not the PvDBP HBM peptide. Three other members of the DBP family have an HBM sequence that is necessary for erythrocyte binding; however, only the protein which binds to DARC, the P. knowlesi alpha protein, is inhibited by heparin from binding to erythrocytes. Heparitinase digestion does not affect the binding of DBP to erythrocytes. Conclusion The HBMs of DBPs that bind to DARC have similar heparin binding affinities as some V3 loop peptides and chemokines, are responsible for specific sulfated polysaccharide inhibition of parasite binding and invasion of red blood cells, and are more likely to bind to negative charges on the receptor than cell surface glycosaminoglycans.

  19. Hierarchically nested river landform sequences

    Science.gov (United States)

    Pasternack, G. B.; Weber, M. D.; Brown, R. A.; Baig, D.

    2017-12-01

    River corridors exhibit landforms nested within landforms repeatedly down spatial scales. In this study we developed, tested, and implemented a new way to create river classifications by mapping domains of fluvial processes with respect to the hierarchical organization of topographic complexity that drives fluvial dynamism. We tested this approach on flow convergence routing, a morphodynamic mechanism with different states depending on the structure of nondimensional topographic variability. Five nondimensional landform types with unique functionality (nozzle, wide bar, normal channel, constricted pool, and oversized) represent this process at any flow. When this typology is nested at base flow, bankfull, and floodprone scales it creates a system with up to 125 functional types. This shows how a single mechanism produces complex dynamism via nesting. Given the classification, we answered nine specific scientific questions to investigate the abundance, sequencing, and hierarchical nesting of these new landform types using a 35-km gravel/cobble river segment of the Yuba River in California. The nested structure of flow convergence routing landforms found in this study revealed that bankfull landforms are nested within specific floodprone valley landform types, and these types control bankfull morphodynamics during moderate to large floods. As a result, this study calls into question the prevailing theory that the bankfull channel of a gravel/cobble river is controlled by in-channel, bankfull, and/or small flood flows. Such flows are too small to initiate widespread sediment transport in a gravel/cobble river with topographic complexity.
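
    The figure of up to 125 functional types follows from combining the five landform types independently at each of the three nested scales, 5 × 5 × 5 = 125. A minimal sketch of that enumeration, using the type and scale names from the abstract (the function name `nested_types` is hypothetical):

```python
from itertools import product

# Landform types and flow scales as named in the abstract.
LANDFORMS = ("nozzle", "wide bar", "normal channel", "constricted pool", "oversized")
SCALES = ("base flow", "bankfull", "floodprone")


def nested_types():
    """Enumerate every combination of one landform type per scale."""
    return list(product(LANDFORMS, repeat=len(SCALES)))


if __name__ == "__main__":
    types = nested_types()
    print(len(types))
    # Each entry pairs positionally with SCALES, e.g. (base flow, bankfull, floodprone).
    print(dict(zip(SCALES, types[0])))
```

    This counts the possible combinations only; which of them actually occur on a given river is an empirical question of the kind the study addresses.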

  20. Extended sequence diagram for human system interaction

    International Nuclear Information System (INIS)

    Hwang, Jong Rok; Choi, Sun Woo; Ko, Hee Ran; Kim, Jong Hyun

    2012-01-01

    Unified Modeling Language (UML) is a modeling language in the field of object-oriented software engineering. The sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a message sequence chart. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. This paper proposes the Extended Sequence Diagram (ESD), which is capable of depicting human system interaction for nuclear power plants as well as supporting analysis of operators' cognitive processes. The conventional sequence diagram is limited to identifying the activities of human and system interactions. The ESD is extended to describe operators' cognitive process in more detail. The ESD is expected to be used as a task analysis method for describing human system interaction. The ESD can also present key steps causing abnormal operations or failures and diverse human errors based on cognitive condition

  1. Exploring the correlations between sequence evolution rate and ...

    Indian Academy of Sciences (India)

    2012-10-15

    Oct 15, 2012 ... The vast functional divergence within mammalian lineages that ... Keywords. Phylogenetics; molecular clock; sequence evolutionary rate; phenotypic evolution; morphology; genomics .... entire lineages during periods with ecosystem-level commu- ... increases from fish to amphibians to birds to mammals.

  2. The entire sequence over Musielak p-metric space

    Directory of Open Access Journals (Sweden)

    C. Murugesan

    2016-04-01

    Full Text Available In this paper, we introduce Fibonacci numbers of Γ2(F sequence space over p-metric spaces defined by a Musielak function and examine some topological properties of the resulting spaces.

  3. Log-balanced combinatorial sequences

    Directory of Open Access Journals (Sweden)

    Tomislav Došlic

    2005-01-01

    Full Text Available We consider log-convex sequences that satisfy an additional constraint imposed on their rate of growth. We call such sequences log-balanced. It is shown that all such sequences satisfy a pair of double inequalities. Sufficient conditions for log-balancedness are given for the case when the sequence satisfies a two- (or more- term linear recurrence. It is shown that many combinatorially interesting sequences belong to this class, and, as a consequence, that the above-mentioned double inequalities are valid for all of them.
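    The abstract does not spell out the growth constraint. The Python sketch below assumes the usual definition, which is not stated in the abstract: a positive sequence (a_n) is log-balanced if it is log-convex and (a_n / n!) is log-concave. Under that assumption the two conditions combine into the double inequality a_n^2 <= a_(n-1) a_(n+1) <= (1 + 1/n) a_n^2. The function names are illustrative.

```python
from math import comb


def is_log_convex(a):
    """True if a[n]^2 <= a[n-1] * a[n+1] for every interior index n."""
    return all(a[n] ** 2 <= a[n - 1] * a[n + 1] for n in range(1, len(a) - 1))


def is_log_balanced(a):
    """Assumed definition: log-convex, and a[n]/n! log-concave.

    The second condition is checked in integer form:
    n * a[n-1] * a[n+1] <= (n + 1) * a[n]^2, with a indexed from 0.
    """
    growth_ok = all(
        n * a[n - 1] * a[n + 1] <= (n + 1) * a[n] ** 2
        for n in range(1, len(a) - 1)
    )
    return is_log_convex(a) and growth_ok


if __name__ == "__main__":
    catalan = [comb(2 * n, n) // (n + 1) for n in range(12)]
    fast_growing = [2 ** (n * n) for n in range(8)]  # log-convex but grows too fast
    print(is_log_balanced(catalan))
    print(is_log_convex(fast_growing), is_log_balanced(fast_growing))
```

    The check only inspects a finite prefix, so it can refute log-balancedness but cannot prove it for the whole sequence.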

  4. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    Science.gov (United States)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  5. Sequence Classification: 891809 [

    Lifescience Database Archive (English)

    Full Text Available unknown function, expressed during sporulation; not required for sporulation, but gene exhibits genetic int...eractions with other genes required for sporulation; Spr6p || http://www.ncbi.nlm.nih.gov/protein/6320961 ...

  6. Sequence Classification: 889686 [

    Lifescience Database Archive (English)

    Full Text Available ivities, functions in formaldehyde detoxification and formation of long chain and complex alcohols, regulated by Hog1p-Sko1p; Sfa1p || http://www.ncbi.nlm.nih.gov/protein/6320033 ...

  7. Biomolecule Sequencer: Next-Generation DNA Sequencing Technology for In-Flight Environmental Monitoring, Research, and Beyond

    Science.gov (United States)

    Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.

    2016-01-01

    On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment, the first time DNA has ever been sequenced in space, and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human

  8. New MR pulse sequence

    International Nuclear Information System (INIS)

    Harms, S.E.; Flamig, D.P.; Griffey, R.H.

    1990-01-01

    This paper describes a method for fat suppression for three-dimensional MR imaging. The FATS (fat-suppressed acquisition with echo time shortened) sequence employs a pair of opposing adiabatic half-passage RF pulses tuned on fat resonance. The imaging parameters are as follows: TR, 20 msec; TE, 21.7-3.2 msec; 1,024 x 128 x 128 acquired matrix; imaging time, approximately 11 minutes. A series of 54 examinations were performed. Excellent fat suppression with water excitation is achieved in all cases. The orbital images demonstrate superior resolution of small orbital lesions. The high signal-to-noise ratio (SNR) in cranial studies demonstrates excellent petrous bone and internal auditory canal anatomy

  9. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    Full Text Available We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

  10. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    Motif analysis has long been an important method to characterize biological functionality and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs... advantage of the regular expression feature, including enrichments for combinations of different microRNA seed sites. The method is implemented and made publicly available as an R package and supports high parallelization on multi-core machinery.
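    The general idea of relating a regular-expression motif to a ranked sequence list can be sketched in a few lines of Python. This is not the Regmex R package and does not reproduce any of its statistics; the function name, the toy sequences, and the pattern are hypothetical, and the summary (mean rank of matching sequences against the mean rank of the whole list) is only a crude stand-in for a proper correlation test.

```python
import re


def motif_rank_summary(ranked_seqs, pattern):
    """Summarize where sequences matching a regex fall in a ranked list.

    ranked_seqs: sequences ordered from rank 1 (top) downwards.
    pattern: regular expression describing the motif (may combine sites with '|').
    """
    rx = re.compile(pattern)
    hit_ranks = [rank for rank, seq in enumerate(ranked_seqs, start=1) if rx.search(seq)]
    n = len(ranked_seqs)
    mean_hit_rank = sum(hit_ranks) / len(hit_ranks) if hit_ranks else None
    return {
        "hits": len(hit_ranks),
        "mean_hit_rank": mean_hit_rank,   # low value: motif concentrated near the top
        "expected_rank": (n + 1) / 2,     # mean rank if matches were spread evenly
    }


if __name__ == "__main__":
    # Toy ranked list; the two top sequences contain the hypothetical site GCACTT.
    ranked = ["AAGCACTTAA", "TTGCACTTGG", "ACGTACGTAC", "GGGGCCCCAA"]
    print(motif_rank_summary(ranked, r"GCACTT"))
```

    A mean hit rank well below the expected rank suggests enrichment towards the top of the list, but says nothing about significance.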

  11. Inverse statistical physics of protein sequences: a key issues review.

    Science.gov (United States)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  12. Prediction of novel archaeal enzymes from sequence-derived features

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Skovgaard, Marie; Brunak, Søren

    2002-01-01

    The completely sequenced archaeal genomes potentially encode, among their many functionally uncharacterized genes, novel enzymes of biotechnological interest. We have developed a prediction method for detection and classification of enzymes from sequence alone (available at http://www.cbs.dtu.dk/services/ArchaeaFun/). The method does not make use of sequence similarity; rather, it relies on predicted protein features like cotranslational and posttranslational modifications, secondary structure, and simple physical/chemical properties.

  13. Adenovirus sequences required for replication in vivo.

    OpenAIRE

    Wang, K; Pearson, G D

    1985-01-01

    We have studied the in vivo replication properties of plasmids carrying deletion mutations within cloned adenovirus terminal sequences. Deletion mapping located the adenovirus DNA replication origin entirely within the first 67 bp of the adenovirus inverted terminal repeat. This region could be further subdivided into two functional domains: a minimal replication origin and an adjacent auxiliary region which boosted the efficiency of replication by more than 100-fold. The minimal origin occup...

  14. Image registration method for medical image sequences

    Science.gov (United States)

    Gee, Timothy F.; Goddard, James S.

    2013-03-26

    Image registration of low contrast image sequences is provided. In one aspect, a desired region of an image is automatically segmented and only the desired region is registered. Active contours and adaptive thresholding of intensity or edge information may be used to segment the desired regions. A transform function is defined to register the segmented region, and sub-pixel information may be determined using one or more interpolation methods.

  15. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective single-molecule sequencing method. Here, we present the first demonstration of a unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  16. Sequence memory based on coherent spin-interaction neural networks.

    Science.gov (United States)

    Xia, Min; Wong, W K; Wang, Zhijie

    2014-12-01

    Sequence information processing, for instance sequence memory, plays an important role in many functions of the brain. In the workings of the human brain, the steady-state period is alterable. However, in the existing sequence memory models using heteroassociations, the steady-state period cannot be changed during sequence recall. In this work, a novel neural network model for sequence memory with a controllable steady-state period based on coherent spin-interaction is proposed. In the proposed model, neurons fire collectively in a phase-coherent manner, which lets a neuron group respond differently to different patterns and also lets different neuron groups respond differently to one pattern. Simulation results demonstrating the performance of the sequence memory are presented. With the new coherent spin-interaction sequence memory model, the steady-state period can be controlled by dimension parameters and the overlap between the input pattern and the stored patterns. The sequence storage capacity is enlarged by coherent spin interaction compared with the existing sequence memory models. Furthermore, the sequence storage capacity has an exponential relationship to the dimension of the neural network.

  17. Use of designed sequences in protein structure recognition.

    Science.gov (United States)

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  18. Heuristics for multiobjective multiple sequence alignment.

    Science.gov (United States)

    Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

    2016-07-15

    Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameters setting, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may provide an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives to the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. 
Experimental results show
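    The trade-off described in this abstract (maximize the substitution score, minimize the number of indels) can be illustrated with a small Pareto filter in Python. This is only a sketch of the non-dominated set idea under the assumption that each candidate alignment has already been reduced to a (substitution_score, indel_count) pair; it is not the authors' local search heuristics, and the function name and sample values are hypothetical.

```python
def pareto_front(candidates):
    """Return the non-dominated (substitution_score, indel_count) pairs.

    A candidate is dominated if another, different candidate has a
    substitution score at least as high and an indel count at least as low.
    """
    front = set()
    for cand in candidates:
        dominated = any(
            other != cand and other[0] >= cand[0] and other[1] <= cand[1]
            for other in candidates
        )
        if not dominated:
            front.add(cand)
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical score vectors for five alternative alignments of the same sequences.
    alignments = [(10, 5), (12, 7), (9, 5), (12, 9), (8, 2)]
    print(pareto_front(alignments))
```

    Each pair left in the front represents a different compromise; no single parameter setting is singled out, which is the bias the abstract argues against.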

  19. Sequence Classification: 893673 [

    Lifescience Database Archive (English)

    Full Text Available n of unknown function; deletion heterozygote is sensitive to compounds that target ergosterol biosynthesis, ...Non-TMB Non-TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|6325092|ref|NP_015160.1| Protei

  20. Sequence Classification: 892145 [

    Lifescience Database Archive (English)

    Full Text Available Non-TMB TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|6323774|ref|NP_013845.1| Protein of unknown function, deleti...on causes sensitivity to thermal stress; Dlt1p || http://www.ncbi.nlm.nih.gov/protein/6323774 ...

  1. Sequence Classification: 891278 [

    Lifescience Database Archive (English)

    Full Text Available TMB Non-TMH TMB TMB TMB Non-TMB >gi|6321518|ref|NP_011595.1| Protein of unknown function; deletion... mutant has synthetic fitness defect with an sgs1 deletion mutant; Slx9p || http://www.ncbi.nlm.nih.gov/protein/6321518 ...

  2. Sequence Classification: 892983 [

    Lifescience Database Archive (English)

    Full Text Available r protein of unknown function; deletion results in sensitivity to anticancer drugs oxaliplatin and cisplatin..., but not mitomycin C; deletion is synthetically lethal with a chitin synthase (CHS1) null mutant; Psy2p || http://www.ncbi.nlm.nih.gov/protein/6324128 ...

  3. Sequence Classification: 889737 [

    Lifescience Database Archive (English)

    Full Text Available Non-TMB Non-TMH Non-TMB Non-TMB TMB Non-TMB >gi|44829553|ref|NP_010168.3| Protein of unknown function, delet...ion causes hypersensitivity to the K1 killer toxin; Iwr1p || http://www.ncbi.nlm.nih.gov/protein/44829553 ...

  4. Sequence Classification: 892768 [

    Lifescience Database Archive (English)

    Full Text Available on results in a mutator phenotype suggesting a role for this protein as a mutational suppressor; deletion...Non-TMB Non-TMH Non-TMB Non-TMB Non-TMB TMB >gi|6323408|ref|NP_013480.1| Protein of unknown function; deleti

  5. Sequence Classification: 893543 [

    Lifescience Database Archive (English)

    Full Text Available rich protein with a role in preribosome assembly or transport; may function as a chaperone of small nucleolar ribonucleoprotein parti...cles (snoRNPs); immunologically and structurally to rat Nopp140; Srp40p || http://www.ncbi.nlm.nih.gov/protein/6322945 ...

  6. Sequence Classification: 893846 [

    Lifescience Database Archive (English)

    Full Text Available ds zinc, found both on membranes and in the cytosol; guanine nucleotide dissociation stimulator; Dss4p || http://www.ncbi.nlm.nih.gov/protein/6325274 ... ...tide release factor functioning in the post-Golgi secretory pathway, required for ER-to-Golgi transport, bin

  7. Sequence Classification: 892232 [

    Lifescience Database Archive (English)

    Full Text Available TMB Non-TMH Non-TMB TMB Non-TMB TMB >gi|6323867|ref|NP_013938.1| Essential protein involved in mtDNA inherit...ance, may also function in the partitioning of the mitochondrial organelle or in th

  8. Sequence Classification: 889222 [

    Lifescience Database Archive (English)

    Full Text Available protein required for chromosome condensation, likely to function as an intrinsic component of the condensation machinery, may influe...nce multiple aspects of chromosome transmission and dynamics; Brn1p || http://www.ncbi.nlm.nih.gov/protein/41629672 ...

  9. Sequence Classification: 889842 [

    Lifescience Database Archive (English)

    Full Text Available region of Rcr1p in conferring Congo Red resistance when overexpressed; Rcr2p || http://www.ncbi.nlm.nih.gov/protein/6320206 ... ...lum membrane protein with similarity to Rcr1p; C-terminal region can functionally replace the corresponding

  10. Sequence Classification: 894820 [

    Lifescience Database Archive (English)

    Full Text Available ruption does not increase the rate of spontaneous mutagenesis; Ham1p || http://www.ncbi.nlm.nih.gov/protein/6322529 ... ...n of unknown function that is involved in DNA repair; mutant is sensitive to the base analog, 6-N-hydroxylaminopurine, while gene dis

  11. Sequence Classification: 890890 [

    Lifescience Database Archive (English)

    Full Text Available lar protein of unknown function, positive regulator of exit from mitosis; involved in regulating the release of Cdc14p from the nucle...olus in early anaphase; proposed to play similar role in meiosis; Spo12p || http://www.ncbi.nlm.nih.gov/protein/6321946 ...

  12. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Full Text Available Abstract Background Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines.
UCSF Chimera is free for non-commercial use and is

  13. Genomic sequencing in clinical trials

    OpenAIRE

    Mestan, Karen K; Ilkhanoff, Leonard; Mouli, Samdeep; Lin, Simon

    2011-01-01

    Abstract Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to fin...

  14. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered to be a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when it is compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  15. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered to be a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when it is compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
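    For orientation, the optimal baseline named in this abstract, Needleman-Wunsch global alignment, can be sketched in Python as a dynamic program over a score table. This is not the ABS algorithm, and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration rather than the settings used in the paper.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (linear gap penalty).

    Only two rows of the dynamic-programming table are kept, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "ACGT"))
    print(needleman_wunsch_score("ACGT", "AGT"))
```

    The table has len(a) x len(b) cells, which is the quadratic cost that motivates heuristic alternatives such as the one proposed in the abstract.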

  16. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing this large amount of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 51% enhancement in alignment score when it is compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  17. Left ventricular function assessment using a fast 3D gradient echo pulse sequence: comparison to standard multi-breath hold 2D steady state free precession imaging and accounting for papillary muscles and trabeculations.

    Science.gov (United States)

    Sievers, Burkhard; Schrader, Sebastian; Rehwald, Wolfgang; Hunold, Peter; Barkhausen, Joerg; Erbel, Raimund

    2011-06-01

    Accounting for papillary muscles and trabeculae in ventricular function analysis is known to contribute significantly to accurate volume and mass measurements. Fast imaging techniques such as three-dimensional steady-state free precession (3D SSFP) are increasingly being used to speed up imaging time, but sacrifice spatial resolution. It is unknown whether 3D SSFP, despite its reduced spatial resolution, allows for exact delineation of papillary muscles and trabeculations. We therefore compared 3D SSFP ventricular function measurements to those measured from standard multi-breath hold two-dimensional steady-state free precession cine images (standard 2D SSFP). 14 healthy subjects and 14 patients with impaired left ventricular function underwent 1.5 Tesla cine imaging. A stack of short axis images covering the left ventricle was acquired with 2D SSFP and 3D SSFP. Left ventricular volumes, ejection fraction, and mass were determined. Analysis was performed by subtracting papillary muscles and trabeculae from left ventricular volumes. In addition, reproducibility was assessed. EDV, ESV, EF, and mass were not significantly different between 2D SSFP and 3D SSFP (mean difference healthy subjects: -0.06 +/- 3.2 ml, 0.54 +/- 2.2 ml, -0.45 +/- 1.8%, and 1.13 +/- 0.8 g, respectively; patients: 1.36 +/- 2.8 ml, -0.15 +/- 3.5 ml, 0.86 +/- 2.5%, and 0.91 +/- 0.9 g, respectively; P ≥ 0.095). Intra- and interobserver variability was not different for 2D SSFP (P ≥ 0.64 and P ≥ 0.397) and 3D SSFP (P ≥ 0.53 and P ≥ 0.47). Differences in volumes, EF, and mass measurements between 3D SSFP and standard 2D SSFP are very small, and not statistically significant. 3D SSFP may be used for accurate ventricular function assessment when papillary muscles and trabeculations are to be taken into account.

  18. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    Science.gov (United States)

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method, named 'Subgrouping Automata', was constructed. First, the tree analysis module analyzes the structure of the tree and calculates all possible subgroups at each node. The sequence similarity analysis module calculates the average sequence similarity for all subgroups at each node. The representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. For all nodes, average sequence similarities are calculated, and 'Subgrouping Automata' searches for the node showing the statistically maximum increase in sequence similarity using Student's t-value. The node showing the maximum t-value, which gives the most significant difference in average sequence similarity between two adjacent nodes, is determined to be the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.

  19. ϕ-statistically quasi Cauchy sequences

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2016-04-01

    Let $P$ denote the space whose elements are finite sets of distinct positive integers. Given any element $\sigma$ of $P$, we denote by $p(\sigma)$ the sequence $\{p_n(\sigma)\}$ such that $p_n(\sigma)=1$ for $n \in \sigma$ and $p_n(\sigma)=0$ otherwise. Further, $P_s=\{\sigma \in P : \sum_{n=1}^{\infty} p_n(\sigma) \le s\}$, i.e. $P_s$ is the set of those $\sigma$ whose support has cardinality at most $s$. Let $(\phi_n)$ be a non-decreasing sequence of positive integers such that $n\phi_{n+1} \le (n+1)\phi_n$ for all $n \in \mathbb{N}$, and let the class of all such sequences $(\phi_n)$ be denoted by $\Phi$. Let $E \subseteq \mathbb{N}$. The number $\delta_\phi(E)=\lim_{s\to\infty}\frac{1}{\phi_s}\,|\{k \in \sigma, \sigma \in P_s : k \in E\}|$ is said to be the ϕ-density of $E$. A sequence $(x_n)$ of points in $\mathbb{R}$ is ϕ-statistically convergent (or $S_\phi$-convergent) to a real number $\ell$ if, for every $\varepsilon > 0$, the set $\{n \in \mathbb{N} : |x_n-\ell| \ge \varepsilon\}$ has ϕ-density zero. We introduce ϕ-statistically ward continuity of a real function. A real function is ϕ-statistically ward continuous if it preserves ϕ-statistically quasi Cauchy sequences, where a sequence $(x_n)$ is called ϕ-statistically quasi Cauchy (or $S_\phi$-quasi Cauchy) when $(\Delta x_n)=(x_{n+1}-x_n)$ is ϕ-statistically convergent to 0, i.e. a sequence $(x_n)$ of points in $\mathbb{R}$ is called ϕ-statistically quasi Cauchy if, for every $\varepsilon > 0$, the set $\{n \in \mathbb{N} : |x_{n+1}-x_n| \ge \varepsilon\}$ has ϕ-density zero. We also introduce the concept of ϕ-statistically ward compactness and obtain results related to ϕ-statistically ward continuity, ϕ-statistically ward compactness, statistically ward continuity, ward continuity, ward compactness, ordinary compactness, uniform continuity, ordinary continuity, δ-ward continuity, and slowly oscillating continuity.

  20. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  1. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom. It also raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.

  2. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Background: Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results: The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion: We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  3. Sequence Algebra, Sequence Decision Diagrams and Dynamic Fault Trees

    International Nuclear Information System (INIS)

    Rauzy, Antoine B.

    2011-01-01

    Considerable attention has been focused on Dynamic Fault Trees in the past few years. By adding new gates to static (regular) Fault Trees, Dynamic Fault Trees aim to take into account dependencies among events. Merle et al. recently proposed an algebraic framework to give a formal interpretation to these gates. In this article, we extend Merle et al.'s work by adopting a slightly different perspective. We introduce Sequence Algebras that can be seen as Algebras of Basic Events, representing failures of non-repairable components. We show how to interpret Dynamic Fault Trees within this framework. Finally, we propose a new data structure to encode sets of sequences of Basic Events: Sequence Decision Diagrams. Sequence Decision Diagrams are very much inspired by Minato's Zero-Suppressed Binary Decision Diagrams. We show that all operations of Sequence Algebras can be performed on this data structure.

  4. eShadow: A tool for comparing closely related sequences

    Energy Technology Data Exchange (ETDEWEB)

    Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.

    2004-01-15

    Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualization of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/

  5. Unified Deep Learning Architecture for Modeling Biology Sequence.

    Science.gov (United States)

    Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

    2017-10-09

    Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.

  6. DSAP: deep-sequencing small RNA analysis pipeline.

    Science.gov (United States)

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
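
    The first two DSAP steps (cleanup and clustering) can be pictured with the following minimal sketch. It is not DSAP's code: the adaptor string, the minimum length, and the function names are hypothetical, and only the input layout (a tab-delimited sequence tag plus copy number) is taken from the description above.

```python
import io
import re
from collections import Counter

ADAPTOR = "TCGTATGCCG"  # hypothetical 3' adaptor prefix, not DSAP's actual setting
MIN_LEN = 15            # hypothetical minimum tag length after trimming
POLY = re.compile(r"^(A+|C+|G+|T+|N+)$")  # tags made of one repeated base


def clean(tag):
    """Trim the adaptor if present; reject homopolymer or too-short tags."""
    pos = tag.find(ADAPTOR)
    if pos != -1:
        tag = tag[:pos]
    if len(tag) < MIN_LEN or POLY.match(tag):
        return None
    return tag


def cluster(handle):
    """Sum copy numbers of tags that are identical after cleanup."""
    counts = Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue
        seq, copies = line.split("\t")
        cleaned = clean(seq.upper())
        if cleaned is not None:
            counts[cleaned] += int(copies)
    return counts


# Toy input: the first two tags collapse into one cluster after trimming.
sample = io.StringIO(
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t120\n"
    "TGAGGTAGTAGGTTGTATAGTT\t30\n"
    "AAAAAAAAAAAAAAAAAAAA\t500\n"
    "ACGTACGT\t7\n"
)
clusters = cluster(sample)
print(clusters)
```

    The matching steps (iii) and (iv) would then compare each cluster key against Rfam and miRBase; they are omitted here because they depend on external databases.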

  7. Functional analysis of protein N-myristoylation: Metabolic labeling studies using three oxygen-substituted analogs of myristic acid and cultured mammalian cells provide evidence for protein-sequence-specific incorporation and analog-specific redistribution

    International Nuclear Information System (INIS)

    Johnson, D.R.; Heuckeroth, R.O.; Gordon, J.I.; Cox, A.D.; Solski, P.A.; Buss, J.E.; Devadas, B.; Adams, S.P.; Leimgruber, R.M.

    1990-01-01

    Covalent attachment of myristic acid (C14:0) to the NH2-terminal glycine residue of a number of cellular, viral, and oncogene-encoded proteins is essential for full expression of their biological function. Substitution of oxygen for methylene groups in this fatty acid does not produce a significant change in chain length or stereochemistry but does result in a reduction in hydrophobicity. These heteroatom-containing analogs serve as alternative substrates for mammalian myristoyl-CoA:protein N-myristoyltransferase and offer the opportunity to explore structure/function relationships of myristate in N-myristoyltransferase proteins. The authors have synthesized three tritiated analogs of myristate with oxygen substituted for methylene groups at C6, C11, and C13. Metabolic labeling studies were performed with these compounds and (i) a murine myocyte cell line (BC3H1), (ii) a rat fibroblast cell that produces p60v-src (3Xsrc), or (iii) NIH 3T3 cells that have been engineered to express a fusion protein consisting of an 11-residue myristoylation signal from the Rasheed sarcoma virus (RaSV) gag protein linked to c-Ha-ras with a Cys → Ser-186 mutation. Two-dimensional gel electrophoresis of membrane and soluble fractions prepared from cell lysates revealed different patterns of incorporation of the analogs into cellular N-myristoyl proteins. The demonstration that these analogs differ in the extent to which they are incorporated and in their ability to cause redistribution of any single protein suggests that they may also have sufficient selectivity to be of potential therapeutic value.

  8. Chameleon sequences in neurodegenerative diseases

    International Nuclear Information System (INIS)

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-01-01

    Chameleon sequences can adopt either an alpha helix, a beta sheet or a coil conformation. Identifying chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, for which we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity of less than 25%. Our data were classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. It can therefore be inferred that other CFPs may serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases.
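
    The extraction of "identical sequence, different structure" segments can be sketched as follows. This is an illustration rather than the authors' algorithm: the segment length `k`, the H/E/C state alphabet, the requirement that a segment be uniformly in one state, and the two toy proteins are all assumptions.

```python
from collections import defaultdict


def uniform_state(ss_window):
    """Return the state if every residue in the window shares it, else None."""
    return ss_window[0] if len(set(ss_window)) == 1 else None


def find_chameleons(proteins, k=6):
    """Return k-mers that occur in more than one uniform secondary-structure state.

    `proteins` maps an identifier to (sequence, secondary structure), where the
    structure string uses H (helix), E (strand) and C (coil), one letter per residue.
    """
    states = defaultdict(set)
    for seq, ss in proteins.values():
        for i in range(len(seq) - k + 1):
            state = uniform_state(ss[i:i + k])
            if state is not None:
                states[seq[i:i + k]].add(state)
    return {kmer: found for kmer, found in states.items() if len(found) > 1}


# Hypothetical proteins: the segment VIVAGL is helical in one, a strand in the other.
proteins = {
    "protA": ("MKVIVAGLLEA", "CHHHHHHHHCC"),
    "protB": ("GSVIVAGLTYR", "CEEEEEEECCC"),
}
chameleons = find_chameleons(proteins, k=6)
print(chameleons)
```

    A pair of states such as {H, E} corresponds to the "helix to strand (HE)" class named in the abstract; {H, C} and {E, C} would give the other two classes.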

  9. Direct, rapid RNA sequence analysis

    International Nuclear Information System (INIS)

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule followed by identification of the mononucleotides comprising the oligomer. Over the past decade the field of nucleic acid sequencing has changed dramatically, however, and RNA molecules now can be sequenced in a variety of more streamlined fashions. Most of the more recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of 32P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for determining the nucleotide sequences of RNA molecules rapidly: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on fragmatic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T1, T2, and S1) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can be done easily at the same time as sequencing reactions.

  10. Chameleon sequences in neurodegenerative diseases.

    Science.gov (United States)

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt either an alpha helix, a beta sheet or a coil conformation. Identifying chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, for which we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity of less than 25%. Our data were classified into "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. It can therefore be inferred that other CFPs may serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible with the advent of modern sequencing technologies, which were the result of step by step advances contributed by academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this prospective, we wish to overview the role of capillary electrophoresis in DNA sequencing, based in part on several of our articles in this journal. PMID:19517496

  12. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  13. Chameleon sequences in neurodegenerative diseases

    Energy Technology Data Exchange (ETDEWEB)

    Bahramali, Golnaz [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Goliaei, Bahram, E-mail: goliaei@ut.ac.ir [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of); Salari, Ali [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of)

    2016-03-25

    Chameleon sequences can adopt either an alpha helix, a beta sheet or a coil conformation. Identifying chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, for which we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity of less than 25%. Our data were classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. It can therefore be inferred that other CFPs may serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases.

  14. Commercial Art: Scope and Sequence.

    Science.gov (United States)

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a commercial art vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  15. The Release 6 reference sequence of the Drosophila melanogaster genome.

    Science.gov (United States)

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. © 2015 Hoskins et al.; Published by Cold Spring Harbor Laboratory Press.

  16. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command
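
    The idea of logging every generated file and indexing it by a hash value can be shown with a small sketch. Nothing here reflects the actual MSL server: the table layout, the function names, the file and dictionary names, the use of SHA-256, and the in-memory SQLite database (standing in for MySQL so the example is self-contained) are all assumptions.

```python
import hashlib
import sqlite3


def open_log():
    """Create an in-memory log table keyed by the digest of the compiled file."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE scmf_log ("
        "digest TEXT PRIMARY KEY, rml_name TEXT, dictionary TEXT, scmf BLOB)"
    )
    return db


def log_scmf(db, scmf_bytes, rml_name, dictionary):
    """Store the compiled bytes with their inputs and return the lookup digest."""
    digest = hashlib.sha256(scmf_bytes).hexdigest()  # hash choice is an assumption
    db.execute(
        "INSERT OR IGNORE INTO scmf_log VALUES (?, ?, ?, ?)",
        (digest, rml_name, dictionary, scmf_bytes),
    )
    return digest


def lookup(db, digest):
    """Recover the inputs and compiled bytes for a digest, or None if unknown."""
    return db.execute(
        "SELECT rml_name, dictionary, scmf FROM scmf_log WHERE digest = ?",
        (digest,),
    ).fetchone()


db = open_log()
digest = log_scmf(db, b"\x01\x02demo", "drive_to_rock.rml", "dict_v12")  # hypothetical inputs
record = lookup(db, digest)
print(digest[:12], record)
```

    Because the digest is derived from the file contents, two versions of a sequence that share a name still receive different keys, which is the ambiguity the abstract describes for earlier missions.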

  17. Accident sequence quantification with KIRAP

    International Nuclear Information System (INIS)

    Kim, Tae Un; Han, Sang Hoon; Kim, Kil You; Yang, Jun Eon; Jeong, Won Dae; Chang, Seung Cheol; Sung, Tae Yong; Kang, Dae Il; Park, Jin Hee; Lee, Yoon Hwan; Hwang, Mi Jeong.

    1997-01-01

    The tasks of probabilistic safety assessment (PSA) consist of the identification of initiating events, the construction of an event tree for each initiating event, the construction of fault trees for event tree logics, the analysis of reliability data and finally the accident sequence quantification. In the PSA, the accident sequence quantification calculates the core damage frequency and performs importance analysis and uncertainty analysis. Accident sequence quantification requires understanding the whole model of the PSA, because it has to combine all event tree and fault tree models, and requires an excellent computer code because it takes a long computation time. The Advanced Research Group of the Korea Atomic Energy Research Institute (KAERI) has developed the PSA workstation KIRAP (Korea Integrated Reliability Analysis Code Package) for the PSA work. This report describes the procedures to perform accident sequence quantification, the method to use KIRAP's cut set generator, and the method to perform the accident sequence quantification with KIRAP. (author). 6 refs
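
    As a rough illustration of what "quantification" means here, the sketch below combines minimal cut sets with an initiating-event frequency using the rare-event approximation (summing cut-set probabilities). It does not reproduce KIRAP or its cut set generator; the approximation, the event names and every number are hypothetical.

```python
from math import prod


def cut_set_probability(cut_set, basic_events):
    """Probability that all basic events in one minimal cut set occur (assumed independent)."""
    return prod(basic_events[event] for event in cut_set)


def sequence_frequency(initiator_freq, cut_sets, basic_events):
    """Rare-event approximation: sum the cut-set probabilities, scale by the initiator."""
    total = sum(cut_set_probability(cs, basic_events) for cs in cut_sets)
    return initiator_freq * total


# Hypothetical basic-event probabilities and accident sequences.
basic_events = {"PUMP_A": 1e-2, "PUMP_B": 1e-2, "VALVE": 1e-3}
sequences = {
    "LOCA-1": (1e-3, [{"PUMP_A", "PUMP_B"}, {"VALVE"}]),  # (initiator per year, cut sets)
    "LOOP-1": (1e-2, [{"PUMP_A", "VALVE"}]),
}

frequencies = {
    name: sequence_frequency(freq, cut_sets, basic_events)
    for name, (freq, cut_sets) in sequences.items()
}
core_damage_frequency = sum(frequencies.values())
print(frequencies, core_damage_frequency)
```

    The rare-event approximation slightly overestimates the result when cut sets overlap; it is used here only because it keeps the arithmetic visible.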

  18. Accident sequence quantification with KIRAP

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Tae Un; Han, Sang Hoon; Kim, Kil You; Yang, Jun Eon; Jeong, Won Dae; Chang, Seung Cheol; Sung, Tae Yong; Kang, Dae Il; Park, Jin Hee; Lee, Yoon Hwan; Hwang, Mi Jeong

    1997-01-01

    The tasks of probabilistic safety assessment (PSA) consist of the identification of initiating events, the construction of an event tree for each initiating event, the construction of fault trees for event tree logics, the analysis of reliability data and finally the accident sequence quantification. In the PSA, the accident sequence quantification calculates the core damage frequency and performs importance analysis and uncertainty analysis. Accident sequence quantification requires understanding the whole model of the PSA, because it has to combine all event tree and fault tree models, and requires an excellent computer code because it takes a long computation time. The Advanced Research Group of the Korea Atomic Energy Research Institute (KAERI) has developed the PSA workstation KIRAP (Korea Integrated Reliability Analysis Code Package) for the PSA work. This report describes the procedures to perform accident sequence quantification, the method to use KIRAP's cut set generator, and the method to perform the accident sequence quantification with KIRAP. (author). 6 refs.

  19. Repeated DNA sequences in fungi

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K

    1974-11-01

    Several fungal species, representatives of all broad groups like basidiomycetes, ascomycetes and phycomycetes, were examined for the nature of repeated DNA sequences by DNA:DNA reassociation studies using hydroxyapatite chromatography. All of the fungal species tested contained 10 to 20 percent repeated DNA sequences. There are approximately 100 to 110 copies of repeated DNA sequences, each of approximately 4 × 10^7 daltons piece size. Repeated DNA sequence homoduplexes showed on average a 5 °C difference in T_e50 (temperature at which 50 percent of duplexes dissociate) values from the corresponding homoduplexes of unfractionated whole DNA. It is suggested that a part of the repetitive sequences in fungi constitutes mitochondrial DNA and a part of it constitutes nuclear DNA. (auth)

  20. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study was to obtain the complete genome sequence of Bacillus Calmette-Guérin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to support the design of more rational vaccines against tuberculosis. We assembled the high-throughput sequencing data with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting PCR reaction conditions, and sequencing constructed clones, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank at the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with a GC content of 65.65%. The problems and strategies encountered during the finishing step of BCG Tice sequencing are described here, in the hope of offering some experience to those involved in the finishing step of genome sequencing. The microarray data were verified by our results.
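
    The GC content figure quoted above is a simple ratio, shown here as a minimal sketch. The function name and the toy sequence are illustrative only; ambiguous bases such as N are excluded from the denominator, which is one common convention and an assumption here.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


# Toy sequence: 6 of 8 bases are G or C.
print(gc_content("GGCCATGC"))  # 75.0
```

    Applied to a finished genome, the same ratio over all 4334064 base pairs would give the genome-wide value reported in the abstract.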

  1. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    Directory of Open Access Journals (Sweden)

    Kevin R Ramkissoon

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology: "orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  2. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    Science.gov (United States)

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  3. Logic verification system for power plant sequence diagrams

    International Nuclear Information System (INIS)

    Fukuda, Mitsuko; Yamada, Naoyuki; Teshima, Toshiaki; Kan, Ken-ichi; Utsunomiya, Mitsugu.

    1994-01-01

    A logic verification system for sequence diagrams of power plants has been developed. The system's main function is to verify correctness of the logic realized by sequence diagrams for power plant control systems. The verification is based on a symbolic comparison of the logic of the sequence diagrams with the logic of the corresponding IBDs (Interlock Block Diagrams) in combination with reference to design knowledge. The developed system points out the sub-circuit which is responsible for any existing mismatches between the IBD logic and the logic realized by the sequence diagrams. Applications to the verification of actual sequence diagrams of power plants confirmed that the developed system is practical and effective. (author)

  4. Dynamics of domain coverage of the protein sequence universe

    Science.gov (United States)

    2012-01-01

    Background: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results: Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions: Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439

  5. QUASAR--scoring and ranking of sequence-structure alignments.

    Science.gov (United States)

    Birzele, Fabian; Gewehr, Jan E; Zimmer, Ralf

    2005-12-15

    Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.

  6. Dynamics of domain coverage of the protein sequence universe

    Directory of Open Access Journals (Sweden)

    Rekapalli Bhanu

    2012-11-01

    Background: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results: Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions: Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data.

  7. Transcription blockage by homopurine DNA sequences: role of sequence composition and single-strand breaks

    Science.gov (United States)

    Belotserkovskii, Boris P.; Neil, Alexander J.; Saleh, Syed Shayon; Shin, Jane Hae Soo; Mirkin, Sergei M.; Hanawalt, Philip C.

    2013-01-01

    The ability of DNA to adopt non-canonical structures can affect transcription and has broad implications for genome functioning. We have recently reported that guanine-rich (G-rich) homopurine-homopyrimidine sequences cause significant blockage of transcription in vitro in a strictly orientation-dependent manner: when the G-rich strand serves as the non-template strand [Belotserkovskii et al. (2010) Mechanisms and implications of transcription blockage by guanine-rich DNA sequences. Proc. Natl Acad. Sci. USA, 107, 12816–12821]. We have now systematically studied the effect of the sequence composition and single-stranded breaks on this blockage. Although substitution of guanine by any other base reduced the blockage, cytosine and thymine reduced the blockage more significantly than adenine substitutions, affirming the importance of both G-richness and the homopurine-homopyrimidine character of the sequence for this effect. A single-strand break in the non-template strand adjacent to the G-rich stretch dramatically increased the blockage. Breaks in the non-template strand result in much weaker blockage signals extending downstream from the break even in the absence of the G-rich stretch. Our combined data support the notion that transcription blockage at homopurine-homopyrimidine sequences is caused by R-loop formation. PMID:23275544

  8. GROUPING WEB ACCESS SEQUENCES USING SEQUENCE ALIGNMENT METHOD

    OpenAIRE

    BHUPENDRA S CHORDIA; KRISHNAKANT P ADHIYA

    2011-01-01

    In web usage mining, grouping of web access sequences can be used to determine the behavior or intent of a set of users. A central problem in grouping web sessions is how to measure the similarity between web sessions. There are many shortcomings in traditional measurement methods. The task of grouping web sessions based on similarity, which consists of maximizing the intra-group similarity while minimizing the inter-group similarity, is done using the sequence alignment method. This paper introduces a new method to group we...

  9. Weak disorder in Fibonacci sequences

    Energy Technology Data Exchange (ETDEWEB)

    Ben-Naim, E [Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545 (United States); Krapivsky, P L [Department of Physics and Center for Molecular Cybernetics, Boston University, Boston, MA 02215 (United States)

    2006-05-19

    We study how weak disorder affects the growth of the Fibonacci series. We introduce a family of stochastic sequences that grow by the normal Fibonacci recursion with probability 1 - ε, but follow a different recursion rule with a small probability ε. We focus on the weak disorder limit and obtain the Lyapunov exponent that characterizes the typical growth of the sequence elements, using perturbation theory. The limiting distribution for the ratio of consecutive sequence elements is obtained as well. A number of variations to the basic Fibonacci recursion including shift, doubling and copying are considered. (letter to the editor)
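
    The growth rate described above can be estimated numerically. The sketch below is a Monte Carlo estimate, not the perturbation theory used by the authors, and the alternative rule (doubling the previous element, F_n = 2 F_{n-1}) is a hypothetical stand-in, since the abstract does not define its variants precisely. The function name, step count and seed are likewise assumptions.

```python
import math
import random


def lyapunov_exponent(eps: float, steps: int = 200_000, seed: int = 0) -> float:
    """Estimate lim (1/n) ln F_n for a Fibonacci recursion with weak disorder.

    With probability 1 - eps the normal rule F_n = F_{n-1} + F_{n-2} is used;
    with probability eps a hypothetical 'doubling' rule F_n = 2 * F_{n-1}.
    """
    rng = random.Random(seed)
    prev, curr = 1.0, 1.0
    log_growth = 0.0
    for _ in range(steps):
        if rng.random() < eps:
            nxt = 2.0 * curr      # assumed alternative rule (illustration only)
        else:
            nxt = curr + prev     # normal Fibonacci step
        prev, curr = curr, nxt
        # Rescale so that curr == 1, accumulating the logarithm of the scale
        # factor; this avoids floating-point overflow for long runs.
        log_growth += math.log(curr)
        prev /= curr
        curr = 1.0
    return log_growth / steps


for eps in (0.0, 0.01, 0.1):
    print(eps, lyapunov_exponent(eps, steps=50_000))
```

    With eps = 0 the estimate approaches the logarithm of the golden ratio, which gives a simple sanity check on the simulation.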

  10. Genome Sequence of the Biocontrol Strain Pseudomonas fluorescens F113

    Science.gov (United States)

    Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P.; Germaine, Kieran; Martínez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sánchez-Contreras, María; Moynihan, Jennifer A.; Giddens, Stephen R.; Coppoolse, Eric R.; Muriel, Candela; Stiekema, Willem J.; Rainey, Paul B.; Dowling, David; O'Gara, Fergal; Martín, Marta

    2012-01-01

    Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms. PMID:22328765

  11. The evolution of coronal activity in main sequence cool stars

    International Nuclear Information System (INIS)

    Stern, R.A.

    1984-01-01

    Stars spend most of their lifetime and show the least amount of nuclear evolution on the main sequence. However, the X-ray luminosities of cool star coronas change by orders of magnitude as a function of main sequence age. Such coronal evolution is discussed in relation to our knowledge of the solar corona, solar and stellar flares, stellar rotation and binarity. The relevance of X-ray observations to current speculations on stellar dynamos is also considered.

  12. Formation of a Multiple Protein Complex on the Adenovirus Packaging Sequence by the IVa2 Protein

    OpenAIRE

    Tyler, Ryan E.; Ewing, Sean G.; Imperiale, Michael J.

    2007-01-01

    During adenovirus virion assembly, the packaging sequence mediates the encapsidation of the viral genome. This sequence is composed of seven functional units, termed A repeats. Recent evidence suggests that the adenovirus IVa2 protein binds the packaging sequence and is involved in packaging of the genome. Study of the IVa2-packaging sequence interaction has been hindered by difficulty in purifying the protein produced in virus-infected cells or by recombinant techniques. We report the first ...

  13. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
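
    To make the notion of a maximal contiguous frequent pattern concrete, here is a naive level-wise sketch. It is not the authors' algorithm and makes no claim to their memory or time efficiency. It assumes that a pattern is a substring, that it is frequent when it occurs in at least min_support sequences, and that it is maximal when no longer frequent substring contains it; the function name and toy data are illustrative.

```python
from collections import defaultdict


def maximal_contiguous_frequent_patterns(sequences, min_support):
    """Return substrings found in >= min_support sequences that cannot be
    extended by one base on either side and stay frequent."""
    frequent_by_len = []          # frequent_by_len[k-1] = frequent patterns of length k
    k = 1
    while True:
        support = defaultdict(set)    # pattern -> indices of sequences containing it
        for idx, seq in enumerate(sequences):
            for i in range(len(seq) - k + 1):
                support[seq[i:i + k]].add(idx)
        frequent = {p for p, ids in support.items() if len(ids) >= min_support}
        if not frequent:
            break                 # no frequent pattern of length k, so none longer
        frequent_by_len.append(frequent)
        k += 1

    maximal = set()
    for level, patterns in enumerate(frequent_by_len):
        longer = frequent_by_len[level + 1] if level + 1 < len(frequent_by_len) else set()
        for p in patterns:
            # Any frequent super-pattern implies a frequent one exactly one base
            # longer that has p as its prefix or suffix.
            if not any(p == q[:-1] or p == q[1:] for q in longer):
                maximal.add(p)
    return maximal


reads = ["ACGTAC", "TACGTT", "GGACGT"]   # toy data for illustration only
print(maximal_contiguous_frequent_patterns(reads, min_support=3))
```

    No pattern length is specified in advance; the loop stops on its own once no frequent pattern of the current length exists.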

  14. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Science.gov (United States)

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  15. A computational genomics pipeline for prokaryotic sequencing projects.

    Science.gov (United States)

    Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King

    2010-08-01

    New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

  16. Robustness of ancestral sequence reconstruction to phylogenetic uncertainty.

    Science.gov (United States)

    Hanson-Smith, Victor; Kolaczkowski, Bryan; Thornton, Joseph W

    2010-09-01

    Ancestral sequence reconstruction (ASR) is widely used to formulate and test hypotheses about the sequences, functions, and structures of ancient genes. Ancestral sequences are usually inferred from an alignment of extant sequences using a maximum likelihood (ML) phylogenetic algorithm, which calculates the most likely ancestral sequence assuming a probabilistic model of sequence evolution and a specific phylogeny, typically the tree with the maximum likelihood. The true phylogeny is seldom known with certainty, however. ML methods ignore this uncertainty, whereas Bayesian methods incorporate it by integrating the likelihood of each ancestral state over a distribution of possible trees. It is not known whether Bayesian approaches to phylogenetic uncertainty improve the accuracy of inferred ancestral sequences. Here, we use simulation-based experiments under both simplified and empirically derived conditions to compare the accuracy of ASR carried out using ML and Bayesian approaches. We show that incorporating phylogenetic uncertainty by integrating over topologies very rarely changes the inferred ancestral state and does not improve the accuracy of the reconstructed ancestral sequence. Ancestral state reconstructions are robust to uncertainty about the underlying tree because the conditions that produce phylogenetic uncertainty also make the ancestral state identical across plausible trees; conversely, the conditions under which different phylogenies yield different inferred ancestral states produce little or no ambiguity about the true phylogeny. Our results suggest that ML can produce accurate ASRs, even in the face of phylogenetic uncertainty. Using Bayesian integration to incorporate this uncertainty is neither necessary nor beneficial.

  17. Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment

    Directory of Open Access Journals (Sweden)

    Daniels Noah M

    2012-10-01

    Background: The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly utilize protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. Results: We present Formatt, a multiple structure alignment based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. Conclusions: Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.

  18. Measuring the distance between multiple sequence alignments.

    Science.gov (United States)

    Blackburne, Benjamin P; Whelan, Simon

    2012-02-15

    Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses. We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them. MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/.

  19. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks, namely local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
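
    One of the tasks listed above, the longest common subsequence, can be illustrated with the textbook dynamic program. This is a generic Python sketch and does not reproduce the BATS API, which is C/C++ and Perl; the function name is an assumption.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (O(len(a) * len(b)))."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Trace back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

    The same table layout extends to gap-cost alignments by changing the recurrence, which is why the two problems are often implemented side by side.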

  20. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, one form of which is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in a laboratory test. This study focused on local alignment of leukemia and non-leukemia data obtained from NCBI in the form of DNA sequences by using the Smith-Waterman algorithm. The Smith-Waterman algorithm was invented by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the greatest possible similarity between a pair of sequences by giving a negative value to an unequal base pair (mismatch) and a positive value to an equal base pair (match). The maximum positive value is thus obtained as the end of the alignment, and the minimum value as the start of the alignment. This study will use sequences of leukemia and 3 sequences of non-leukemia.
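
    The scoring scheme described above can be sketched as follows. The abstract gives only the signs of the match and mismatch scores, so the values used here (match +2, mismatch -1) and the linear gap penalty of -2 are assumptions, as are the function name and the toy sequences.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment: return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]      # score matrix, floored at 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum cell (end of the alignment) until a zero
    # cell is reached (start of the alignment).
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Toy sequences for illustration only (not leukemia data).
print(smith_waterman("TTTACGTTT", "GGGACGTGGG"))
```

    Flooring each cell at zero is what makes the alignment local: a poorly matching prefix cannot drag down the score of a good region that follows it.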

  1. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, a discussion about the project follows and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  2. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Background: Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and are used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. They are also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions, as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. Findings: To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages had been observed after applying the new algorithm. Conclusions: The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic
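
    The first step mentioned above, determining the repeated units, can be illustrated with a regular expression over units of one to six bases. This is a Python sketch under assumed parameters (a minimum of three tandem copies, function name find_ssrs); it is not the authors' PERL program and does not perform the alignment shifting they describe.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 3):
    """Return (start, unit, copies) for tandem repeats of a 1-6 base unit."""
    # Lazy {1,6}? makes the engine try the shortest unit first, so ACACAC is
    # reported with unit "AC" rather than a longer multiple of it.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Toy sequence for illustration only.
print(find_ssrs("TTGACACACACGG"))
```

    Knowing the unit and copy number at each locus is what would allow an aligner to insert gaps in whole-unit steps rather than base by base.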

  3. ADDRESS SEQUENCES FOR MULTI RUN RAM TESTING

    Directory of Open Access Journals (Sweden)

    V. N. Yarmolik

    2014-01-01

    A universal approach for generation of address sequences with specified properties is proposed and analyzed. A modified version of the Antonov and Saleev algorithm for Sobol sequences generation is chosen as a mathematical description of the proposed method. Within the framework of the proposed universal approach, the Sobol sequences form a subset of the address sequences. Other subsets are also formed, which are Gray sequences, anti-Gray sequences, counter sequences and sequences with specified properties.
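
    The link between the Antonov and Saleev update and Gray sequences can be sketched as follows. In the standard Antonov-Saleev scheme each new value is the previous one XORed with a single direction number, selected by the rightmost zero bit of the previous index. Choosing the direction numbers as single bits, which is an assumption made here for illustration, yields the reflected Gray code. The paper's modified version is not reproduced, and the function name is hypothetical.

```python
def xor_update_sequence(directions):
    """Generate 2**m addresses, each obtained from the previous one by
    XOR with one direction number (Antonov-Saleev style update)."""
    m = len(directions)
    addr = 0
    seq = [addr]
    for i in range(1, 1 << m):
        # Index of the lowest set bit of i, which equals the index of the
        # rightmost zero bit of i - 1.
        c = (i & -i).bit_length() - 1
        addr ^= directions[c]
        seq.append(addr)
    return seq


bits = 3
gray = xor_update_sequence([1 << k for k in range(bits)])   # assumed directions
counter = list(range(1 << bits))                            # plain counter sequence
print("counter:", counter)
print("gray:   ", gray)
```

    With single-bit direction numbers, consecutive addresses differ in exactly one bit; other choices of direction numbers give other orderings of the address space.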

  4. Protein sequence comparison and protein evolution

    Energy Technology Data Exchange (ETDEWEB)

    Pearson, W.R. [Univ. of Virginia, Charlottesville, VA (United States). Dept. of Biochemistry

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. This tutorial examines how the information conserved during the evolution of a protein molecule can be used to reliably infer homology, and thus a shared protein fold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial will examine the technical aspects of protein sequence comparison. Both optimal and heuristic algorithms and their associated parameters that are used to characterize protein sequence similarities are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can be used both to improve the selectivity of a search and to evaluate the significance of a match. They then examine distantly related members of three protein families, the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, they discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.

  5. Brain activation during anticipation of sound sequences.

    Science.gov (United States)

    Leaver, Amber M; Van Lare, Jennifer; Zielinski, Brandon; Halpern, Andrea R; Rauschecker, Josef P

    2009-02-25

    Music consists of sound sequences that require integration over time. As we become familiar with music, associations between notes, melodies, and entire symphonic movements become stronger and more complex. These associations can become so tight that, for example, hearing the end of one album track can elicit a robust image of the upcoming track while anticipating it in total silence. Here, we study this predictive "anticipatory imagery" at various stages throughout learning and investigate activity changes in corresponding neural structures using functional magnetic resonance imaging. Anticipatory imagery (in silence) for highly familiar naturalistic music was accompanied by pronounced activity in rostral prefrontal cortex (PFC) and premotor areas. Examining changes in the neural bases of anticipatory imagery during two stages of learning conditional associations between simple melodies, however, demonstrates the importance of fronto-striatal connections, consistent with a role of the basal ganglia in "training" frontal cortex (Pasupathy and Miller, 2005). Another striking change in neural resources during learning was a shift from caudal PFC early in learning to rostral PFC later in learning. Our findings regarding musical anticipation and sound sequence learning are highly compatible with studies of motor sequence learning, suggesting common predictive mechanisms in both domains.

  6. Duplex scanning using sparse data sequences

    DEFF Research Database (Denmark)

    Møllenbach, S. K.; Jensen, Jørgen Arendt

    2008-01-01

    reconstruction of the missing samples possible. The periodic pattern has the length T = M + A samples, where M are for B-mode and A for velocity estimation. The missing samples can now be reconstructed using a filter bank. One filter bank reconstructs one missing sample, so the number of filter banks corresponds...... to M. The number of sub filters in every filter bank is the same as A. Every sub filter contains a fractional delay (FD) filter and an interpolation function. Many different sequences can be selected to adapt the B-mode frame rate needed. The drawback of the method is that the maximum velocity detectable......, the fprf and the resolution are 15 MHz, 3.5 kHz, and 12 bit sample (8 kHz and 16 bit for the carotid artery). The resulting data contains 8000 RF lines with 128 samples at a depth of 45 mm for the vein and 50 mm for the aorta. Sparse sequences are constructed from the full data sequences to have both...

  7. Constrained Optimization of MIMO Training Sequences

    Directory of Open Access Journals (Sweden)

    Coon Justin P

    2007-01-01

    Full Text Available Multiple-input multiple-output (MIMO) systems have shown a huge potential for increased spectral efficiency and throughput. With an increasing number of transmitting antennas comes the burden of providing training for channel estimation for coherent detection. In some special cases, training sequences that are optimal in the sense of mean-squared error (MSE) have been designed. However, in many practical systems it is not feasible to find optimal solutions analytically, and numerical techniques must be used. In this paper, two systems (unique word (UW) single carrier and OFDM with nulled subcarriers) are considered, and a method of designing near-optimal training sequences using nonlinear optimization techniques is proposed. In particular, interior-point (IP) algorithms such as the barrier method are discussed. Although the two systems seem unrelated, the cost function, which is the MSE of the channel estimate, is shown to be effectively the same for each scenario. Also, additional constraints, such as peak-to-average power ratio (PAPR), are considered and shown to be easily included in the optimization process. Numerical examples illustrate the effectiveness of the designed training sequences, both in terms of MSE and bit-error rate (BER).
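
    The PAPR constraint mentioned above can be sketched as a simple feasibility check on a candidate training sequence. This is a minimal illustration using the usual definition of PAPR (peak instantaneous power divided by mean power); the function names and the limit passed to the check are assumptions, and the paper's actual constraint formulation inside the interior-point solver is not reproduced here.

```python
def papr(sequence):
    """Peak-to-average power ratio of a complex sample sequence (linear scale)."""
    powers = [abs(x) ** 2 for x in sequence]
    mean_power = sum(powers) / len(powers)
    return max(powers) / mean_power


def satisfies_papr(sequence, limit):
    """Check a hypothetical constraint PAPR(sequence) <= limit."""
    return papr(sequence) <= limit


# A constant-modulus sequence has the lowest possible PAPR of 1.
constant_modulus = [1, -1, 1j, -1j]
# A single impulse concentrates all power in one sample.
impulse = [2, 0, 0, 0]

papr_constant = papr(constant_modulus)
papr_impulse = papr(impulse)
```

    In an optimization setting, a check like `satisfies_papr` would correspond to an inequality constraint added alongside the MSE cost function.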

  8. Fast and secure retrieval of DNA sequences

    NARCIS (Netherlands)

    2014-01-01

    Sequence models are retrieved from a sequences index. The sequence models model DNA or RNA sequences stored in a database, and each comprises a finite memory tree source model and parameters for the finite memory tree source model. One or more DNA or RNA sequences stored in the database are

  9. Decidability of uniform recurrence of morphic sequences

    OpenAIRE

    Durand, Fabien

    2012-01-01

    We prove that the uniform recurrence of morphic sequences is decidable. For this we show that the number of derived sequences of uniformly recurrent morphic sequences is bounded. As a corollary we obtain that uniformly recurrent morphic sequences are primitive substitutive sequences.
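
    For readers unfamiliar with the terminology, the sketch below generates a prefix of a sequence obtained by iterating a substitution and tests whether the substitution is primitive (some power maps every letter to a word containing all letters). The Thue-Morse rules are a standard textbook example chosen for illustration and are not taken from the paper; the function names are assumptions, and this is not the decision procedure the paper establishes.

```python
def iterate_substitution(rules, seed, length):
    """Return the first `length` letters obtained by iterating `rules` on `seed`."""
    word = seed
    while len(word) < length:
        longer = "".join(rules[ch] for ch in word)
        if len(longer) == len(word):  # no growth, stop to avoid looping forever
            break
        word = longer
    return word[:length]


def is_primitive(rules):
    """Test primitivity: some power k sends every letter to a word with all letters.

    Checks k = 1 .. (n - 1)**2 + 1, which suffices by Wielandt's bound
    for an alphabet of n letters.
    """
    letters = sorted(rules)
    alphabet = set(letters)
    n = len(letters)
    # reach[a] holds the set of letters occurring in the k-th image of a.
    reach = {a: set(rules[a]) for a in letters}
    for _ in range((n - 1) ** 2 + 1):
        if all(reach[a] == alphabet for a in letters):
            return True
        reach = {
            a: set().union(*(set(rules[b]) for b in reach[a]))
            for a in letters
        }
    return False


thue_morse = {"0": "01", "1": "10"}
prefix = iterate_substitution(thue_morse, "0", 8)
primitive = is_primitive(thue_morse)
not_primitive = is_primitive({"0": "01", "1": "1"})
```

    Fixed points of primitive substitutions are uniformly recurrent, which is the direction of the correspondence that is easy to check by machine; the corollary in the abstract concerns the converse for morphic sequences.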

  10. Statistical approaches to use a model organism for regulatory sequences annotation of newly sequenced species.

    Directory of Open Access Journals (Sweden)

    Pietro Liò

    Full Text Available A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly sequenced genomes. In such cases, it would be helpful to predict sequence motifs by using experimental data from a closely related model organism. Here we present a general algorithm that identifies transcription factor binding sites in a newly sequenced species by performing Bayesian regression on the annotated species. First we set out the rationale of our method by applying it within the same species, then we extend it to use data available in closely related species. Finally, we generalise the method to handle the case when a certain number of experiments, from several species close to the species on which to make inference, are available. In order to show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies are related to the G2/M phase of the Ascomycota cell cycle; the third is related to morphogenesis. We also compare the method with MatrixReduce and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, where the gene network size is conserved, demonstrate the method's utility in annotating new species sequences using all the available replicas from model species. The third case, where the gene network size varies among species, shows that the combination of information is less powerful but still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms.

  11. De novo prediction of structured RNAs from genomic sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.; Þórarinsson, Elfar

    2010-01-01

    currently available, because evolutionary conservation highlights functionally important regions. Conserved secondary structure, rather than primary sequence, is the hallmark of many functionally important RNAs, because compensatory substitutions in base-paired regions preserve structure. Unfortunately...

  12. On Linear Combinations of Two Orthogonal Polynomial Sequences on the Unit Circle

    Directory of Open Access Journals (Sweden)

    Suárez C

    2010-01-01

    Full Text Available Let be a monic orthogonal polynomial sequence on the unit circle. We define recursively a new sequence of polynomials by the following linear combination: , , . In this paper, we give necessary and sufficient conditions in order to make be an orthogonal polynomial sequence too. Moreover, we obtain an explicit representation for the Verblunsky coefficients and in terms of and . Finally, we show the relation between their corresponding Carathéodory functions and their associated linear functionals.

  13. Irreducible Tests for Space Mission Sequencing Software

    Science.gov (United States)

    Ferguson, Lisa

    2012-01-01

    As missions extend further into space, the modeling and simulation of their every action and instruction becomes critical. The greater the distance between Earth and the spacecraft, the smaller the window for communication becomes. Therefore, through modeling and simulating the planned operations, the most efficient sequence of commands can be sent to the spacecraft. The Space Mission Sequencing Software is being developed as the next generation of sequencing software to ensure the most efficient communication to interplanetary and deep space mission spacecraft. Aside from efficiency, the software also checks that communication during a specified time is even possible, meaning that no planet or moon is preventing reception of a signal from Earth and that two opposing commands are not being given simultaneously. In this way, the software not only models the proposed instructions to the spacecraft, but also validates the commands. To ensure that all spacecraft communications are sequenced properly, a timeline is used to structure the data. The created timelines are immutable, and once data is assigned to a timeline, it shall never be deleted nor renamed. This is to prevent the need for storing and filing the timelines for use by other programs. Several types of timelines can be created to accommodate different types of communications (activities, measurements, commands, states, events). Each of these timeline types requires specific parameters, and all have options for additional parameters if needed. With so many combinations of parameters available, the robustness and stability of the software are a necessity. Ther