WorldWideScience

Sample records for base sequence

  1. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  2. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually that may be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer that mediates automatic conversion of sequence UIDs (associated with multiple alignment or phylogenetic tree, or supplied as plain text list into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.

  3. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    cophaga ranges from 0.037–0.106 and 0.049–0.207 for COI and ND5 genes, respectively (tables 2 and 3). Analysis of genetic distance on the basis of sequence difference for both the mitochondrial genes shows very little genetic difference. The discrepancy in the phylogenetic trees based on individ- ual genes may be due ...

  4. DNA sequence modeling based on context trees

    NARCIS (Netherlands)

    Kusters, C.J.; Ignatenko, T.; Roland, J.; Horlin, F.

    2015-01-01

    Genomic sequences contain instructions for protein and cell production. Therefore understanding and identification of biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling of DNA sequences in its turn can help to better understand and identify such

  5. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  6. Function-Based Algorithms for Biological Sequences

    Science.gov (United States)

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  7. Sequence memory based on coherent spin-interaction neural networks.

    Science.gov (United States)

    Xia, Min; Wong, W K; Wang, Zhijie

    2014-12-01

    Sequence information processing, for instance, the sequence memory, plays an important role on many functions of brain. In the workings of the human brain, the steady-state period is alterable. However, in the existing sequence memory models using heteroassociations, the steady-state period cannot be changed in the sequence recall. In this work, a novel neural network model for sequence memory with controllable steady-state period based on coherent spininteraction is proposed. In the proposed model, neurons fire collectively in a phase-coherent manner, which lets a neuron group respond differently to different patterns and also lets different neuron groups respond differently to one pattern. The simulation results demonstrating the performance of the sequence memory are presented. By introducing a new coherent spin-interaction sequence memory model, the steady-state period can be controlled by dimension parameters and the overlap between the input pattern and the stored patterns. The sequence storage capacity is enlarged by coherent spin interaction compared with the existing sequence memory models. Furthermore, the sequence storage capacity has an exponential relationship to the dimension of the neural network.

  8. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC, a type of calculus that represents qualitative data on moving point objects (MPOs, and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI is used, which enables to map QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  9. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Indian Academy of Sciences (India)

    Navya

    2017-02-22

    Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 byanalyzing differences in allele distribution.

  10. Swarm-based Sequencing Recommendations in E-learning

    NARCIS (Netherlands)

    Van den Berg, Bert; Tattersall, Colin; Janssen, José; Brouns, Francis; Kurvers, Hub; Koper, Rob

    2005-01-01

    Van den Berg, B., Tattersall, C., Janssen, J., Brouns, F., Kurvers, H., & Koper, R. (2006). Swarm-based Sequencing Recommendations in E-learning. International Journal of Computer Science & Applications, III(III), 1-11.

  11. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  12. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    Science.gov (United States)

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  13. An assembly sequence planning method based on composite algorithm

    Directory of Open Access Journals (Sweden)

    Enfu LIU

    2016-02-01

    Full Text Available To solve the combination explosion problem and the blind searching problem in assembly sequence planning of complex products, an assembly sequence planning method based on composite algorithm is proposed. In the composite algorithm, a sufficient number of feasible assembly sequences are generated using formalization reasoning algorithm as the initial population of genetic algorithm. Then fuzzy knowledge of assembly is integrated into the planning process of genetic algorithm and ant algorithm to get the accurate solution. At last, an example is conducted to verify the feasibility of composite algorithm.

  14. An optical CDMA system based on chaotic sequences

    Science.gov (United States)

    Liu, Xiao-lei; En, De; Wang, Li-guo

    2014-03-01

    In this paper, a coherent asynchronous optical code division multiple access (OCDMA) system is proposed, whose encoder/decoder is an all-optical generator. This all-optical generator can generate analog and bipolar chaotic sequences satisfying the logistic maps. The formula of bit error rate (BER) is derived, and the relationship of BER and the number of simultaneous transmissions is analyzed. Due to the good property of correlation, this coherent OCDMA system based on these bipolar chaotic sequences can support a large number of simultaneous users, which shows that these chaotic sequences are suitable for asynchronous OCDMA system.

  15. Automation tools for accelerator control a network based sequencer

    International Nuclear Information System (INIS)

    Clout, P.; Geib, M.; Westervelt, R.

    1991-01-01

    In conjunction with a major client, Vista Control Systems has developed a sequencer for control systems which works in conjunction with its realtime, distributed Vsystem database. Vsystem is a network-based data acquisition, monitoring and control system which has been applied successfully to both accelerator projects and projects outside this realm of research. The network-based sequencer allows a user to simply define a thread of execution in any supported computer on the network. The script defining a sequence has a simple syntax designed for non-programmers, with facilities for selectively abbreviating the channel names for easy reference. The semantics of the script contains most of the familiar capabilities of conventional programming languages, including standard stream I/O and the ability to start other processes with parameters passed. The script is compiled to threaded code for execution efficiency. The implementation is described in some detail and examples are given of applications for which the sequencer has been used

  16. Thermodynamics-based models of transcriptional regulation with gene sequence.

    Science.gov (United States)

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity which is derived based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Comparing with the previous models, the proposed model can reveal more biological sense.

  17. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  18. (Brassicaceae) based on nuclear ribosomal ITS DNA sequences

    Indian Academy of Sciences (India)

    Home; Journals; Journal of Genetics; Volume 93; Issue 2. Phylogeny and biogeography of Alyssum (Brassicaceae) based on nuclear ribosomal ITS DNA sequences. Yan Li Yan Kong Zhe Zhang Yanqiang Yin Bin Liu Guanghui Lv Xiyong Wang. Research Article Volume 93 Issue 2 August 2014 pp 313-323 ...

  19. A New Images Hiding Scheme Based on Chaotic Sequences

    Institute of Scientific and Technical Information of China (English)

    LIU Nian-sheng; GUO Dong-hui; WU Bo-xi; Parr G

    2005-01-01

    We propose a data hidding technique in a still image. This technique is based on chaotic sequence in the transform domain of covert image. We use different chaotic random sequences multiplied by multiple sensitive images, respectively, to spread the spectrum of sensitive images. Multiple sensitive images are hidden in a covert image as a form of noise. The results of theoretical analysis and computer simulation show the new hiding technique have better properties with high security, imperceptibility and capacity for hidden information in comparison with the conventional scheme such as LSB (Least Significance Bit).

  20. Skeleton-based human action recognition using multiple sequence alignment

    Science.gov (United States)

    Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong

    2015-05-01

    Human action recognition and analysis is an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. This method first decompose action into a sequence of meaningful atomic actions (actionlets), and then label actionlets with English alphabets according to the Davies-Bouldin index value. Therefore, an action can be represented using a sequence of actionlet symbols, which will preserve the temporal order of occurrence of each of the actionlets. Finally, we employ sequence comparison to classify multiple actions through using string matching algorithms (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments of the proposed method on three challenging 3D action datasets show promising results.

  1. antaRNA: ant colony-based RNA sequence design.

    Science.gov (United States)

    Kleinkauf, Robert; Mann, Martin; Backofen, Rolf

    2015-10-01

    RNA sequence design is studied at least as long as the classical folding problem. Although for the latter the functional fold of an RNA molecule is to be found ,: inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology ,: reliable RNA sequence design becomes a crucial step to generate novel biochemical components. In this article ,: the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that comply in addition with an adjustable full range objective GC-content distribution ,: specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics and its superior performance is shown on a biological datasets. http://www.bioinf.uni-freiburg.de/Software/antaRNA CONTACT: backofen@informatik.uni-freiburg.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  2. A sequence-dependent rigid-base model of DNA

    Science.gov (United States)

    Gonzalez, O.; Petkevičiutė, D.; Maddocks, J. H.

    2013-02-01

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can

  3. A sequence-dependent rigid-base model of DNA.

    Science.gov (United States)

    Gonzalez, O; Petkevičiūtė, D; Maddocks, J H

    2013-02-07

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can

  4. Crash sequence based risk matrix for motorcycle crashes.

    Science.gov (United States)

    Wu, Kun-Feng; Sasidharan, Lekshmi; Thor, Craig P; Chen, Sheng-Yin

    2018-04-05

    Considerable research has been conducted related to motorcycle and other powered-two-wheeler (PTW) crashes; however, it always has been controversial among practitioners concerning with types of crashes should be first targeted and how to prioritize resources for the implementation of mitigating actions. Therefore, there is a need to identify types of motorcycle crashes that constitute the greatest safety risk to riders - most frequent and most severe crashes. This pilot study seeks exhibit the efficacy of a new approach for prioritizing PTW crash causation sequences as they relate to injury severity to better inform the application of mitigating countermeasures. To accomplish this, the present study constructed a crash sequence-based risk matrix to identify most frequent and most severe motorcycle crashes in an attempt to better connect causes and countermeasures of PTW crashes. Although the frequency of each crash sequence can be computed from crash data, a crash severity model is needed to compare the levels of crash severity among different crash sequences, while controlling for other factors that also have effects on crash severity such drivers' age, use of helmet, etc. The construction of risk matrix based on crash sequences involve two tasks: formulation of crash sequence and the estimation of a mixed-effects (ME) model to adjust the levels of severities for each crash sequence to account for other crash contributing factors that would have an effect on the maximum level of crash severity in a crash. Three data elements from the National Automotive Sampling System - General Estimating System (NASS-GES) data were utilized to form a crash sequence: critical event, crash types, and sequence of events. A mixed-effects model was constructed to model the severity levels for each crash sequence while accounting for the effects of those crash contributing factors on crash severity. A total of 8039 crashes involving 8208 motorcycles occurred during 2011 and 2013 were

  5. Phylogeny of the Serrasalmidae (Characiformes based on mitochondrial DNA sequences

    Directory of Open Access Journals (Sweden)

    Guillermo Ortí

    2008-01-01

    Full Text Available Previous studies based on DNA sequences of mitochondrial (mt rRNA genes showed three main groups within the subfamily Serrasalminae: (1 a "pacu" clade of herbivores (Colossoma, Mylossoma, Piaractus; (2 the "Myleus" clade (Myleus, Mylesinus, Tometes, Ossubtus; and (3 the "piranha" clade (Serrasalmus, Pygocentrus, Pygopristis, Pristobrycon, Catoprion, Metynnis. The genus Acnodon was placed as the sister taxon of clade (2+3. However, poor resolution within each clade was obtained due to low levels of variation among rRNA gene sequences. Complete sequences of the hypervariable mtDNA control region for a total of 45 taxa, and additional sequences of 12S and 16S rRNA from a total of 74 taxa representing all genera in the family are now presented to address intragroup relationships. Control region sequences of several serrasalmid species exhibit tandem repeats of short motifs (12 to 33 bp in the 3' end of this region, accounting for substantial length variation. Bayesian inference and maximum parsimony analyses of these sequences identify the same groupings as before and provide further evidence to support the following observations: (a Serrasalmus gouldingi and species of Pristobrycon (non-striolatus form a monophyletic group that is the sister group to other species of Serrasalmus and Pygocentrus; (b Catoprion, Pygopristis, and Pristobrycon striolatus form a well supported clade, sister to the group described above; (c some taxa assigned to the genus Myloplus (M. asterias, M tiete, M ternetzi, and M rubripinnis form a well supported group whereas other Myloplus species remain with uncertain affinities (d Mylesinus, Tometes and Myleus setiger form a monophyletic group.

  6. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    KAUST Repository

    Brown, Judith K.

    2015-04-18

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses are proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91 % and 94 % are proposed as the demarcation threshold for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.

  7. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    KAUST Repository

    Brown, Judith K.; Zerbini, F. Murilo; Navas-Castillo, Jesú s; Moriones, Enrique; Ramos-Sobrinho, Roberto; Silva, José C. F.; Fiallo-Olivé , Elvira; Briddon, Rob W.; Herná ndez-Zepeda, Cecilia; Idris, Ali; Malathi, V. G.; Martin, Darren P.; Rivera-Bustamante, Rafael; Ueda, Shigenori; Varsani, Arvind

    2015-01-01

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses are proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91 % and 94 % are proposed as the demarcation threshold for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.

  8. Speeding disease gene discovery by sequence based candidate prioritization

    Directory of Open Access Journals (Sweden)

    Porteous David J

    2005-03-01

    Full Text Available Abstract Background Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. Results We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. Conclusion PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.

  9. Disk-based compression of data from genome sequencing.

    Science.gov (United States)

    Grabowski, Szymon; Deorowicz, Sebastian; Roguski, Łukasz

    2015-05-01

    High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk based, where the better of these two, from Cox et al. (2012), is based on the Burrows-Wheeler transform (BWT) and achieves 0.518 bits per base for a 134.0 Gbp human genome sequencing collection with almost 45-fold coverage. We propose overlapping reads compression with minimizers, a compression algorithm dedicated to sequencing reads (DNA only). Our method makes use of a conceptually simple and easily parallelizable idea of minimizers, to obtain 0.317 bits per base as the compression ratio, allowing to fit the 134.0 Gbp dataset into only 5.31 GB of space. http://sun.aei.polsl.pl/orcom under a free license. sebastian.deorowicz@polsl.pl Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Protein sequencing via nanopore based devices: a nanofluidics perspective

    Science.gov (United States)

    Chinappi, Mauro; Cecconi, Fabio

    2018-05-01

    Proteins perform a huge number of central functions in living organisms, thus all the new techniques allowing their precise, fast and accurate characterization at single-molecule level certainly represent a burst in proteomics with important biomedical impact. In this review, we describe the recent progresses in the developing of nanopore based devices for protein sequencing. We start with a critical analysis of the main technical requirements for nanopore protein sequencing, summarizing some ideas and methodologies that have recently appeared in the literature. In the last sections, we focus on the physical modelling of the transport phenomena occurring in nanopore based devices. The multiscale nature of the problem is discussed and, in this respect, some of the main possible computational approaches are illustrated.

  11. Spike-Based Bayesian-Hebbian Learning of Temporal Sequences.

    Directory of Open Access Journals (Sweden)

    Philip J Tully

    2016-05-01

    Full Text Available Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN learning rule. We find that the formation of distributed memories, embodied by increased periods of firing in pools of excitatory neurons, together with asymmetrical associations between these distinct network states, can be acquired through plasticity. The model's feasibility is demonstrated using simulations of adaptive exponential integrate-and-fire model neurons (AdEx. We show that the learning and speed of sequence replay depends on a confluence of biophysically relevant parameters including stimulus duration, level of background noise, ratio of synaptic currents, and strengths of short-term depression and adaptation. Moreover, sequence elements are shown to flexibly participate multiple times in the sequence, suggesting that spiking attractor networks of this type can support an efficient combinatorial code. The model provides a principled approach towards understanding how multiple interacting plasticity mechanisms can coordinate hetero-associative learning in unison.

  12. Digital chaotic sequence generator based on coupled chaotic systems

    International Nuclear Information System (INIS)

    Shu-Bo, Liu; Jing, Sun; Jin-Shuo, Liu; Zheng-Quan, Xu

    2009-01-01

    Chaotic systems perform well as a new rich source of cryptography and pseudo-random coding. Unfortunately their digital dynamical properties would degrade due to the finite computing precision. Proposed in this paper is a modified digital chaotic sequence generator based on chaotic logistic systems with a coupling structure where one chaotic subsystem generates perturbation signals to disturb the control parameter of the other one. The numerical simulations show that the length of chaotic orbits, the output distribution of chaotic system, and the security of chaotic sequences have been greatly improved. Moreover the chaotic sequence period can be extended at least by one order of magnitude longer than that of the uncoupled logistic system and the difficulty in decrypting increases 2 128 *2 128 times indicating that the dynamical degradation of digital chaos is effectively improved. A field programmable gate array (FPGA) implementation of an algorithm is given and the corresponding experiment shows that the output speed of the generated chaotic sequences can reach 571.4 Mbps indicating that the designed generator can be applied to the real-time video image encryption. (general)

  13. Sequence-based classification and identification of Fungi.

    Science.gov (United States)

    Hibbett, David; Abarenkov, Kessy; Kõljalg, Urmas; Öpik, Maarja; Chai, Benli; Cole, James; Wang, Qiong; Crous, Pedro; Robert, Vincent; Helgason, Thorunn; Herr, Joshua R; Kirk, Paul; Lueschow, Shiloh; O'Donnell, Kerry; Nilsson, R Henrik; Oono, Ryoko; Schoch, Conrad; Smyth, Christopher; Walker, Donald M; Porras-Alfaro, Andrea; Taylor, John W; Geiser, David M

    Fungal taxonomy and ecology have been revolutionized by the application of molecular methods and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable validPUBLICation of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.

  14. Prediction of potential drug targets based on simple sequence properties

    Directory of Open Access Journals (Sweden)

    Lai Luhua

    2007-09-01

    Full Text Available Abstract Background During the past decades, research and development in drug discovery have attracted much attention and efforts. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and pharmaceutical industry. If a protein can be predicted in advance for its potential application as a drug target, the drug discovery process targeting this protein will be greatly speeded up. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets. Results Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non drug targets at an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research. Conclusion We have developed a drug target prediction method based solely on protein sequence information without the knowledge of family/domain annotation, or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as genome scale drug target predictions.

  15. Base Sequence Context Effects on Nucleotide Excision Repair

    Directory of Open Access Journals (Sweden)

    Yuqin Cai

    2010-01-01

    Full Text Available Nucleotide excision repair (NER plays a critical role in maintaining the integrity of the genome when damaged by bulky DNA lesions, since inefficient repair can cause mutations and human diseases notably cancer. The structural properties of DNA lesions that determine their relative susceptibilities to NER are therefore of great interest. As a model system, we have investigated the major mutagenic lesion derived from the environmental carcinogen benzo[a]pyrene (B[a]P, 10S (+-trans-anti-B[a]P-2-dG in six different sequence contexts that differ in how the lesion is positioned in relation to nearby guanine amino groups. We have obtained molecular structural data by NMR and MD simulations, bending properties from gel electrophoresis studies, and NER data obtained from human HeLa cell extracts for our six investigated sequence contexts. This model system suggests that disturbed Watson-Crick base pairing is a better recognition signal than a flexible bend, and that these can act in concert to provide an enhanced signal. Steric hinderance between the minor groove-aligned lesion and nearby guanine amino groups determines the exact nature of the disturbances. Both nearest neighbor and more distant neighbor sequence contexts have an impact. Regardless of the exact distortions, we hypothesize that they provide a local thermodynamic destabilization signal for repair.

  16. Studies of base pair sequence effects on DNA solvation based on all

    Indian Academy of Sciences (India)

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization ...

  17. Single-base resolution and long-coverage sequencing based on single-molecule nanomanipulation

    International Nuclear Information System (INIS)

    An Hongjie; Huang Jiehuan; Lue Ming; Li Xueling; Lue Junhong; Li Haikuo; Zhang Yi; Li Minqian; Hu Jun

    2007-01-01

    We show new approaches towards a novel single-molecule sequencing strategy which consists of high-resolution positioning isolation of overlapping DNA fragments with atomic force microscopy (AFM), subsequent single-molecule PCR amplification and conventional Sanger sequencing. In this study, a DNA labelling technique was used to guarantee the accuracy in positioning the target DNA. Single-molecule multiplex PCR was carried out to test the contamination. The results showed that the two overlapping DNA fragments isolated by AFM could be successfully sequenced with high quality and perfect contiguity, indicating that single-base resolution and long-coverage sequencing have been achieved simultaneously

  18. Spike-Based Bayesian-Hebbian Learning of Temporal Sequences

    DEFF Research Database (Denmark)

    Tully, Philip J; Lindén, Henrik; Hennig, Matthias H

    2016-01-01

    Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed...... in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN) learning rule. We find that the formation of distributed memories, embodied by increased periods...

  19. Centroid based clustering of high throughput sequencing reads based on n-mer counts.

    Science.gov (United States)

    Solovyov, Alexander; Lipkin, W Ian

    2013-09-08

    Many problems in computational biology require alignment-free sequence comparisons. One of the common tasks involving sequence comparison is sequence clustering. Here we apply methods of alignment-free comparison (in particular, comparison using sequence composition) to the challenge of sequence clustering. We study several centroid based algorithms for clustering sequences based on word counts. Study of their performance shows that using k-means algorithm with or without the data whitening is efficient from the computational point of view. A higher clustering accuracy can be achieved using the soft expectation maximization method, whereby each sequence is attributed to each cluster with a specific probability. We implement an open source tool for alignment-free clustering. It is publicly available from github: https://github.com/luscinius/afcluster. We show the utility of alignment-free sequence clustering for high throughput sequencing analysis despite its limitations. In particular, it allows one to perform assembly with reduced resources and a minimal loss of quality. The major factor affecting performance of alignment-free read clustering is the length of the read.

  20. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  1. Streaming support for data intensive cloud-based sequence analysis.

    Science.gov (United States)

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  2. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Shadi A. Issa

    2013-01-01

    Full Text Available Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  3. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  4. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated. We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  5. A trace display and editing program for data from fluorescence based sequencing machines.

    Science.gov (United States)

    Gleeson, T; Hillier, L

    1991-12-11

    'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C.elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly aided in the efficiency and accuracy of sequence editing. It runs in the X windows environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.

  6. Heart rate measurement based on face video sequence

    Science.gov (United States)

    Xu, Fang; Zhou, Qin-Wu; Wu, Peng; Chen, Xing; Yang, Xiaofeng; Yan, Hong-jian

    2015-03-01

    This paper proposes a new non-contact heart rate measurement method based on photoplethysmography (PPG) theory. With this method we can measure heart rate remotely with a camera and ambient light. We collected video sequences of subjects, and detected remote PPG signals through video sequences. Remote PPG signals were analyzed with two methods, Blind Source Separation Technology (BSST) and Cross Spectral Power Technology (CSPT). BSST is a commonly used method, and CSPT is used for the first time in the study of remote PPG signals in this paper. Both of the methods can acquire heart rate, but compared with BSST, CSPT has clearer physical meaning, and the computational complexity of CSPT is lower than that of BSST. Our work shows that heart rates detected by CSPT method have good consistency with the heart rates measured by a finger clip oximeter. With good accuracy and low computational complexity, the CSPT method has a good prospect for the application in the field of home medical devices and mobile health devices.

  7. Development of Sequence-Based Microsatellite Marker for Phalaenopsis Orchid

    Directory of Open Access Journals (Sweden)

    FATIMAH

    2011-06-01

    Full Text Available Phalaenopsis is one of the most interesting genera of orchids due to the members are often used as parents to produce hybrids. The establishment and development of highly reliable and discriminatory methods for identifying species and cultivars has become increasingly more important to plant breeders and members of the nursery industry. The aim of this research was to develop sequence-based microsatellite (eSSR markers for the Phalaenopsis orchid designed from the sequence of GenBank NCBI. Seventeen primers were designed and thirteen primers pairs could amplify the DNA giving the expected PCR product with polymorphism. A total of 51 alleles, with an average of 3 alleles per locus and polymorphism information content (PIC values at 0.674, were detected at the 16 SSR loci. Therefore, these markers could be used for identification of the Phalaenopsis orchid used in this study. Genetic similarity and principle coordinate analysis identified five major groups of Phalaenopsis sp. the first group consisted of P. amabilis, P. fuscata, P. javanica, and P. zebrine. The second group consisted of P. amabilis, P. amboinensis, P. bellina, P. floresens, and P. mannii. The third group consisted of P. bellina, P. cornucervi, P. cornucervi, P. violaceae sumatra, P. modesta. The forth group consisted of P. cornucervi and P. lueddemanniana, and the fifth group was P. amboinensis.

  8. The sequence relay selection strategy based on stochastic dynamic programming

    Science.gov (United States)

    Zhu, Rui; Chen, Xihao; Huang, Yangchao

    2017-07-01

    Relay-assisted (RA) network with relay node selection is a kind of effective method to improve the channel capacity and convergence performance. However, most of the existing researches about the relay selection did not consider the statically channel state information and the selection cost. This shortage limited the performance and application of RA network in practical scenarios. In order to overcome this drawback, a sequence relay selection strategy (SRSS) was proposed. And the performance upper bound of SRSS was also analyzed in this paper. Furthermore, in order to make SRSS more practical, a novel threshold determination algorithm based on the stochastic dynamic program (SDP) was given to work with SRSS. Numerical results are also presented to exhibit the performance of SRSS with SDP.

  9. Model-based quality assessment and base-calling for second-generation sequencing data.

    Science.gov (United States)

    Bravo, Héctor Corrada; Irizarry, Rafael A

    2010-09-01

    Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads-strings of A,C,G, or T's, between 30 and 100 characters long-which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in

  10. Amelioration of the cooling load based chiller sequencing control

    International Nuclear Information System (INIS)

    Huang, Sen; Zuo, Wangda; Sohn, Michael D.

    2016-01-01

    Highlights: • We developed a new approach for the optimal load distribution for chillers. • We proposed a new approach to optimize the number of operating chillers. • We provided a holistic solution to address chiller sequencing control problems. - Abstract: Cooling Load based Control (CLC) for the chiller sequencing is a commonly used control strategy for multiple-chiller plants. To improve the energy efficiency of these chiller plants, researchers proposed various CLC optimization approaches, which can be divided into two groups: studies to optimize the load distribution and studies to identify the optimal number of operating chillers. However, both groups have their own deficiencies and do not consider the impact of each other. This paper aims to improve the CLC by proposing three new approaches. The first optimizes the load distribution by adjusting the critical points for the chiller staging, which is easier to be implemented than existing approaches. In addition, by considering the impact of the load distribution on the cooling tower energy consumption and the pump energy consumption, this approach can achieve a better energy saving. The second optimizes the number of operating chillers by modulating the critical points and the condenser water set point in order to achieve the minimal energy consumption of the entire chiller plant that may not be guaranteed by existing approaches. The third combines the first two approaches to provide a holistic solution. The proposed three approaches were evaluated via a case study. The results show that the total energy consumption saving for the studied chiller plant is 0.5%, 5.3% and 5.6% by the three approaches, respectively. An energy saving of 4.9–11.8% can be achieved for the chillers at the cost of more energy consumption by the cooling towers (increases of 5.8–43.8%). The pumps’ energy saving varies from −8.6% to 2.0%, depending on the approach.

  11. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Directory of Open Access Journals (Sweden)

    Jason D Thompson

    Full Text Available Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  12. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Science.gov (United States)

    Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  13. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure...... that the data produced is optimal. Although much of the procedure can be followed directly from the manufacturer's protocols, the key differences lie in the library preparation steps. This chapter presents an optimized protocol for the sequencing of fossil remains and museum specimens, commonly referred...

  14. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Directory of Open Access Journals (Sweden)

    Yunyun Liang

    2015-01-01

    Full Text Available Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM. Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS, segmented PsePSSM, and segmented autocovariance transformation (ACT based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640 are adopted in this paper. Then a 700-dimensional (700D feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA. To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  15. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

    Czech Academy of Sciences Publication Activity Database

    Novák, Petr; Neumann, Pavel; Macas, Jiří

    2010-01-01

    Roč. 11, č. 1 (2010), s. 378-389 ISSN 1471-2105 R&D Projects: GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004 Institutional research plan: CEZ:AV0Z50510513 Keywords : repetitive DNA * plant genome * next generation sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.028, year: 2010

  16. PHARMACOGENETIC TESTING OPPORTUNITIES IN CARDIOLOGY BASED ON EXOME SEQUENCING

    Directory of Open Access Journals (Sweden)

    N. V. Shcherbakova

    2014-01-01

    Full Text Available Aim. To study what cardiac drugs currently have any comments on biomarkers and what information can be obtained by pharmacogenetic testing using data exome sequencing in patients with cardiac diseases.Material and methods. Exome sequencing in random participant of the ATEROGEN IVANOVO study and bioinformatics analysis of the data were performed. Point mutations were annotated using ANNOVAR program, as well as comparison with a number of specialized databases was done on the basis of user protocols.Results. 11 cardiac drugs and 7 genes which variants can influence cardiac drug metabolism were analyzed. According to exome sequencing of the participant we did not reveal allelic variants that require dose regime correction and careful efficacy control.Conclusion. The exome sequencing application is the next step to a wide range of personalized therapy. Future opportunities for improvement of the risk-benefit ratio in each patient are the main purpose of the collection and analysis of pharmacogenetic data.

  17. Spreadsheet-based program for alignment of overlapping DNA sequences.

    Science.gov (United States)

    Anbazhagan, R; Gabrielson, E

    1999-06-01

    Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh operating system computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed customized workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.

  18. Hybridization and sequencing of nucleic acids using base pair mismatches

    Science.gov (United States)

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  19. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Science.gov (United States)

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  20. Highly accurate fluorogenic DNA sequencing with information theory-based error correction.

    Science.gov (United States)

    Chen, Zitian; Zhou, Wenxiong; Qiao, Shuo; Kang, Li; Duan, Haifeng; Xie, X Sunney; Huang, Yanyi

    2017-12-01

    Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory-based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.

  1. BPP: a sequence-based algorithm for branch point prediction.

    Science.gov (United States)

    Zhang, Qing; Fan, Xiaodan; Wang, Yejun; Sun, Ming-An; Shao, Jianlin; Guo, Dianjing

    2017-10-15

    Although high-throughput sequencing methods have been proposed to identify splicing branch points in the human genome, these methods can only detect a small fraction of the branch points subject to the sequencing depth, experimental cost and the expression level of the mRNA. An accurate computational model for branch point prediction is therefore an ongoing objective in human genome research. We here propose a novel branch point prediction algorithm that utilizes information on the branch point sequence and the polypyrimidine tract. Using experimentally validated data, we demonstrate that our proposed method outperforms existing methods. Availability and implementation: https://github.com/zhqingit/BPP. djguo@cuhk.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  2. Predicting tissue-specific expressions based on sequence characteristics

    KAUST Repository

    Paik, Hyojung; Ryu, Tae Woo; Heo, Hyoungsam; Seo, Seungwon; Lee, Doheon; Hur, Cheolgoo

    2011-01-01

    In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach via generated sequence features from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcriptional factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns considering the biological intuition; TFBSs regulate gene expression, and the domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.

  3. Predicting tissue-specific expressions based on sequence characteristics

    KAUST Repository

    Paik, Hyojung

    2011-04-30

    In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach via generated sequence features from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcriptional factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns considering the biological intuition; TFBSs regulate gene expression, and the domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.

  4. Autonomously generating operations sequences for a Mars Rover using AI-based planning

    Science.gov (United States)

    Sherwood, Rob; Mishkin, Andrew; Estlin, Tara; Chien, Steve; Backes, Paul; Cooper, Brian; Maxwell, Scott; Rabideau, Gregg

    2001-01-01

    This paper discusses a proof-of-concept prototype for ground-based automatic generation of validated rover command sequences from highlevel science and engineering activities. This prototype is based on ASPEN, the Automated Scheduling and Planning Environment. This Artificial Intelligence (AI) based planning and scheduling system will automatically generate a command sequence that will execute within resource constraints and satisfy flight rules.

  5. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  6. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  7. An EM based approach for motion segmentation of video sequence

    NARCIS (Netherlands)

    Zhao, Wei; Roos, Nico; Pan, Zhigeng; Skala, Vaclav

    2016-01-01

    Motions are important features for robot vision as we live in a dynamic world. Detecting moving objects is crucial for mobile robots and computer vision systems. This paper investigates an architecture for the segmentation of moving objects from image sequences. Objects are represented as groups of

  8. Phylogenetic relationships of Salmonella based on rRNA sequences

    DEFF Research Database (Denmark)

    Christensen, H.; Nordentoft, Steen; Olsen, J.E.

    1998-01-01

    separated by 16S rRNA analysis and found to be closely related to the Escherichia coli and Shigella complex by both 16S and 23S rRNA analyses. The diphasic serotypes S. enterica subspp. I and VI were separated from the monophasic serotypes subspp. IIIa and IV, including S. bongori, by 23S rRNA sequence...

  9. Instruction sequence based non-uniform complexity classes

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2013-01-01

    We present an approach to non-uniform complexity in which single-pass instruction sequences play a key part, and answer various questions that arise from this approach. We introduce several kinds of non-uniform complexity classes. One kind includes a counterpart of the well-known non-uniform

  10. Simple sequence repeat (SSR)-based genetic variability among ...

    African Journals Online (AJOL)

    The objective of this study was to compare if simple sequence repeat (SSR) markers could correctly identify peanut genotypes with difference in specific leaf weight (SLW) and relative water content (RWC). Four peanut genotypes and two water regimes (FC and 1/3 available water; 1/3 AW) were arranged in factorial ...

  11. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT – the bionic wavelet transform (BWT – for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.

  12. MR-based attenuation correction in brain PET based on UTE sequences

    Energy Technology Data Exchange (ETDEWEB)

    Cabello, Jorge; Nekolla, Stephan G; Ziegler, Sibylle I [Department of Nuclear Medicine, Klinikum rechts der Isar, Technische Universität München (Germany)

    2014-07-29

    Attenuation correction (AC) in brain PET/MR has recently emerged as one of the challenging tasks in the PET/MR field. It has been shown that to ignore the attenuation produced by bone can lead to errors ranging from 5-30% in regions close to bone structures. Since the information provided by the MR signal is not directly related to tissue attenuation, alternative methods have to be developed. Signal from bone tissue is difficult to measure given its short transverse relaxation time (T2). Ultrashort-echo time (UTE) pulse sequences were developed to measure signal from tissues with short T2. A combination of two consecutive UTE echoes has been used in several works to measure signal from bone tissue. The first echo is able to measure signal from bone tissue in addition to soft tissue, while the second echo contains most of the soft tissue contained in the first echo but not bone. In this work we extract the attenuation information from the difference between the logarithm of two images obtained after applying two consecutive UTE pulse sequences using the mMR scanner (Siemens Healthcare). Subsequently, image processing techniques are applied to reduce the noise and extract air cavities within the head. The resulting image is converted to linear attenuation coefficients, generating what is known as µ-map, to be used during reconstruction. For comparison purposes PET/CT scans of the same patients were acquired prior to the PET/MR scan. Additional µ-maps obtained for comparison were extracted from a Dixon sequence (used in clinical routine) and an additional µ-map calculated by the scanner based on UTE pulse sequences. Preliminary quantitative results measured in the cerebellum, using the value obtained with CT-based AC as reference, show differences of 34% without AC, 13% using the Dixon-based and UTE-based provided by the scanner, and 0.8% with the AC strategy presented here.

  13. Novel DNA sequence detection method based on fluorescence energy transfer

    International Nuclear Information System (INIS)

    Kobayashi, S.; Tamiya, E.; Karube, I.

    1987-01-01

    Recently the detection of specific DNA sequence, DNA analysis, has been becoming more important for diagnosis of viral genomes causing infections disease and human sequences related to inherited disorders. These methods typically involve electrophoresis, the immobilization of DNA on a solid support, hybridization to a complementary probe, the detection using labeled with /sup 32/P or nonisotopically with a biotin-avidin-enzyme system, and so on. These techniques are highly effective, but they are very time-consuming and expensive. A principle of fluorescene energy transfer is that the light energy from an excited donor (fluorophore) is transferred to an acceptor (fluorophore), if the acceptor exists in the vicinity of the donor and the excitation spectrum of donor overlaps the emission spectrum of acceptor. In this study, the fluorescence energy transfer was applied to the detection of specific DNA sequence using the hybridization method. The analyte, single-stranded DNA labeled with the donor fluorophore is hybridized to a probe DNA labeled with the acceptor. Because of the complementary DNA duplex formation, two fluorophores became to be closed to each other, and the fluorescence energy transfer was occurred

  14. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental...... and computational protocol for detecting the reverse transcription termination sites (RTTS-Seq). This protocol was subsequently applied to hydroxyl radical footprinting of three dimensional RNA structures to give a probing signal that correlates well with the RNA backbone solvent accessibility. Moreover, we applied...

  15. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Directory of Open Access Journals (Sweden)

    Xiaoxia Yang

    Full Text Available Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  16. Elman RNN based classification of proteins sequences on account of their mutual information.

    Science.gov (United States)

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work we have employed the method of estimating residue correlation within the protein sequences, by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence, like this each protein sequence is associated with its corresponding MIVs. These MIVs are given to Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment free classification of protein sequences. We also conclude that sequence structural and solvent accessible property based MIVs are better predictor. Copyright © 2012 Elsevier Ltd. All rights reserved.

  17. Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

    Science.gov (United States)

    Choi, Woo-Yong; Chatterjee, Mainak

    2015-03-01

    With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.

  18. A base composition analysis of natural patterns for the preprocessing of metagenome sequences.

    Science.gov (United States)

    Bonham-Carter, Oliver; Ali, Hesham; Bastola, Dhundy

    2013-01-01

    On the pretext that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often have much conserved code.

  19. Complex programmable logic device based alarm sequencer for nuclear power plants

    International Nuclear Information System (INIS)

    Khedkar, Ravindra; Solomon, J. Selva; KrishnaKumar, B.

    2001-01-01

    Complex Programmable Logic Device based Alarm Sequencer is an instrument, which detects alarms, memorizes them and displays the sequences of occurrence of alarms. It caters to sixteen alarm signals and distinguishes the sequence among any two alarms with a time resolution of 1 ms. The system described has been designed for continuous operation in process plants, nuclear power plants etc. The system has been tested and found to be working satisfactorily. (author)

  20. LookSeq: A browser-based viewer for deep sequencing data

    OpenAIRE

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P.

    2009-01-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an ov...

  1. Recent advances in nanopore-based nucleic acid analysis and sequencing

    International Nuclear Information System (INIS)

    Shi, Jidong; Fang, Ying; Hou, Junfeng

    2016-01-01

    Nanopore-based sequencing platforms are transforming the field of genomic science. This review (containing 116 references) highlights some recent progress on nanopore-based nucleic acid analysis and sequencing. These studies are classified into three categories, biological, solid-state, and hybrid nanopores, according to their nanoporous materials. We begin with a brief description of the translocation-based detection mechanism of nanopores. Next, specific examples are given in nanopore-based nucleic acid analysis and sequencing, with an emphasis on identifying strategies that can improve the resolution of nanopores. This review concludes with a discussion of future research directions that will advance the practical applications of nanopore technology. (author)

  2. An Analysis of Delay-based and Integrator-based Sequence Detectors for Grid-Connected Converters

    DEFF Research Database (Denmark)

    Khazraj, Hesam; Silva, Filipe Miguel Faria da; Bak, Claus Leth

    2017-01-01

    -signal cancellation operators are the main members of the delay-based sequence detectors. The aim of this paper is to provide a theoretical and experimental comparative study between integrator and delay based sequence detectors. The theoretical analysis is conducted based on the small-signal modelling......Detecting and separating positive and negative sequence components of the grid voltage or current is of vital importance in the control of grid-connected power converters, HVDC systems, etc. To this end, several techniques have been proposed in recent years. These techniques can be broadly...... classified into two main classes: The integrator-based techniques and Delay-based techniques. The complex-coefficient filter-based technique, dual second-order generalized integrator-based method, multiple reference frame approach are the main members of the integrator-based sequence detector and the delay...

  3. Comparison of ompP5 sequence-based typing and pulsed-filed gel ...

    African Journals Online (AJOL)

    In this study, comparison of the outer membrane protein P5 gene (ompP5) sequence-based typing with pulsed-field gel electrophoresis (PFGE) for the genotyping of Haemophilus parasuis, the 15 serovar reference strains and 43 isolates were investigated. When comparing the two methods, 31 ompP5 sequence types ...

  4. pyPaSWAS : Python-based multi-core CPU and GPU sequence alignment

    NARCIS (Netherlands)

    Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

    2018-01-01

    BACKGROUND: Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore adopted less than could be. The OpenCL language is supported more widely and allows use on a variety of

  5. COI (cytochrome oxidase-I) sequence based studies of Carangid fishes from Kakinada coast, India.

    Science.gov (United States)

    Persis, M; Chandra Sekhar Reddy, A; Rao, L M; Khedkar, G D; Ravinder, K; Nasruddin, K

    2009-09-01

    Mitochondrial DNA, cytochrome oxidase-1 gene sequences were analyzed for species identification and phylogenetic relationship among the very high food value and commercially important Indian carangid fish species. Sequence analysis of COI gene very clearly indicated that all the 28 fish species fell into five distinct groups, which are genetically distant from each other and exhibited identical phylogenetic reservation. All the COI gene sequences from 28 fishes provide sufficient phylogenetic information and evolutionary relationship to distinguish the carangid species unambiguously. This study proves the utility of mtDNA COI gene sequence based approach in identifying fish species at a faster pace.

  6. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads

    Directory of Open Access Journals (Sweden)

    Chengxi Ye

    2016-06-01

    Full Text Available Motivation. The third generation sequencing (3GS technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15–40%, significantly higher than those of the prevalent next generation sequencing (NGS technologies (less than 1%. Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences. Results. We describe a versatile and efficient linear complexity consensus algorithm Sparc to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path which approximates the most likely genome sequence is searched through a sparsity-induced reweighted graph as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments with Sparc show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with error rate <0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve similar error rate when combined with NGS data. Compared with the existing approaches, Sparc calculates the consensus with higher accuracy, and uses approximately 80% less memory and time. Availability. The source code is available for download at https://github.com/yechengxi/Sparc.

  7. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    Science.gov (United States)

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We develop a software SUGAR (subtile-based GUI-assisted refiner) which can handle ultra-high-throughput data with user-friendly graphical user interface (GUI) and interactive analysis capability. The SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in the SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to a public whole human genome sequencing data and we proved the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analysis that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls to the low population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.

  8. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-08-01

    RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.

  9. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-01-01

    Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465

  10. Graph-based sequence annotation using a data integration approach

    Directory of Open Access Journals (Sweden)

    Pesch Robert

    2008-06-01

    Full Text Available The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara- Cyc which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation.

  11. Graph-based sequence annotation using a data integration approach.

    Science.gov (United States)

    Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

    2008-08-25

    The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.

  12. An efficient binomial model-based measure for sequence comparison and its application.

    Science.gov (United States)

    Liu, Xiaoqing; Dai, Qi; Li, Lihua; He, Zerong

    2011-04-01

    Sequence comparison is one of the major tasks in bioinformatics, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remains. This paper presented a binomial model-based measure to analyze biological sequences. With help of a random indicator, the occurrence of a word at any position of sequence can be regarded as a random Bernoulli variable, and the distribution of a sum of the word occurrence is well known to be a binomial one. By using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on binomial model is more efficient.

  13. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    Directory of Open Access Journals (Sweden)

    Claros M Gonzalo

    2010-06-01

    Full Text Available Abstract Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can.even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to designing specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". Conclusions AlignMiner can be used

  14. ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.

    Science.gov (United States)

    Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia

    2017-12-01

    Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities are shown at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlaying peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny and runs under R platform for an easy exploitation. As a part of the interface, we added the automatic ends clipping option, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc and performed both manual and automatic correction. The comparison between the two methods was carried out. As a result, we note that our program, ABI base recall, accomplishes good correction with a high accuracy. Indeed, it increases the rate of identity and coverage and minimizes the number of mismatches and gaps, hence it provides solution to sequencing ambiguities and saves biologists' time and labor.

  15. A Window Into Clinical Next-Generation Sequencing-Based Oncology Testing Practices.

    Science.gov (United States)

    Nagarajan, Rakesh; Bartley, Angela N; Bridge, Julia A; Jennings, Lawrence J; Kamel-Reid, Suzanne; Kim, Annette; Lazar, Alexander J; Lindeman, Neal I; Moncur, Joel; Rai, Alex J; Routbort, Mark J; Vasalos, Patricia; Merker, Jason D

    2017-12-01

    - Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. - To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing-based oncology testing practices. - College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing-based oncology testing. - These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing-based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing-based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. - This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing-based oncology testing, and precision oncology efforts in a data-driven manner.

  16. Multifunctional hybrid networks based on self assembling peptide sequences

    Science.gov (United States)

    Sathaye, Sameer

    The overall aim of this dissertation is to achieve a comprehensive correlation between the molecular level changes in primary amino acid sequences of amphiphilic beta-hairpin peptides and their consequent solution-assembly properties and bulk network hydrogel behavior. This has been accomplished using two broad approaches. In the first approach, amino acid substitutions were made to peptide sequence MAX1 such that the hydrophobic surfaces of the folded beta-hairpins from the peptides demonstrate shape specificity in hydrophobic interactions with other beta-hairpins during the assembly process, thereby causing changes to the peptide nanostructure and bulk rheological properties of hydrogels formed from the peptides. Steric lock and key complementary hydrophobic interactions were designed to occur between two beta-hairpin molecules of a single molecule, LNK1 during beta-sheet fibrillar assembly of LNK1. Experimental results from circular dichroism, transmission electron microscopy and oscillatory rheology collectively indicate that the molecular design of the LNK1 peptide can be assigned the cause of the drastically different behavior of the networks relative to MAX1. The results indicate elimination or significant reduction of fibrillar branching due to steric complementarity in LNK1 that does not exist in MAX1, thus supporting the original hypothesis. As an extension of the designed steric lock and key complementarity between two beta-hairpin molecules of the same peptide molecule. LNK1, three new pairs of peptide molecules LP1-KP1, LP2-KP2 and LP3-KP3 that resemble complementary 'wedge' and 'trough' shapes when folded into beta-hairpins were designed and studied. All six peptides individually and when blended with their corresponding shape complement formed fibrillar nanostructures with non-uniform thickness values. Loose packing in the assembled structures was observed in all the new peptides as compared to the uniform tight packing in MAX1 by SANS analysis. This

  17. High Interlaboratory Reprocucibility of DNA Sequence-based Typing of Bacteria in a Multicenter Study

    DEFF Research Database (Denmark)

    Sousa, MA de; Boye, Kit; Lencastre, H de

    2006-01-01

    Current DNA amplification-based typing methods for bacterial pathogens often lack interlaboratory reproducibility. In this international study, DNA sequence-based typing of the Staphylococcus aureus protein A gene (spa, 110 to 422 bp) showed 100% intra- and interlaboratory reproducibility without...... extensive harmonization of protocols for 30 blind-coded S. aureus DNA samples sent to 10 laboratories. Specialized software for automated sequence analysis ensured a common typing nomenclature....

  18. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    Science.gov (United States)

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences are common operations frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .

  19. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    Science.gov (United States)

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Comparing to the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package and is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  20. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    Science.gov (United States)

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  1. SPiCE : A web-based tool for sequence-based protein classification and exploration

    NARCIS (Netherlands)

    Van den Berg, B.A.; Reinders, M.J.; Roubos, J.A.; De Ridder, D.

    2014-01-01

    Background Amino acid sequences and features extracted from such sequences have been used to predict many protein properties, such as subcellular localization or solubility, using classifier algorithms. Although software tools are available for both feature extraction and classifier construction,

  2. Establishment of screening technique for mutant cell and analysis of base sequence in the mutation

    International Nuclear Information System (INIS)

    Sofuni, Toshio; Nomi, Takehiko; Yamada, Masami; Masumura, Kenichi

    2000-01-01

    This research project aimed to establish an easy and quick detection method for radiation-induced mutation using molecular-biological techniques and an effective analyzing method for the molecular changes in base sequence. In this year, Spi mutants derived from γ-radiation exposed mouse were analyzed by PCR method and DNA sequence method. Male transgenic mice were exposed to γ-ray at 5,10, 50 Gy and the transgene was taken out from the genome DNA from the spleen in vivo packaging method. Spi mutant plaques were obtained by infecting the recovered phage to E. coli. Sequence analysis for the mutants was made using ALFred DNA sequencer and SequiTherm TM Long-Red Cycle sequencing kit. Sequence analysis was carried out for 41 of 50 independent Spi mutants obtained. The deletions were classified into 4 groups; Group 1 included 15 mutants that were characterized with a large deletion (43 bp-10 kb) with a short homologous sequence. Group 2 included 11 mutants of a large deletion having no homologous sequence at the connecting region. Group 3 included 11 mutants having a short deletion of less than 20 bp, which occurred in the non-repetitive sequence of gam gene and possibly caused by oxidative breakage of DNA or recombination of DNA fragment produced by the breakage. Group 4 included 4 mutants having deletions as short as 20 bp or less in the repetitive sequence of gam gene, resulting in an alteration of the reading frame. Thus, the synthesis of Gam protein was terminated by the appearance of TGA between code 13 and 14 of redB gene, leading to inactivation of gam gene and redBA gene. These results indicated that most of Spi mutants had a deletion in red/gam region and the deletions in more than half mutants occurred in homologous sequences as short as 8 bp. (M.N.)

  3. Human Gait Recognition Based on Multiview Gait Sequences

    Directory of Open Access Journals (Sweden)

    Xiaxi Huang

    2008-05-01

    Full Text Available Most of the existing gait recognition methods rely on a single view, usually the side view, of the walking person. This paper investigates the case in which several views are available for gait recognition. It is shown that each view has unequal discrimination power and, therefore, should have unequal contribution in the recognition process. In order to exploit the availability of multiple views, several methods for the combination of the results that are obtained from the individual views are tested and evaluated. A novel approach for the combination of the results from several views is also proposed based on the relative importance of each view. The proposed approach generates superior results, compared to those obtained by using individual views or by using multiple views that are combined using other combination methods.

  4. Context-dependent motor skill: perceptual processing in memory-based sequence production.

    Science.gov (United States)

    Ruitenberg, Marit F L; Abrahamse, Elger L; De Kleine, Elian; Verwey, Willem B

    2012-10-01

    Previous studies have shown that motor sequencing skill can benefit from the reinstatement of the learning context-even with respect to features that are formally not required for appropriate task performance. The present study explored whether such context-dependence develops when sequence execution is fully memory-based-and thus no longer assisted by stimulus-response translations. Specifically, we aimed to distinguish between preparation and execution processes. Participants performed two keying sequences in a go/no-go version of the discrete sequence production task in which the context consisted of the color in which the target keys of a particular sequence were displayed. In a subsequent test phase, these colors either were the same as during practice, were reversed for the two sequences or were novel. Results showed that, irrespective of the amount of practice, performance across all key presses in the reversed context condition was impaired relative to performance in the same and novel contexts. This suggests that the online preparation and/or execution of single key presses of the sequence is context-dependent. We propose that a cognitive processor is responsible both for these online processes and for advance sequence preparation and that combined findings from the current and previous studies build toward the notion that the cognitive processor is highly sensitive to changes in context across the various roles that it performs.

  5. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley.

    Directory of Open Access Journals (Sweden)

    Martin Mascher

    Full Text Available The rapid development of next-generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS, a low-cost, reduced representation sequencing method, is becoming a common approach for whole-genome marker profiling in many species. With quickly developing sequencing technologies, adapting current GBS methodologies to new platforms will leverage these advancements for future studies. To test new semiconductor sequencing platforms for GBS, we genotyped a barley recombinant inbred line (RIL population. Based on a previous GBS approach, we designed bar code and adapter sets for the Ion Torrent platforms. Four sets of 24-plex libraries were constructed consisting of 94 RILs and the two parents and sequenced on two Ion platforms. In parallel, a 96-plex library of the same RILs was sequenced on the Illumina HiSeq 2000. We applied two different computational pipelines to analyze sequencing data; the reference-independent TASSEL pipeline and a reference-based pipeline using SAMtools. Sequence contigs positioned on the integrated physical and genetic map were used for read mapping and variant calling. We found high agreement in genotype calls between the different platforms and high concordance between genetic and reference-based marker order. There was, however, paucity in the number of SNP that were jointly discovered by the different pipelines indicating a strong effect of alignment and filtering parameters on SNP discovery. We show the utility of the current barley genome assembly as a framework for developing very low-cost genetic maps, facilitating high resolution genetic mapping and negating the need for developing de novo genetic maps for future studies in barley. Through demonstration of GBS on semiconductor sequencing platforms, we conclude that the GBS approach is amenable to a range of platforms and can easily be modified as new

  6. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Full Text Available Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequences analysis.

  7. Quality Control of the Traditional Patent Medicine Yimu Wan Based on SMRT Sequencing and DNA Barcoding

    Science.gov (United States)

    Jia, Jing; Xu, Zhichao; Xin, Tianyi; Shi, Linchun; Song, Jingyuan

    2017-01-01

    Substandard traditional patent medicines may lead to global safety-related issues. Protecting consumers from the health risks associated with the integrity and authenticity of herbal preparations is of great concern. Of particular concern is quality control for traditional patent medicines. Here, we establish an effective approach for verifying the biological composition of traditional patent medicines based on single-molecule real-time (SMRT) sequencing and DNA barcoding. Yimu Wan (YMW), a classical herbal prescription recorded in the Chinese Pharmacopoeia, was chosen to test the method. Two reference YMW samples were used to establish a standard method for analysis, which was then applied to three different batches of commercial YMW samples. A total of 3703 and 4810 circular-consensus sequencing (CCS) reads from two reference and three commercial YMW samples were mapped to the ITS2 and psbA-trnH regions, respectively. Moreover, comparison of intraspecific genetic distances based on SMRT sequencing data with reference data from Sanger sequencing revealed an ITS2 and psbA-trnH intergenic spacer that exhibited high intraspecific divergence, with the sites of variation showing significant differences within species. Using the CCS strategy for SMRT sequencing analysis was adequate to guarantee the accuracy of identification. This study demonstrates the application of SMRT sequencing to detect the biological ingredients of herbal preparations. SMRT sequencing provides an affordable way to monitor the legality and safety of traditional patent medicines. PMID:28620408

  8. Typing of canine parvovirus isolates using mini-sequencing based single nucleotide polymorphism analysis.

    Science.gov (United States)

    Naidu, Hariprasad; Subramanian, B Mohana; Chinchkar, Shankar Ramchandra; Sriraman, Rajan; Rana, Samir Kumar; Srinivasan, V A

    2012-05-01

    The antigenic types of canine parvovirus (CPV) are defined based on differences in the amino acids of the major capsid protein VP2. Type specificity is conferred by a limited number of amino acid changes and in particular by few nucleotide substitutions. PCR based methods are not particularly suitable for typing circulating variants which differ in a few specific nucleotide substitutions. Assays for determining SNPs can detect efficiently nucleotide substitutions and can thus be adapted to identify CPV types. In the present study, CPV typing was performed by single nucleotide extension using the mini-sequencing technique. A mini-sequencing signature was established for all the four CPV types (CPV2, 2a, 2b and 2c) and feline panleukopenia virus. The CPV typing using the mini-sequencing reaction was performed for 13 CPV field isolates and the two vaccine strains available in our repository. All the isolates had been typed earlier by full-length sequencing of the VP2 gene. The typing results obtained from mini-sequencing matched completely with that of sequencing. Typing could be achieved with less than 100 copies of standard plasmid DNA constructs or ≤10¹ FAID₅₀ of virus by mini-sequencing technique. The technique was also efficient for detecting multiple types in mixed infections. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. Analyzing Plasmodium falciparum erythrocyte membrane protein 1 gene expression by a next generation sequencing based method

    DEFF Research Database (Denmark)

    Jespersen, Jakob S.; Petersen, Bent; Seguin-Orlando, Andaine

    2013-01-01

    at identifying PfEMP1 features associated with high virulence. Here we present the first effective method for sequence analysis of var genes expressed in field samples: a sequential PCR and next generation sequencing based technique applied on expressed var sequence tags and subsequently on long range PCR......, encoded by ~60 highly variable 'var' genes per haploid genome. PfEMP1 is exported to the surface of infected erythrocytes and is thought to be fundamental to immune evasion by adhesion to host and parasite factors. The highly variable nature has constituted a roadblock in var expression studies aimed...

  10. A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine

    Science.gov (United States)

    Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing

    2017-09-01

    Dynamic monitoring the behavior of a program is widely used to discriminate between benign program and malware. It is usually based on the dynamic characteristics of a program, such as API call sequence or API call frequency to judge. The key innovation of this paper is to consider the full Native API sequence and use the support vector machine to detect the shellcode. We also use the Markov chain to extract and digitize Native API sequence features. Our experimental results show that the method proposed in this paper has high accuracy and low detection rate.

  11. Predicting effects of noncoding variants with deep learning-based sequence model.

    Science.gov (United States)

    Zhou, Jian; Troyanskaya, Olga G

    2015-10-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning-based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants.

  12. Sequence-based model of gap gene regulatory network.

    Science.gov (United States)

    Kozlov, Konstantin; Gursky, Vitaly; Kulakovskiy, Ivan; Samsonova, Maria

    2014-01-01

    The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts large interest among researches studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer underlying regulatory mechanism and prove its sufficiency. We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used four-fold cross validation test and fitting to random dataset to validate the model and proof its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3

  13. MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

    DEFF Research Database (Denmark)

    Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund

    2017-01-01

    number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next generation sequence data and perform reference based sequence assignment, followed by a post...... pipeline is freely available as a bitbucked package (https://bitbucket.org/genomicepidemiology/mgmapper). A web-version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets....

  14. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    Directory of Open Access Journals (Sweden)

    Brejnev Muhizi Muhire

    Full Text Available The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV. There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT, a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms.

  15. An accurate clone-based haplotyping method by overlapping pool sequencing.

    Science.gov (United States)

    Li, Cheng; Cao, Changchang; Tu, Jing; Sun, Xiao

    2016-07-08

    Chromosome-long haplotyping of human genomes is important to identify genetic variants with differing gene expression, in human evolution studies, clinical diagnosis, and other biological and medical fields. Although several methods have realized haplotyping based on sequencing technologies or population statistics, accuracy and cost are factors that prohibit their wide use. Borrowing ideas from group testing theories, we proposed a clone-based haplotyping method by overlapping pool sequencing. The clones from a single individual were pooled combinatorially and then sequenced. According to the distinct pooling pattern for each clone in the overlapping pool sequencing, alleles for the recovered variants could be assigned to their original clones precisely. Subsequently, the clone sequences could be reconstructed by linking these alleles accordingly and assembling them into haplotypes with high accuracy. To verify the utility of our method, we constructed 130 110 clones in silico for the individual NA12878 and simulated the pooling and sequencing process. Ultimately, 99.9% of variants on chromosome 1 that were covered by clones from both parental chromosomes were recovered correctly, and 112 haplotype contigs were assembled with an N50 length of 3.4 Mb and no switch errors. A comparison with current clone-based haplotyping methods indicated our method was more accurate. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. A next generation semiconductor based sequencing approach for the identification of meat species in DNA mixtures.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available The identification of the species of origin of meat and meat products is an important issue to prevent and detect frauds that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor based sequencing technology (Ion Torrent Personal Genome Machine for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different couples of universal primers that amplify 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides of which 29,109,688 with Q20 (87.43% in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. Error rate calculated after confirmation of the obtained sequences by Sanger sequencing ranged from 0.0003 to 0.02 for the different species. Correlation about the number of reads per species between different libraries was high for mammalian species (0.97 and lower for avian species (0.70. PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low level of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures.

  17. Discrepancy between Hepatitis C Virus Genotypes and NS4-Based Serotypes: Association with Their Subgenomic Sequences

    Directory of Open Access Journals (Sweden)

    Nan Nwe Win

    2017-01-01

    Full Text Available Determination of hepatitis C virus (HCV genotypes plays an important role in the direct-acting agent era. Discrepancies between HCV genotyping and serotyping assays are occasionally observed. Eighteen samples with discrepant results between genotyping and serotyping methods were analyzed. HCV serotyping and genotyping were based on the HCV nonstructural 4 (NS4 region and 5′-untranslated region (5′-UTR, respectively. HCV core and NS4 regions were chosen to be sequenced and were compared with the genotyping and serotyping results. Deep sequencing was also performed for the corresponding HCV NS4 regions. Seventeen out of 18 discrepant samples could be sequenced by the Sanger method. Both HCV core and NS4 sequences were concordant with that of genotyping in the 5′-UTR in all 17 samples. In cloning analysis of the HCV NS4 region, there were several amino acid variations, but each sequence was much closer to the peptide with the same genotype. Deep sequencing revealed that minor clones with different subgenotypes existed in two of the 17 samples. Genotyping by genome amplification showed high consistency, while several false reactions were detected by serotyping. The deep sequencing method also provides accurate genotyping results and may be useful for analyzing discrepant cases. HCV genotyping should be correctly determined before antiviral treatment.

  18. Optimal protein library design using recombination or point mutations based on sequence-based scoring functions.

    Science.gov (United States)

    Pantazes, Robert J; Saraf, Manish C; Maranas, Costas D

    2007-08-01

    In this paper, we introduce and test two new sequence-based protein scoring systems (i.e. S1, S2) for assessing the likelihood that a given protein hybrid will be functional. By binning together amino acids with similar properties (i.e. volume, hydrophobicity and charge) the scoring systems S1 and S2 allow for the quantification of the severity of mismatched interactions in the hybrids. The S2 scoring system is found to be able to significantly functionally enrich a cytochrome P450 library over other scoring methods. Given this scoring base, we subsequently constructed two separate optimization formulations (i.e. OPTCOMB and OPTOLIGO) for optimally designing protein combinatorial libraries involving recombination or mutations, respectively. Notably, two separate versions of OPTCOMB are generated (i.e. model M1, M2) with the latter allowing for position-dependent parental fragment skipping. Computational benchmarking results demonstrate the efficacy of models OPTCOMB and OPTOLIGO to generate high scoring libraries of a prespecified size.

  19. Study design requirements for RNA sequencing-based breast cancer diagnostics.

    Science.gov (United States)

    Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias

    2016-02-01

    Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic.

  20. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  1. Visual Localization across Seasons Using Sequence Matching Based on Multi-Feature Combination.

    Science.gov (United States)

    Qiao, Yongliang

    2017-10-25

    Visual localization is widely used in autonomous navigation system and Advanced Driver Assistance Systems (ADAS). However, visual-based localization in seasonal changing situations is one of the most challenging topics in computer vision and the intelligent vehicle community. The difficulty of this task is related to the strong appearance changes that occur in scenes due to weather or season changes. In this paper, a place recognition based visual localization method is proposed, which realizes the localization by identifying previously visited places using the sequence matching method. It operates by matching query image sequences to an image database acquired previously (video acquired during traveling period). In this method, in order to improve matching accuracy, multi-feature is constructed by combining a global GIST descriptor and local binary feature CSLBP (Center-symmetric local binary patterns) to represent image sequence. Then, similarity measurement according to Chi-square distance is used for effective sequences matching. For experimental evaluation, the relationship between image sequence length and sequences matching performance is studied. To show its effectiveness, the proposed method is tested and evaluated in four seasons outdoor environments. The results have shown improved precision-recall performance against the state-of-the-art SeqSLAM algorithm.

  2. Visual Localization across Seasons Using Sequence Matching Based on Multi-Feature Combination

    Directory of Open Access Journals (Sweden)

    Yongliang Qiao

    2017-10-01

    Full Text Available Visual localization is widely used in autonomous navigation system and Advanced Driver Assistance Systems (ADAS. However, visual-based localization in seasonal changing situations is one of the most challenging topics in computer vision and the intelligent vehicle community. The difficulty of this task is related to the strong appearance changes that occur in scenes due to weather or season changes. In this paper, a place recognition based visual localization method is proposed, which realizes the localization by identifying previously visited places using the sequence matching method. It operates by matching query image sequences to an image database acquired previously (video acquired during traveling period. In this method, in order to improve matching accuracy, multi-feature is constructed by combining a global GIST descriptor and local binary feature CSLBP (Center-symmetric local binary patterns to represent image sequence. Then, similarity measurement according to Chi-square distance is used for effective sequences matching. For experimental evaluation, the relationship between image sequence length and sequences matching performance is studied. To show its effectiveness, the proposed method is tested and evaluated in four seasons outdoor environments. The results have shown improved precision–recall performance against the state-of-the-art SeqSLAM algorithm.

  3. LookSeq: a browser-based viewer for deep sequencing data.

    Science.gov (United States)

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P

    2009-11-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

  4. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.

    2005-01-01

    sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human...

  5. Sequencing-based breast cancer diagnostics as an alternative to routine biomarkers.

    Science.gov (United States)

    Rantalainen, Mattias; Klevebring, Daniel; Lindberg, Johan; Ivansson, Emma; Rosin, Gustaf; Kis, Lorand; Celebioglu, Fuat; Fredriksson, Irma; Czene, Kamila; Frisell, Jan; Hartman, Johan; Bergh, Jonas; Grönberg, Henrik

    2016-11-30

    Sequencing-based breast cancer diagnostics have the potential to replace routine biomarkers and provide molecular characterization that enable personalized precision medicine. Here we investigate the concordance between sequencing-based and routine diagnostic biomarkers and to what extent tumor sequencing contributes clinically actionable information. We applied DNA- and RNA-sequencing to characterize tumors from 307 breast cancer patients with replication in up to 739 patients. We developed models to predict status of routine biomarkers (ER, HER2,Ki-67, histological grade) from sequencing data. Non-routine biomarkers, including mutations in BRCA1, BRCA2 and ERBB2(HER2), and additional clinically actionable somatic alterations were also investigated. Concordance with routine diagnostic biomarkers was high for ER status (AUC = 0.95;AUC(replication) = 0.97) and HER2 status (AUC = 0.97;AUC(replication) = 0.92). The transcriptomic grade model enabled classification of histological grade 1 and histological grade 3 tumors with high accuracy (AUC = 0.98;AUC(replication) = 0.94). Clinically actionable mutations in BRCA1, BRCA2 and ERBB2(HER2) were detected in 5.5% of patients, while 53% had genomic alterations matching ongoing or concluded breast cancer studies. Sequencing-based molecular profiling can be applied as an alternative to histopathology to determine ER and HER2 status, in addition to providing improved tumor grading and clinically actionable mutations and molecular subtypes. Our results suggest that sequencing-based breast cancer diagnostics in a near future can replace routine biomarkers.

  6. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    Full Text Available High-throughput sequencing technology, also called next-generation sequencing (NGS, has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA. As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interests with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since, one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and the feature profiles of it. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/. This service will improve accessibility to high-quality data from SRA.

  7. HLA class I sequence-based typing using DNA recovered from frozen plasma.

    Science.gov (United States)

    Cotton, Laura A; Abdur Rahman, Manal; Ng, Carmond; Le, Anh Q; Milloy, M-J; Mo, Theresa; Brumme, Zabrina L

    2012-08-31

    We describe a rapid, reliable and cost-effective method for intermediate-to-high-resolution sequence-based HLA class I typing using frozen plasma as a source of genomic DNA. The plasma samples investigated had a median age of 8.5 years. Total nucleic acids were isolated from matched frozen PBMC (~2.5 million) and plasma (500 μl) samples from a panel of 25 individuals using commercial silica-based kits. Extractions yielded median [IQR] nucleic acid concentrations of 85.7 [47.0-130.0]ng/μl and 2.2 [1.7-2.6]ng/μl from PBMC and plasma, respectively. Following extraction, ~1000 base pair regions spanning exons 2 and 3 of HLA-A, -B and -C were amplified independently via nested PCR using universal, locus-specific primers and sequenced directly. Chromatogram analysis was performed using commercial DNA sequence analysis software and allele interpretation was performed using a free web-based tool. HLA-A, -B and -C amplification rates were 100% and chromatograms were of uniformly high quality with clearly distinguishable mixed bases regardless of DNA source. Concordance between PBMC and plasma-derived HLA types was 100% at the allele and protein levels. At the nucleotide level, a single partially discordant base (resulting from a failure to call both peaks in a mixed base) was observed out of >46,975 bases sequenced (>99.9% concordance). This protocol has previously been used to perform HLA class I typing from a variety of genomic DNA sources including PBMC, whole blood, granulocyte pellets and serum, from specimens up to 30 years old. This method provides comparable specificity to conventional sequence-based approaches and could be applied in situations where cell samples are unavailable or DNA quantities are limiting. Copyright © 2012 Elsevier B.V. All rights reserved.

  8. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. Association method based on family-based breeding systems was developed, genomic heritabilities, genomic prediction accurancies and effects of some key factors wer...... explored. Results show that low sequencing depth caused underestimation of allele substitution effects in GWAS and overestimation of genomic heritability in prediction studies. Other factors susch as SNP marker density, population structure and size of training population influenced accuracy of genomic...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  9. CodonLogo: a sequence logo-based viewer for codon patterns.

    Science.gov (United States)

    Sharma, Virag; Murphy, David P; Provan, Gregory; Baranov, Pavel V

    2012-07-15

    Conserved patterns across a multiple sequence alignment can be visualized by generating sequence logos. Sequence logos show each column in the alignment as stacks of symbol(s) where the height of a stack is proportional to its informational content, whereas the height of each symbol within the stack is proportional to its frequency in the column. Sequence logos use symbols of either nucleotide or amino acid alphabets. However, certain regulatory signals in messenger RNA (mRNA) act as combinations of codons. Yet no tool is available for visualization of conserved codon patterns. We present the first application which allows visualization of conserved regions in a multiple sequence alignment in the context of codons. CodonLogo is based on WebLogo3 and uses the same heuristics but treats codons as inseparable units of a 64-letter alphabet. CodonLogo can discriminate patterns of codon conservation from patterns of nucleotide conservation that appear indistinguishable in standard sequence logos. The CodonLogo source code and its implementation (in a local version of the Galaxy Browser) are available at http://recode.ucc.ie/CodonLogo and through the Galaxy Tool Shed at http://toolshed.g2.bx.psu.edu/.

  10. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    Science.gov (United States)

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L-L-words--in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.

  11. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    Full Text Available The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions. In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L—L-words—in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.

  12. MISTICA: Minimum Spanning Tree-Based Coarse Image Alignment for Microscopy Image Sequences.

    Science.gov (United States)

    Ray, Nilanjan; McArdle, Sara; Ley, Klaus; Acton, Scott T

    2016-11-01

    Registration of an in vivo microscopy image sequence is necessary in many significant studies, including studies of atherosclerosis in large arteries and the heart. Significant cardiac and respiratory motion of the living subject, occasional spells of focal plane changes, drift in the field of view, and long image sequences are the principal roadblocks. The first step in such a registration process is the removal of translational and rotational motion. Next, a deformable registration can be performed. The focus of our study here is to remove the translation and/or rigid body motion that we refer to here as coarse alignment. The existing techniques for coarse alignment are unable to accommodate long sequences often consisting of periods of poor quality images (as quantified by a suitable perceptual measure). Many existing methods require the user to select an anchor image to which other images are registered. We propose a novel method for coarse image sequence alignment based on minimum weighted spanning trees (MISTICA) that overcomes these difficulties. The principal idea behind MISTICA is to reorder the images in shorter sequences, to demote nonconforming or poor quality images in the registration process, and to mitigate the error propagation. The anchor image is selected automatically making MISTICA completely automated. MISTICA is computationally efficient. It has a single tuning parameter that determines graph width, which can also be eliminated by the way of additional computation. MISTICA outperforms existing alignment methods when applied to microscopy image sequences of mouse arteries.

  13. Histoimmunogenetics Markup Language 1.0: Reporting next generation sequencing-based HLA and KIR genotyping.

    Science.gov (United States)

    Milius, Robert P; Heuer, Michael; Valiga, Daniel; Doroschak, Kathryn J; Kennedy, Caleb J; Bolon, Yung-Tsi; Schneider, Joel; Pollack, Jane; Kim, Hwa Ran; Cereb, Nezih; Hollenbach, Jill A; Mack, Steven J; Maiers, Martin

    2015-12-01

    We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and Sequence Based Typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  14. A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer.

    Science.gov (United States)

    Álvarez-Martos, Isabel; Ferapontova, Elena E

    2017-08-05

    A unique specificity of the aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding of dopamine is a 57 nucleotides long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained by the replacement of the RNA aptamer bases for their DNA analogues is not able of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence, and, thus, is not an aptamer and cannot be used neither for in vivo nor in situ analysis of dopamine in the presence of structurally related neurotransmitters. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. 3D knee segmentation based on three MRI sequences from different planes.

    Science.gov (United States)

    Zhou, L; Chav, R; Cresson, T; Chartrand, G; de Guise, J

    2016-08-01

    In clinical practice, knee MRI sequences with 3.5~5 mm slice distance in sagittal, coronal, and axial planes are often requested for the knee examination since its acquisition is faster than high-resolution MRI sequence in a single plane, thereby reducing the probability of motion artifact. In order to take advantage of the three sequences from different planes, a 3D segmentation method based on the combination of three knee models obtained from the three sequences is proposed in this paper. In the method, the sub-segmentation is respectively performed with sagittal, coronal, and axial MRI sequence in the image coordinate system. With each sequence, an initial knee model is hierarchically deformed, and then the three deformed models are mapped to reference coordinate system defined by the DICOM standard and combined to obtain a patient-specific model. The experimental results verified that the three sub-segmentation results can complement each other, and their integration can compensate for the insufficiency of boundary information caused by 3.5~5 mm gap between consecutive slices. Therefore, the obtained patient-specific model is substantially more accurate than each sub-segmentation results.

  16. Taxonomy and phylogeny of the genus citrus based on the nuclear ribosomal dna its region sequence

    International Nuclear Information System (INIS)

    Sun, Y.L.

    2015-01-01

    The genus Citrus (Aurantioideae, Rutaceae) is the sole source of the citrus fruits of commerce showing high economic values. In this study, the taxonomy and phylogeny of Citrus species is evaluated using sequence analysis of the ITS region of nrDNA. This study is based on 26 plants materials belonging to 22 Citrus species having wild, domesticated, and cultivated species. Through DNA alignment of the ITS sequence, ITS1 and ITS2 regions showed relatively high variations of sequence length and nucleotide among these Citrus species. According to previous six-tribe discrimination theory by Swingle and Reece, the grouping in our ITS phylogenetic tree reconstructed by ITS sequences was not related to tribe discrimination but species discrimination. However, the molecular analysis could provide more information on citrus taxonomy. Combined with ITS sequences of other subgenera in then true citrus fruit tree group, the ITS phylogenetic tree indicated subgenera Citrus was monophyletic and nearer to Fortunella, Poncirus, and Clymenia compared to Microcitrus and Eremocitrus. Abundant sequence variations of the ITS region shown in this study would help species identification and tribe differentiation of the genus Citrus. (author)

  17. Molecular diagnosis of Usher syndrome: application of two different next generation sequencing-based procedures.

    Directory of Open Access Journals (Sweden)

    Danilo Licastro

    Full Text Available Usher syndrome (USH is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of which were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II and Roche 454 (GS FLX for the Long-PCR sequencing. Long-PCR targeting was more efficient with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need of a strong coverage and to the correct interpretation of sequence variations with a non obvious, pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH that may be also caused by the presence of additional causative genes yet to be identified.

  18. Molecular Diagnosis of Usher Syndrome: Application of Two Different Next Generation Sequencing-Based Procedures

    Science.gov (United States)

    Licastro, Danilo; Mutarelli, Margherita; Peluso, Ivana; Neveling, Kornelia; Wieskamp, Nienke; Rispoli, Rossella; Vozzi, Diego; Athanasakis, Emmanouil; D'Eustacchio, Angela; Pizzo, Mariateresa; D'Amico, Francesca; Ziviello, Carmela; Simonelli, Francesca; Fabretto, Antonella; Scheffer, Hans; Gasparini, Paolo; Banfi, Sandro; Nigro, Vincenzo

    2012-01-01

    Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of which were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need of a strong coverage and to the correct interpretation of sequence variations with a non obvious, pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH that may be also caused by the presence of additional causative genes yet to be identified. PMID:22952768

  19. Context based computational analysis and characterization of ARS consensus sequences (ACS of Saccharomyces cerevisiae genome

    Directory of Open Access Journals (Sweden)

    Vinod Kumar Singh

    2016-09-01

    Full Text Available Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS requires an essential consensus sequence (ACS for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC denoted as ORC-ACS and non-replicating ACS sequences (nrACS, that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  20. Simultaneous genomic identification and profiling of a single cell using semiconductor-based next generation sequencing

    Directory of Open Access Journals (Sweden)

    Manabu Watanabe

    2014-09-01

    Full Text Available Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (adenocarcinomic human alveolar basal epithelial cell line were used as a model. Single-cell capture was performed using laser capture microdissection (LCM with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈106 cells were subjected to whole genome amplification (WGA. For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 1031–35. For cancer gene profiling, we used mutation profiling that was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based bench top sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at coverage of more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, while for the single cell percentages of sequence covered at coverage of more than 100× were 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures or random allele drop out probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.

  1. Implementation of an RFID-Based Sequencing-Error-Proofing System for Automotive Manufacturing Logistics

    Directory of Open Access Journals (Sweden)

    Yong-Shin Kang

    2018-01-01

    Full Text Available Serialized tracing provides the ability to track and trace the lifecycle of the products and parts. Unlike barcodes, Radio frequency identification (RFID, which is an important building block for internet of things (IoT, does not require a line of sight and has the advantages of recognizing many objects simultaneously and rapidly, and storing more information than barcodes. Therefore, RFID has been used in a variety of application domains such as logistics, distributions, and manufacturing, significantly improving traceability and process efficiency. In this study, we applied RFID to improve the just-in-sequence operation of an automotive inbound logistics process. First, we implemented an RFID-based visibility system for real-time traceability and control of part supply from the production lines of suppliers to the assembly line of a car manufacturer. Second, we developed an RFID-based sequence-error proofing system to avoid accidental line stops due to incorrect part sequencing. The whole system has been successfully installed in a rear-axle inbound logistics process of GM Korea. We achieved a significant amount of cost savings, especially due to the prevention of sequencing errors and part shortages, and the reduction of manual operations. Thorough cost-benefit analysis demonstrates the clear economic feasibility of using RFID technologies for the just-in-sequence inbound logistics in an automobile manufacturing environment.

  2. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment. High-throughput sequencing (HTS revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.

  3. pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment.

    Science.gov (United States)

    Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

    2018-01-01

    Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore adopted less than could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension more simple through more and better application of high-level languages commonly used in bioinformatics, such as Python. The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation running on several hardware platforms with multi-core systems and/or GPUs that provides accurate sequence alignments that also can be inspected for alignment details. Additionally, pyPaSWAS support the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. This way, the Python environment will stimulate further extension and use of pyPaSWAS. pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.

  4. A new RF tagging pulse based on the Frank poly-phase perfect sequence

    DEFF Research Database (Denmark)

    Laustsen, Christoffer; Greferath, Marcus; Ringgaard, Steffen

    2014-01-01

    Radio frequency (RF) spectrally selective multiband pulses or tagging pulses, are applicable in a broad range of magnetic resonance methods. We demonstrate through simulations and experiments a new phase-modulation-only RF pulse for RF tagging based on the Frank poly-phase perfect sequence...

  5. State of the art and challenges in sequence based T-cell epitope prediction

    DEFF Research Database (Denmark)

    Lundegaard, Claus; Hoof, Ilka; Lund, Ole

    2010-01-01

    Sequence based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway, the...

  6. Teaching Research Methodology Using a Project-Based Three Course Sequence Critical Reflections on Practice

    Science.gov (United States)

    Braguglia, Kay H.; Jackson, Kanata A.

    2012-01-01

    This article presents a reflective analysis of teaching research methodology through a three course sequence using a project-based approach. The authors reflect critically on their experiences in teaching research methods courses in an undergraduate business management program. The introduction of a range of specific techniques including student…

  7. Magnetism Teaching Sequences Based on an Inductive Approach for First-Year Thai University Science Students

    Science.gov (United States)

    Narjaikaew, Pattawan; Emarat, Narumon; Arayathanitkul, Kwan; Cowie, Bronwen

    2010-01-01

    The study investigated the impact on student motivation and understanding of magnetism of teaching sequences based on an inductive approach. The study was conducted in large lecture classes. A pre- and post-Conceptual Survey of Electricity and Magnetism was conducted with just fewer than 700 Thai undergraduate science students, before and after…

  8. Method for Generating Pseudorandom Sequences with the Assured Period Based on R-blocks

    Directory of Open Access Journals (Sweden)

    M. A. Ivanov

    2011-03-01

    Full Text Available The article describes the characteristics of a new class of fast-acting pseudorandom number generators, based on the use of stochastic adders or R-blocks. A new method for generating pseudorandom sequences with the assured length of period is offered.

  9. Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikos; Vinther, Jeppe

    2015-01-01

    time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data...

  10. CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L. methylation filtered genomic genespace sequences

    Directory of Open Access Journals (Sweden)

    Spraggins Thomas A

    2007-04-01

    Full Text Available Abstract Background Cowpea [Vigna unguiculata (L. Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI, funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniprotKB-PIR (Protein Information Resource, and UniProtKB-TrEMBL. Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene predication program and the

  11. Face recognition based on matching of local features on 3D dynamic range sequences

    Science.gov (United States)

    Echeagaray-Patrón, B. A.; Kober, Vitaly

    2016-09-01

    3D face recognition has attracted attention in the last decade due to improvement of technology of 3D image acquisition and its wide range of applications such as access control, surveillance, human-computer interaction and biometric identification systems. Most research on 3D face recognition has focused on analysis of 3D still data. In this work, a new method for face recognition using dynamic 3D range sequences is proposed. Experimental results are presented and discussed using 3D sequences in the presence of pose variation. The performance of the proposed method is compared with that of conventional face recognition algorithms based on descriptors.

  12. Security problems for a pseudorandom sequence generator based on the Chen chaotic system

    Science.gov (United States)

    Özkaynak, Fatih; Yavuz, Sırma

    2013-09-01

    Recently, a novel pseudorandom number generator scheme based on the Chen chaotic system was proposed. In this study, we analyze the security weaknesses of the proposed generator. By applying a brute force attack on a reduced key space, we show that 66% of the generated pseudorandom number sequences can be revealed. Executable C# code is given for the proposed attack. The computational complexity of this attack is O(n), where n is the sequence length. Both mathematical proofs and experimental results are presented to support the proposed attack.

  13. Molecular diversification of Trichuris spp. from Sigmodontinae (Cricetidae) rodents from Argentina based on mitochondrial DNA sequences.

    Science.gov (United States)

    Callejón, Rocío; Robles, María Del Rosario; Panei, Carlos Javier; Cutillas, Cristina

    2016-08-01

    A molecular phylogenetic hypothesis is presented for the genus Trichuris based on sequence data from mitochondrial cytochrome c oxidase 1 (cox1) and cytochrome b (cob). The taxa consisted of nine populations of whipworm from five species of Sigmodontinae rodents from Argentina. Bayesian Inference, Maximum Parsimony, and Maximum Likelihood methods were used to infer phylogenies for each gene separately but also for the combined mitochondrial data and the combined mitochondrial and nuclear dataset. Phylogenetic results based on cox1 and cob mitochondrial DNA (mtDNA) revealed three clades strongly resolved corresponding to three different species (Trichuris navonae, Trichuris bainae, and Trichuris pardinasi) showing phylogeographic variation, but relationships among Trichuris species were poorly resolved. Phylogenetic reconstruction based on concatenated sequences had greater phylogenetic resolution for delimiting species and populations intra-specific of Trichuris than those based on partitioned genes. Thus, populations of T. bainae and T. pardinasi could be affected by geographical factors and co-divergence parasite-host.

  14. Phylogenetic analysis of Demodex caprae based on mitochondrial 16S rDNA sequence.

    Science.gov (United States)

    Zhao, Ya-E; Hu, Li; Ma, Jun-Xian

    2013-11-01

    Demodex caprae infests the hair follicles and sebaceous glands of goats worldwide, which not only seriously impairs goat farming, but also causes a big economic loss. However, there are few reports on the DNA level of D. caprae. To reveal the taxonomic position of D. caprae within the genus Demodex, the present study conducted phylogenetic analysis of D. caprae based on mt16S rDNA sequence data. D. caprae adults and eggs were obtained from a skin nodule of the goat suffering demodicidosis. The mt16S rDNA sequences of individual mite were amplified using specific primers, and then cloned, sequenced, and aligned. The sequence divergence, genetic distance, and transition/transversion rate were computed, and the phylogenetic trees in Demodex were reconstructed. Results revealed the 339-bp partial sequences of six D. caprae isolates were obtained, and the sequence identity was 100% among isolates. The pairwise divergences between D. caprae and Demodex canis or Demodex folliculorum or Demodex brevis were 22.2-24.0%, 24.0-24.9%, and 22.9-23.2%, respectively. The corresponding average genetic distances were 2.840, 2.926, and 2.665, and the average transition/transversion rates were 0.70, 0.55, and 0.54, respectively. The divergences, genetic distances, and transition/transversion rates of D. caprae versus the other three species all reached interspecies level. The five phylogenetic trees all presented that D. caprae clustered with D. brevis first, and then with D. canis, D. folliculorum, and Demodex injai in sequence. In conclusion, D. caprae is an independent species, and it is closer to D. brevis than to D. canis, D. folliculorum, or D. injai.

  15. Security Analysis of a Block Encryption Algorithm Based on Dynamic Sequences of Multiple Chaotic Systems

    Science.gov (United States)

    Du, Mao-Kang; He, Bo; Wang, Yong

    2011-01-01

    Recently, the cryptosystem based on chaos has attracted much attention. Wang and Yu (Commun. Nonlin. Sci. Numer. Simulat. 14 (2009) 574) proposed a block encryption algorithm based on dynamic sequences of multiple chaotic systems. We analyze the potential flaws in the algorithm. Then, a chosen-plaintext attack is presented. Some remedial measures are suggested to avoid the flaws effectively. Furthermore, an improved encryption algorithm is proposed to resist the attacks and to keep all the merits of the original cryptosystem.

  16. DNA cross-linking by dehydromonocrotaline lacks apparent base sequence preference.

    Science.gov (United States)

    Rieben, W Kurt; Coulombe, Roger A

    2004-12-01

    Pyrrolizidine alkaloids (PAs) are ubiquitous plant toxins, many of which, upon oxidation by hepatic mixed-function oxidases, become reactive bifunctional pyrrolic electrophiles that form DNA-DNA and DNA-protein cross-links. The anti-mitotic, toxic, and carcinogenic action of PAs is thought to be caused, at least in part, by these cross-links. We wished to determine whether the activated PA pyrrole dehydromonocrotaline (DHMO) exhibits base sequence preferences when cross-linked to a set of model duplex poly A-T 14-mer oligonucleotides with varying internal and/or end 5'-d(CG), 5'-d(GC), 5'-d(TA), 5'-d(CGCG), or 5'-d(GCGC) sequences. DHMO-DNA cross-links were assessed by electrophoretic mobility shift assay (EMSA) of 32P endlabeled oligonucleotides and by HPLC analysis of cross-linked DNAs enzymatically digested to their constituent deoxynucleosides. The degree of DNA cross-links depended upon the concentration of the pyrrole, but not on the base sequence of the oligonucleotide target. Likewise, HPLC chromatograms of cross-linked and digested DNAs showed no discernible sequence preference for any nucleotide. Added glutathione, tyrosine, cysteine, and aspartic acid, but not phenylalanine, threonine, serine, lysine, or methionine competed with DNA as alternate nucleophiles for cross-linking by DHMO. From these data it appears that DHMO exhibits no strong base preference when forming cross-links with DNA, and that some cellular nucleophiles can inhibit DNA cross-link formation.

  17. Autonomously Generating Operations Sequences for a Mars Rover Using Artificial Intelligence-Based Planning

    Science.gov (United States)

    Sherwood, R.; Mutz, D.; Estlin, T.; Chien, S.; Backes, P.; Norris, J.; Tran, D.; Cooper, B.; Rabideau, G.; Mishkin, A.; Maxwell, S.

    2001-07-01

    This article discusses a proof-of-concept prototype for ground-based automatic generation of validated rover command sequences from high-level science and engineering activities. This prototype is based on ASPEN, the Automated Scheduling and Planning Environment. This artificial intelligence (AI)-based planning and scheduling system will automatically generate a command sequence that will execute within resource constraints and satisfy flight rules. An automated planning and scheduling system encodes rover design knowledge and uses search and reasoning techniques to automatically generate low-level command sequences while respecting rover operability constraints, science and engineering preferences, environmental predictions, and also adhering to hard temporal constraints. This prototype planning system has been field-tested using the Rocky 7 rover at JPL and will be field-tested on more complex rovers to prove its effectiveness before transferring the technology to flight operations for an upcoming NASA mission. Enabling goal-driven commanding of planetary rovers greatly reduces the requirements for highly skilled rover engineering personnel. This in turn greatly reduces mission operations costs. In addition, goal-driven commanding permits a faster response to changes in rover state (e.g., faults) or science discoveries by removing the time-consuming manual sequence validation process, allowing rapid "what-if" analyses, and thus reducing overall cycle times.

  18. Structured prediction models for RNN based sequence labeling in clinical text.

    Science.gov (United States)

    Jagannatha, Abhyuday N; Yu, Hong

    2016-11-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In clinical domain one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain, presents its own set of challenges and objectives. In this work we experimented with various CRF based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.

  19. On the Power and Limits of Sequence Similarity Based Clustering of Proteins Into Families

    DEFF Research Database (Denmark)

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally...... important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We...... used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters...

  20. MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

    Science.gov (United States)

    Grimes, Susan M; Ji, Hanlee P

    2014-08-27

    Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

  1. Extracting flat-field images from scene-based image sequences using phase correlation

    Energy Technology Data Exchange (ETDEWEB)

    Caron, James N., E-mail: Caron@RSImd.com [Research Support Instruments, 4325-B Forbes Boulevard, Lanham, Maryland 20706 (United States); Montes, Marcos J. [Naval Research Laboratory, Code 7231, 4555 Overlook Avenue, SW, Washington, DC 20375 (United States); Obermark, Jerome L. [Naval Research Laboratory, Code 8231, 4555 Overlook Avenue, SW, Washington, DC 20375 (United States)

    2016-06-15

    Flat-field image processing is an essential step in producing high-quality and radiometrically calibrated images. Flat-fielding corrects for variations in the gain of focal plane array electronics and unequal illumination from the system optics. Typically, a flat-field image is captured by imaging a radiometrically uniform surface. The flat-field image is normalized and removed from the images. There are circumstances, such as with remote sensing, where a flat-field image cannot be acquired in this manner. For these cases, we developed a phase-correlation method that allows the extraction of an effective flat-field image from a sequence of scene-based displaced images. The method uses sub-pixel phase correlation image registration to align the sequence to estimate the static scene. The scene is removed from sequence producing a sequence of misaligned flat-field images. An average flat-field image is derived from the realigned flat-field sequence.

  2. [Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence

    Science.gov (United States)

    Ortega, Maya

    2010-01-01

    My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and whatever black is, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran Polymerase Chain Reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacteria. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene can be used for bacterial identification of different species based on the sequence of their 16S rRNA gene. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference to and classification of other Dichothrix sp.

  3. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Full Text Available Abstract Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i NPS-HomPPI (Non partner-specific HomPPI, which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii PS-HomPPI (Partner-specific HomPPI, which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  4. Investigation of next-generation sequencing data of Klebsiella pneumoniae using web-based tools.

    Science.gov (United States)

    Brhelova, Eva; Antonova, Mariya; Pardy, Filip; Kocmanova, Iva; Mayer, Jiri; Racil, Zdenek; Lengerova, Martina

    2017-11-01

    Rapid identification and characterization of multidrug-resistant Klebsiella pneumoniae strains is necessary due to the increasing frequency of severe infections in patients. The decreasing cost of next-generation sequencing enables us to obtain a comprehensive overview of genetic information in one step. The aim of this study is to demonstrate and evaluate the utility and scope of the application of web-based databases to next-generation sequenced (NGS) data. The whole genomes of 11 clinical Klebsiella pneumoniae isolates were sequenced using Illumina MiSeq. Selected web-based tools were used to identify a variety of genetic characteristics, such as acquired antimicrobial resistance genes, multilocus sequence types, plasmid replicons, and identify virulence factors, such as virulence genes, cps clusters, urease-nickel clusters and efflux systems. Using web-based tools hosted by the Center for Genomic Epidemiology, we detected resistance to 8 main antimicrobial groups with at least 11 acquired resistance genes. The isolates were divided into eight sequence types (ST11, 23, 37, 323, 433, 495 and 562, and a new one, ST1646). All of the isolates carried replicons of large plasmids. Capsular types, virulence factors and genes coding AcrAB and OqxAB efflux pumps were detected using BIGSdb-Kp, whereas the selected virulence genes, identified in almost all of the isolates, were detected using CLC Genomic Workbench software. Applying appropriate web-based online tools to NGS data enables the rapid extraction of comprehensive information that can be used for more efficient diagnosis and treatment of patients, while data processing is free of charge, easy and time-efficient.

  5. Comparison of Enzymes / Non-Enzymes Proteins Classification Models Based on 3D, Composition, Sequences and Topological Indices

    OpenAIRE

    Munteanu, Cristian Robert

    2014-01-01

    Comparison of Enzymes / Non-Enzymes Proteins Classification Models Based on 3D, Composition, Sequences and Topological Indices, German Conference on Bioinformatics (GCB), Potsdam, Germany (September, 2007)

  6. Generalized min-max bound-based MRI pulse sequence design framework for wide-range T1 relaxometry: A case study on the tissue specific imaging sequence.

    Directory of Open Access Journals (Sweden)

    Yang Liu

    Full Text Available This paper proposes a new design strategy for optimizing MRI pulse sequences for T1 relaxometry. The design strategy optimizes the pulse sequence parameters to minimize the maximum variance of unbiased T1 estimates over a range of T1 values using the Cramér-Rao bound. In contrast to prior sequences optimized for a single nominal T1 value, the optimized sequence using our bound-based strategy achieves improved precision and accuracy for a broad range of T1 estimates within a clinically feasible scan time. The optimization combines the downhill simplex method with a simulated annealing process. To show the effectiveness of the proposed strategy, we optimize the tissue specific imaging (TSI sequence. Preliminary Monte Carlo simulations demonstrate that the optimized TSI sequence yields improved precision and accuracy over the popular driven-equilibrium single-pulse observation of T1 (DESPOT1 approach for normal brain tissues (estimated T1 700-2000 ms at 3.0T. The relative mean estimation error (MSE for T1 estimation is less than 1.7% using the optimized TSI sequence, as opposed to less than 7.0% using DESPOT1 for normal brain tissues. The optimized TSI sequence achieves good stability by keeping the MSE under 7.0% over larger T1 values corresponding to different lesion tissues and the cerebrospinal fluid (up to 5000 ms. The T1 estimation accuracy using the new pulse sequence also shows improvement, which is more pronounced in low SNR scenarios.

  7. Haematobia irritans dataset of raw sequence reads from Illumina-based transcriptome sequencing of specific tissues and life stages

    Science.gov (United States)

    Illumina HiSeq technology was used to sequence the transcriptome from various dissected tissues and life stages from the horn fly, Haematobia irritans. These samples include eggs (0, 2, 4, and 9 hours post-oviposition), adult fly gut, adult fly legs, adult fly malpighian tubule, adult fly ovary, adu...

  8. Genetic diversity in breonadia salicina based on intra-species sequence variation of chloroplast dna spacer sequence

    International Nuclear Information System (INIS)

    Qurainy, F.A.; Gaafar, A.R.Z.

    2014-01-01

    Assessment and knowledge of the genetic diversity and variation within and between populations of rare and endangered plants is very important for effective conservation. Intergenic spacer sequences variation of psbA-trnH locus of chloroplast genome was assessed within Breonadia salicina (Rubiaceae), a critically endangered and endemic plant species to South western part of Kingdom of Saudi Arabia. The obtained sequence data from 19 individuals in three populations revealed nine haplotypes. The aligned sequences obtained from the overall Saudi accessions extended to 355 bp, revealing nine haplotypes. A high level of haplotype diversity (Hd = 0.842) and low level of nucleotide diversity (Pi = 0.0058) were detected. Consistently, both hierarchical analysis of molecular variance (AMOVA) and constructed neighbor-joining tree indicated null genetic differentiation among populations. This level of differentiation between populations or between regions in psbA-trnH sequences may be due to effects of the abundance of ancestral haplotype sharing and the presence of private haplotypes fixed for each population. Furthermore, the results revealed almost the same level of genetic diversity in comparison with Yemeni accessions, in which Saudi accessions were sharing three haplotypes from the four haplotypes found in Yemeni accessions. (author)

  9. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  10. Genome survey sequencing and genetic background characterization of Gracilariopsis lemaneiformis (Rhodophyta) based on next-generation sequencing.

    Science.gov (United States)

    Zhou, Wei; Hu, Yiyi; Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite it is economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with a N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon.

  11. Genome Survey Sequencing and Genetic Background Characterization of Gracilariopsis lemaneiformis (Rhodophyta) Based on Next-Generation Sequencing

    Science.gov (United States)

    Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite it is economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with a N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon. PMID:23875008

  12. Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.

    Science.gov (United States)

    Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2014-05-23

    The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain limiting the widespread adoption of NGS testing into clinical practice. One such difficulty includes the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides additional flexibility, needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic and all identified pathogenic variants confirmed using Sanger sequencing further validating the software.

  13. incaRNAfbinv: a web server for the fragment-based design of RNA sequences

    Science.gov (United States)

    Drory Retwitzer, Matan; Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme; Barash, Danny

    2016-01-01

    Abstract In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to-date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel natural occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv. PMID:27185893

  14. PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

    Directory of Open Access Journals (Sweden)

    Francisco Alexandre P

    2012-05-01

    Full Text Available Abstract Background With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available but this wealth of data remains underused and are frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results PHYLOViZ is platform independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions PHYLOViZ is a user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.

  15. Robust Automatic Target Recognition via HRRP Sequence Based on Scatterer Matching

    Directory of Open Access Journals (Sweden)

    Yuan Jiang

    2018-02-01

    Full Text Available High resolution range profile (HRRP plays an important role in wideband radar automatic target recognition (ATR. In order to alleviate the sensitivity to clutter and target aspect, employing a sequence of HRRP is a promising approach to enhance the ATR performance. In this paper, a novel HRRP sequence-matching method based on singular value decomposition (SVD is proposed. First, the HRRP sequence is decoupled into the angle space and the range space via SVD, which correspond to the span of the left and the right singular vectors, respectively. Second, atomic norm minimization (ANM is utilized to estimate dominant scatterers in the range space and the Hausdorff distance is employed to measure the scatter similarity between the test and training data. Next, the angle space similarity between the test and training data is evaluated based on the left singular vector correlations. Finally, the range space matching result and the angle space correlation are fused with the singular values as weights. Simulation and outfield experimental results demonstrate that the proposed matching metric is a robust similarity measure for HRRP sequence recognition.

  16. HIV-1 envelope sequence-based diversity measures for identifying recent infections.

    Directory of Open Access Journals (Sweden)

    Alexis Kafando

    Full Text Available Identifying recent HIV-1 infections is crucial for monitoring HIV-1 incidence and optimizing public health prevention efforts. To identify recent HIV-1 infections, we evaluated and compared the performance of 4 sequence-based diversity measures including percent diversity, percent complexity, Shannon entropy and number of haplotypes targeting 13 genetic segments within the env gene of HIV-1. A total of 597 diagnostic samples obtained in 2013 and 2015 from recently and chronically HIV-1 infected individuals were selected. From the selected samples, 249 (134 from recent versus 115 from chronic infections env coding regions, including V1-C5 of gp120 and the gp41 ectodomain of HIV-1, were successfully amplified and sequenced by next generation sequencing (NGS using the Illumina MiSeq platform. The ability of the four sequence-based diversity measures to correctly identify recent HIV infections was evaluated using the frequency distribution curves, median and interquartile range and area under the curve (AUC of the receiver operating characteristic (ROC. Comparing the median and interquartile range and evaluating the frequency distribution curves associated with the 4 sequence-based diversity measures, we observed that the percent diversity, number of haplotypes and Shannon entropy demonstrated significant potential to discriminate recent from chronic infections (p<0.0001. Using the AUC of ROC analysis, only the Shannon entropy measure within three HIV-1 env segments could accurately identify recent infections at a satisfactory level. The env segments were gp120 C2_1 (AUC = 0.806, gp120 C2_3 (AUC = 0.805 and gp120 V3 (AUC = 0.812. Our results clearly indicate that the Shannon entropy measure represents a useful tool for predicting HIV-1 infection recency.

  17. Studies of base pair sequence effects on DNA solvation based on all-atom molecular dynamics simulations.

    Science.gov (United States)

    Dixit, Surjit B; Mezei, Mihaly; Beveridge, David L

    2012-07-01

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization were observed in these simulations. The results were compared to essentially all known experimental data on the subject. Proximity analysis was employed to highlight the sequence dependent differences in solvation and ion localization properties in the grooves of DNA. Comparison of the MD-calculated DNA structure with canonical A- and B-forms supports the idea that the G/C-rich sequences are closer to canonical A- than B-form structures, while the reverse is true for the poly A sequences, with the exception of the alternating ATAT sequence. Analysis of hydration density maps reveals that the flexibility of solute molecule has a significant effect on the nature of observed hydration. Energetic analysis of solute-solvent interactions based on proximity analysis of solvent reveals that the GC or CG base pairs interact more strongly with water molecules in the minor groove of DNA that the AT or TA base pairs, while the interactions of the AT or TA pairs in the major groove are stronger than those of the GC or CG pairs. Computation of solvent-accessible surface area of the nucleotide units in the simulated trajectories reveals that the similarity with results derived from analysis of a database of crystallographic structures is excellent. The MD trajectories tend to follow Manning's counterion condensation theory, presenting a region of condensed counterions within a radius of about 17 A from the DNA surface independent of sequence. The GC and CG pairs tend to associate with cations in the major groove of the DNA structure to a greater extent than the AT and TA pairs. Cation association is more frequent in the minor groove of AT than the GC pairs. In general, the

  18. Molecular characterization of Fasciola gigantica from Mauritania based on mitochondrial and nuclear ribosomal DNA sequences.

    Science.gov (United States)

    Amor, Nabil; Farjallah, Sarra; Salem, Mohamed; Lamine, Dia Mamadou; Merella, Paolo; Said, Khaled; Ben Slimane, Badreddine

    2011-10-01

    Fasciolosis caused by Fasciola hepatica and Fasciola gigantica (Platyhelminthes: Trematoda: Digenea) is considered the most important helminth infection of ruminants in tropical countries, causing considerable socioeconomic problems. From Africa, F. gigantica has been previously characterized from Burkina Faso, Senegal, Kenya, Zambia and Mali, while F. hepatica has been reported from Morocco and Tunisia, and both species have been observed from Ethiopia and Egypt on the basis of morphometric differences, while the use of molecular markers is necessary to distinguish exactly between species. Samples identified morphologically as F. gigantica (n=60) from sheep and cattle from different geographical localities of Mauritania were genetically characterized by sequences of the first (ITS-1), the 5.8S, and second (ITS-2) Internal Transcribed Spacers (ITS) of nuclear ribosomal DNA (rDNA) genes and the mitochondrial Cytochrome c Oxidase I (COI) gene. Comparison of the sequences of the Mauritanian samples with sequences of Fasciola spp. from GenBank confirmed that all samples belong to the species F. gigantica. The nucleotide sequencing of ITS rDNA of F. gigantica showed no nucleotide variation in the ITS-1, 5.8S, and ITS-2 rDNA sequences among all samples examined and those from Burkina Faso, Kenya, Egypt and Iran. The phylogenetic trees based on the ITS-1 and ITS-2 sequences showed a close relationship of the Mauritanian samples with isolates of F. gigantica from different localities of Africa and Asia. The COI genotypes of the Mauritanian specimens of F. gigantica had a high level of diversity, and they belonged to the F. gigantica phylogenically distinguishable clade. The present study is the first molecular characterization of F. gigantica in sheep and cattle from Mauritania, allowing a reliable approach for the genetic differentiation of Fasciola spp. and providing basis for further studies on liver flukes in the African countries. Copyright © 2011 Elsevier Inc. All

  19. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Li Wei

    2005-05-01

    Full Text Available Abstract Background Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results We have generated ~3.84 million shotgun sequences (0.66X coverage from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project" together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig compared to human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to the structure in human. Conclusion The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shot-gun sequences to the human-mouse alignment offers the investigator a rapid way to defining specific regions for analysis and resequencing.

  20. Optimal pseudorandom sequence selection for online c-VEP based BCI control applications

    DEFF Research Database (Denmark)

    Isaksen, Jonas L.; Mohebbi, Ali; Puthusserypady, Sadasivan

    2017-01-01

    to predict the chance of completion and accuracy score. Results: No specific pseudorandom sequence showed superior accuracy on the group basis. When isolating the individual performances with the highest accuracy, time consumption per identification was not significantly increased. The Accuracy Score aids...... is a laborious process. Aims: This study aimed to suggest an efficient method for choosing the optimal stimulus sequence based on a fast test and simple measures to increase the performance and minimize the time consumption for research trials. Methods: A total of 21 healthy subjects were included in an online...... wheelchair control task and completed the same task using stimuli based on the m-code, the gold-code, and the Barker-code. Correct/incorrect identification and time consumption were obtained for each identification. Subject-specific templates were characterized and used in a forward-step first-order model...

  1. Research on Image Encryption Based on DNA Sequence and Chaos Theory

    Science.gov (United States)

    Tian Zhang, Tian; Yan, Shan Jun; Gu, Cheng Yan; Ren, Ran; Liao, Kai Xin

    2018-04-01

    Nowadays encryption is a common technique to protect image data from unauthorized access. In recent years, many scientists have proposed various encryption algorithms based on DNA sequence to provide a new idea for the design of image encryption algorithm. Therefore, a new method of image encryption based on DNA computing technology is proposed in this paper, whose original image is encrypted by DNA coding and 1-D logistic chaotic mapping. First, the algorithm uses two modules as the encryption key. The first module uses the real DNA sequence, and the second module is made by one-dimensional logistic chaos mapping. Secondly, the algorithm uses DNA complementary rules to encode original image, and uses the key and DNA computing technology to compute each pixel value of the original image, so as to realize the encryption of the whole image. Simulation results show that the algorithm has good encryption effect and security.

  2. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  3. Discrimination of the Lactobacillus acidophilus group using sequencing, species-specific PCR and SNaPshot mini-sequencing technology based on the recA gene.

    Science.gov (United States)

    Huang, Chien-Hsun; Chang, Mu-Tzu; Huang, Mu-Chiou; Wang, Li-Tin; Huang, Lina; Lee, Fwu-Ling

    2012-10-01

    To clearly identify specific species and subspecies of the Lactobacillus acidophilus group using phenotypic and genotypic (16S rDNA sequence analysis) techniques alone is difficult. The aim of this study was to use the recA gene for species discrimination in the L. acidophilus group, as well as to develop a species-specific primer and single nucleotide polymorphism primer based on the recA gene sequence for species and subspecies identification. The average sequence similarity for the recA gene among type strains was 80.0%, and most members of the L. acidophilus group could be clearly distinguished. The species-specific primer was designed according to the recA gene sequencing, which was employed for polymerase chain reaction with the template DNA of Lactobacillus strains. A single 231-bp species-specific band was found only in L. delbrueckii. A SNaPshot mini-sequencing assay using recA as a target gene was also developed. The specificity of the mini-sequencing assay was evaluated using 31 strains of L. delbrueckii species and was able to unambiguously discriminate strains belonging to the subspecies L. delbrueckii subsp. bulgaricus. The phylogenetic relationships of most strains in the L. acidophilus group can be resolved using recA gene sequencing, and a novel method to identify the species and subspecies of the L. delbrueckii and L. delbrueckii subsp. bulgaricus was developed by species-specific polymerase chain reaction combined with SNaPshot mini-sequencing. Copyright © 2012 Society of Chemical Industry.

  4. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available Few studies investigated the donkey (Equus asinus at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca. The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing and Ion Torrent (RRL runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  5. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    Science.gov (United States)

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades observed an increasing interest in development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide-range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from the protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.

  6. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    Science.gov (United States)

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

  7. CT Image Sequence Restoration Based on Sparse and Low-Rank Decomposition

    Science.gov (United States)

    Gou, Shuiping; Wang, Yueyue; Wang, Zhilong; Peng, Yong; Zhang, Xiaopeng; Jiao, Licheng; Wu, Jianshe

    2013-01-01

    Blurry organ boundaries and soft tissue structures present a major challenge in biomedical image restoration. In this paper, we propose a low-rank decomposition-based method for computed tomography (CT) image sequence restoration, where the CT image sequence is decomposed into a sparse component and a low-rank component. A new point spread function of Weiner filter is employed to efficiently remove blur in the sparse component; a wiener filtering with the Gaussian PSF is used to recover the average image of the low-rank component. And then we get the recovered CT image sequence by combining the recovery low-rank image with all recovery sparse image sequence. Our method achieves restoration results with higher contrast, sharper organ boundaries and richer soft tissue structure information, compared with existing CT image restoration methods. The robustness of our method was assessed with numerical experiments using three different low-rank models: Robust Principle Component Analysis (RPCA), Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) and Go Decomposition (GoDec). Experimental results demonstrated that the RPCA model was the most suitable for the small noise CT images whereas the GoDec model was the best for the large noisy CT images. PMID:24023764

  8. A method to prioritize quantitative traits and individuals for sequencing in family-based studies.

    Directory of Open Access Journals (Sweden)

    Kaanan P Shah

    Full Text Available Owing to recent advances in DNA sequencing, it is now technically feasible to evaluate the contribution of rare variation to complex traits and diseases. However, it is still cost prohibitive to sequence the whole genome (or exome of all individuals in each study. For quantitative traits, one strategy to reduce cost is to sequence individuals in the tails of the trait distribution. However, the next challenge becomes how to prioritize traits and individuals for sequencing since individuals are often characterized for dozens of medically relevant traits. In this article, we describe a new method, the Rare Variant Kinship Test (RVKT, which leverages relationship information in family-based studies to identify quantitative traits that are likely influenced by rare variants. Conditional on nuclear families and extended pedigrees, we evaluate the power of the RVKT via simulation. Not unexpectedly, the power of our method depends strongly on effect size, and to a lesser extent, on the frequency of the rare variant and the number and type of relationships in the sample. As an illustration, we also apply our method to data from two genetic studies in the Old Order Amish, a founder population with extensive genealogical records. Remarkably, we implicate the presence of a rare variant that lowers fasting triglyceride levels in the Heredity and Phenotype Intervention (HAPI Heart study (p = 0.044, consistent with the presence of a previously identified null mutation in the APOC3 gene that lowers fasting triglyceride levels in HAPI Heart study participants.

  9. Sequence-specific inhibition of Dicer measured with a force-based microarray for RNA ligands.

    Science.gov (United States)

    Limmer, Katja; Aschenbrenner, Daniela; Gaub, Hermann E

    2013-04-01

    Malfunction of protein translation causes many severe diseases, and suitable correction strategies may become the basis of effective therapies. One major regulatory element of protein translation is the nuclease Dicer that cuts double-stranded RNA independently of the sequence into pieces of 19-22 base pairs starting the RNA interference pathway and activating miRNAs. Inhibiting Dicer is not desirable owing to its multifunctional influence on the cell's gene regulation. Blocking specific RNA sequences by small-molecule binding, however, is a promising approach to affect the cell's condition in a controlled manner. A label-free assay for the screening of site-specific interference of small molecules with Dicer activity is thus needed. We used the Molecular Force Assay (MFA), recently developed in our lab, to measure the activity of Dicer. As a model system, we used an RNA sequence that forms an aptamer-binding site for paromomycin, a 615-dalton aminoglycoside. We show that Dicer activity is modulated as a function of concentration and incubation time: the addition of paromomycin leads to a decrease of Dicer activity according to the amount of ligand. The measured dissociation constant of paromomycin to its aptamer was found to agree well with literature values. The parallel format of the MFA allows a large-scale search and analysis for ligands for any RNA sequence.

  10. A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification.

    Science.gov (United States)

    Yildirim, Özal

    2018-05-01

    Long-short term memory networks (LSTMs), which have recently emerged in sequential data analysis, are the most widely used type of recurrent neural networks (RNNs) architecture. Progress on the topic of deep learning includes successful adaptations of deep versions of these architectures. In this study, a new model for deep bidirectional LSTM network-based wavelet sequences called DBLSTM-WS was proposed for classifying electrocardiogram (ECG) signals. For this purpose, a new wavelet-based layer is implemented to generate ECG signal sequences. The ECG signals were decomposed into frequency sub-bands at different scales in this layer. These sub-bands are used as sequences for the input of LSTM networks. New network models that include unidirectional (ULSTM) and bidirectional (BLSTM) structures are designed for performance comparisons. Experimental studies have been performed for five different types of heartbeats obtained from the MIT-BIH arrhythmia database. These five types are Normal Sinus Rhythm (NSR), Ventricular Premature Contraction (VPC), Paced Beat (PB), Left Bundle Branch Block (LBBB), and Right Bundle Branch Block (RBBB). The results show that the DBLSTM-WS model gives a high recognition performance of 99.39%. It has been observed that the wavelet-based layer proposed in the study significantly improves the recognition performance of conventional networks. This proposed network structure is an important approach that can be applied to similar signal processing problems. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. Expectation violations in sensorimotor sequences: shifting from LTM-based attentional selection to visual search.

    Science.gov (United States)

    Foerster, Rebecca M; Schneider, Werner X

    2015-03-01

    Long-term memory (LTM) delivers important control signals for attentional selection. LTM expectations have an important role in guiding the task-driven sequence of covert attention and gaze shifts, especially in well-practiced multistep sensorimotor actions. What happens when LTM expectations are disconfirmed? Does a sensory-based visual-search mode of attentional selection replace the LTM-based mode? What happens when prior LTM expectations become valid again? We investigated these questions in a computerized version of the number-connection test. Participants clicked on spatially distributed numbered shapes in ascending order while gaze was recorded. Sixty trials were performed with a constant spatial arrangement. In 20 consecutive trials, either numbers, shapes, both, or no features switched position. In 20 reversion trials, participants worked on the original arrangement. Only the sequence-affecting number switches elicited slower clicking, visual search-like scanning, and lower eye-hand synchrony. The effects were neither limited to the exchanged numbers nor to the corresponding actions. Thus, expectation violations in a well-learned sensorimotor sequence cause a regression from LTM-based attentional selection to visual search beyond deviant-related actions and locations. Effects lasted for several trials and reappeared during reversion. © 2015 New York Academy of Sciences.

  12. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    KAUST Repository

    Wang, Bing; Zhang, Jun; Chen, Peng; Ji, Zhiwei; Deng, Shuping; Li, Chi

    2013-01-01

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separates ions on a millisecond time-scale. IMMS becomes a powerful tool to analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics.Results: In this paper, a model is presented based on least square support vectors regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of peptide. Four descriptors were extracted from peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid searching strategy, and a 10-fold cross-validation approach was employed for the model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieve demonstrate the effectiveness and efficiency of the prediction model.Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information in relative high prediction accuracy by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. 2013 Wang et al.; licensee BioMed Central Ltd.

  13. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    KAUST Repository

    Wang, Bing

    2013-05-09

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separates ions on a millisecond time-scale. IMMS becomes a powerful tool to analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics.Results: In this paper, a model is presented based on least square support vectors regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of peptide. Four descriptors were extracted from peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid searching strategy, and a 10-fold cross-validation approach was employed for the model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieve demonstrate the effectiveness and efficiency of the prediction model.Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information in relative high prediction accuracy by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. 2013 Wang et al.; licensee BioMed Central Ltd.

  14. Molecular Characterization of Five Potyviruses Infecting Korean Sweet Potatoes Based on Analyses of Complete Genome Sequences

    Directory of Open Access Journals (Sweden)

    Hae-Ryun Kwak

    2015-12-01

    Full Text Available Sweet potatoes (Ipomea batatas L. are grown extensively, in tropical and temperate regions, and are important food crops worldwide. In Korea, potyviruses, including Sweet potato feathery mottle virus (SPFMV, Sweet potato virus C (SPVC, Sweet potato virus G (SPVG, Sweet potato virus 2 (SPV2, and Sweet potato latent virus (SPLV, have been detected in sweet potato fields at a high (~95% incidence. In the present work, complete genome sequences of 18 isolates, representing the five potyviruses mentioned above, were compared with previously reported genome sequences. The complete genomes consisted of 10,081 to 10,830 nucleotides, excluding the poly-A tails. Their genomic organizations were typical of the Potyvirus genus, including one target open reading frame coding for a putative polyprotein. Based on phylogenetic analyses and sequence comparisons, the Korean SPFMV isolates belonged to the strains RC and O with >98% nucleotide sequence identity. Korean SPVC isolates had 99% identity to the Japanese isolate SPVC-Bungo and 70% identity to the SPFMV isolates. The Korean SPVG isolates showed 99% identity to the three previously reported SPVG isolates. Korean SPV2 isolates had 97% identity to the SPV2 GWB-2 isolate from the USA. Korean SPLV isolates had a relatively low (88% nucleotide sequence identity with the Taiwanese SPLV-TW isolates, and they were phylogenetically distantly related to SPFMV isolates. Recombination analysis revealed that possible recombination events occurred in the P1, HC-Pro and NIa-NIb regions of SPFMV and SPLV isolates and these regions were identified as hotspots for recombination in the sweet potato potyviruses.

  15. Estimation of physiological parameters using knowledge-based factor analysis of dynamic nuclear medicine image sequences

    International Nuclear Information System (INIS)

    Yap, J.T.; Chen, C.T.; Cooper, M.

    1995-01-01

    The authors have previously developed a knowledge-based method of factor analysis to analyze dynamic nuclear medicine image sequences. In this paper, the authors analyze dynamic PET cerebral glucose metabolism and neuroreceptor binding studies. These methods have shown the ability to reduce the dimensionality of the data, enhance the image quality of the sequence, and generate meaningful functional images and their corresponding physiological time functions. The new information produced by the factor analysis has now been used to improve the estimation of various physiological parameters. A principal component analysis (PCA) is first performed to identify statistically significant temporal variations and remove the uncorrelated variations (noise) due to Poisson counting statistics. The statistically significant principal components are then used to reconstruct a noise-reduced image sequence as well as provide an initial solution for the factor analysis. Prior knowledge such as the compartmental models or the requirement of positivity and simple structure can be used to constrain the analysis. These constraints are used to rotate the factors to the most physically and physiologically realistic solution. The final result is a small number of time functions (factors) representing the underlying physiological processes and their associated weighting images representing the spatial localization of these functions. Estimation of physiological parameters can then be performed using the noise-reduced image sequence generated from the statistically significant PCs and/or the final factor images and time functions. These results are compared to the parameter estimation using standard methods and the original raw image sequences. Graphical analysis was performed at the pixel level to generate comparable parametric images of the slope and intercept (influx constant and distribution volume)

  16. MetaSeq: privacy preserving meta-analysis of sequencing-based association studies.

    Science.gov (United States)

    Singh, Angad Pal; Zafer, Samreen; Pe'er, Itsik

    2013-01-01

    Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies groupwise-test for association of multiple potentially-causal alleles along each gene. They are subject to similar power constraints and therefore likely to resort to meta-analysis as well. The problem arises when considering privacy of the genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant thus requiring the exchange of identity of the sequenced variant. As such variants are often rare, potentially revealing the identity of their carriers and jeopardizing privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare, sequenced alleles, for metaanalysis of sequencing data without disclosing the allele identity and counts, thereby protecting sample identity. This apparent paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data does not convey the identity of a mutation and therefore does not expose carrier identity. The exchange relies on a 3rd party, trusted to follow the protocol although not trusted to learn about the raw data. We show applicability of this method to publicly available exome-sequencing data from multiple studies, simulating phenotypic information for powerful meta-analysis. The MetaSeq software is publicly available as open source.

  17. Next-generation Sequencing-based genomic profiling: Fostering innovation in cancer care?

    Directory of Open Access Journals (Sweden)

    Gustavo S. Fernandes

    Full Text Available OBJECTIVES: With the development of next-generation sequencing (NGS technologies, DNA sequencing has been increasingly utilized in clinical practice. Our goal was to investigate the impact of genomic evaluation on treatment decisions for heavily pretreated patients with metastatic cancer. METHODS: We analyzed metastatic cancer patients from a single institution whose cancers had progressed after all available standard-of-care therapies and whose tumors underwent next-generation sequencing analysis. We determined the percentage of patients who received any therapy directed by the test, and its efficacy. RESULTS: From July 2013 to December 2015, 185 consecutive patients were tested using a commercially available next-generation sequencing-based test, and 157 patients were eligible. Sixty-six patients (42.0% were female, and 91 (58.0% were male. The mean age at diagnosis was 52.2 years, and the mean number of pre-test lines of systemic treatment was 2.7. One hundred and seventy-seven patients (95.6% had at least one identified gene alteration. Twenty-four patients (15.2% underwent systemic treatment directed by the test result. Of these, one patient had a complete response, four (16.7% had partial responses, two (8.3% had stable disease, and 17 (70.8% had disease progression as the best result. The median progression-free survival time with matched therapy was 1.6 months, and the median overall survival was 10 months. CONCLUSION: We identified a high prevalence of gene alterations using an next-generation sequencing test. Although some benefit was associated with the matched therapy, most of the patients had disease progression as the best response, indicating the limited biological potential and unclear clinical relevance of this practice.

  18. A Chaos-Based Secure Direct-Sequence/Spread-Spectrum Communication System

    Directory of Open Access Journals (Sweden)

    Nguyen Xuan Quyen

    2013-01-01

    Full Text Available This paper proposes a chaos-based secure direct-sequence/spread-spectrum (DS/SS communication system which is based on a novel combination of the conventional DS/SS and chaos techniques. In the proposed system, bit duration is varied according to a chaotic behavior but is always equal to a multiple of the fixed chip duration in the communication process. Data bits with variable duration are spectrum-spread by multiplying directly with a pseudonoise (PN sequence and then modulated onto a sinusoidal carrier by means of binary phase-shift keying (BPSK. To recover exactly the data bits, the receiver needs an identical regeneration of not only the PN sequence but also the chaotic behavior, and hence data security is improved significantly. Structure and operation of the proposed system are analyzed in detail. Theoretical evaluation of bit-error rate (BER performance in presence of additive white Gaussian noise (AWGN is provided. Parameter choice for different cases of simulation is also considered. Simulation and theoretical results are shown to verify the reliability and feasibility of the proposed system. Security of the proposed system is also discussed.

  19. Protection algorithm for a wind turbine generator based on positive- and negative-sequence fault components

    DEFF Research Database (Denmark)

    Zheng, Tai-Ying; Cha, Seung-Tae; Crossley, Peter A.

    2011-01-01

    A protection relay for a wind turbine generator (WTG) based on positive- and negative-sequence fault components is proposed in the paper. The relay uses the magnitude of the positive-sequence component in the fault current to detect a fault on a parallel WTG, connected to the same power collection...... feeder, or a fault on an adjacent feeder; but for these faults, the relay remains stable and inoperative. A fault on the power collection feeder or a fault on the collection bus, both of which require an instantaneous tripping response, are distinguished from an inter-tie fault or a grid fault, which...... in the fault current is used to decide on either instantaneous or delayed operation. The operating performance of the relay is then verified using various fault scenarios modelled using EMTP-RV. The scenarios involve changes in the position and type of fault, and the faulted phases. Results confirm...

  20. Precision toxicology based on single cell sequencing: an evolving trend in toxicological evaluations and mechanism exploration.

    Science.gov (United States)

    Zhang, Boyang; Huang, Kunlun; Zhu, Liye; Luo, Yunbo; Xu, Wentao

    2017-07-01

    In this review, we introduce a new concept, precision toxicology: the mode of action of chemical- or drug-induced toxicity can be sensitively and specifically investigated by isolating a small group of cells or even a single cell with typical phenotype of interest followed by a single cell sequencing-based analysis. Precision toxicology can contribute to the better detection of subtle intracellular changes in response to exogenous substrates, and thus help researchers find solutions to control or relieve the toxicological effects that are serious threats to human health. We give examples for single cell isolation and recommend laser capture microdissection for in vivo studies and flow cytometric sorting for in vitro studies. In addition, we introduce the procedures for single cell sequencing and describe the expected application of these techniques to toxicological evaluations and mechanism exploration, which we believe will become a trend in toxicology.

  1. Discriminatory usefulness of pulsed-field gel electrophoresis and sequence-based typing in Legionella outbreaks.

    Science.gov (United States)

    Quero, Sara; García-Núñez, Marian; Párraga-Niño, Noemí; Barrabeig, Irene; Pedro-Botet, Maria L; de Simon, Mercè; Sopena, Nieves; Sabrià, Miquel

    2016-06-01

    To compare the discriminatory power of pulsed-field gel electrophoresis (PFGE) and sequence-based typing (SBT) in Legionella outbreaks for determining the infection source. Twenty-five investigations of Legionnaires' disease were analyzed by PFGE, SBT and Dresden monoclonal antibody. The results suggested that monoclonal antibody could reduce the number of Legionella isolates to be characterized by molecular methods. The epidemiological concordance PFGE-SBT was 100%, while the molecular concordance was 64%. Adjusted Wallace index (AW) showed that PFGE has better discriminatory power than SBT (AWSBT→PFGE = 0.767; AWPFGE→SBT = 1). The discrepancies appeared mostly in sequence type (ST) 1, a worldwide distributed ST for which PFGE discriminated different profiles. SBT discriminatory power was not sufficient verifying the infection source, especially in worldwide distributed STs, which were classified into different PFGE patterns.

  2. Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data.

    Science.gov (United States)

    Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H

    2013-05-01

    Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach.

  3. Intergeneric Classification of Genus Bulbophyllum from Peninsular Malaysia Based on Combined Morphological and RBCL Sequence Data

    International Nuclear Information System (INIS)

    Hosseini, S.; Dadkhah, K.

    2016-01-01

    Bulbophyllum Thou. is largest genus in Orchidaceae family and a well-known plant of tropical area. The present study provides a comparative morphological study of 38 Bulbophyllum spp. as well as molecular sequence analysis of large subunit of rubisco (rbcL), to infer the intergeneric classification for studied taxa of genus Bulbophyllum. Thirty morphological characters were coded in a data matrix, and used in phenetic analysis. Morphological result was strongly consistent with earlier classification, with exception of B. auratum, B. gracillimum, B. mutabile and B. limbatum status. Furthermore Molecular data analysis of rbcL was congruent with morphological data in some aspects. Species interrelationships specified using combination of rbcL sequence data with morphological data. The results revealed close affiliation in 11 sections of Bulbophyllum from Peninsular Malaysia. Consequently, based on this study generic status of sections Cirrhopetalum and Epicrianthes cannot longer be supported, as they are deeply embedded within the genus Bulbophyllum. (author)

  4. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    Segmental duplications are >1kb segments of duplicated DNA present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been...... extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...

  5. ITS-2 sequences-based identification of Trichogramma species in South America

    Directory of Open Access Journals (Sweden)

    R. P. Almeida

    Full Text Available Abstract ITS2 (Internal transcribed spacer 2 sequences have been used in systematic studies and proved to be useful in providing a reliable identification of Trichogramma species. DNAr sequences ranged in size from 379 to 632 bp. In eleven T. pretiosum lines Wolbachia-induced parthenogenesis was found for the first time. These thelytokous lines were collected in Peru (9, Colombia (1 and USA (1. A dichotomous key for species identification was built based on the size of the ITS2 PCR product and restriction analysis using three endonucleases (EcoRI, MseI and MaeI. This molecular technique was successfully used to distinguish among seventeen native/introduced Trichogramma species collected in South America.

  6. High-throughput Sequencing Based Immune Repertoire Study during Infectious Disease

    Directory of Open Access Journals (Sweden)

    Dongni Hou

    2016-08-01

    Full Text Available The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review the recent advances in immune repertoire study of infectious diseases that achieved by traditional techniques and high-throughput sequencing techniques. High-throughput sequencing techniques enable the determination of complementary regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge, and also provides a basis for further development of novel diagnostic markers, immunotherapies and vaccines.

  7. Multiple ECG Fiducial Points-Based Random Binary Sequence Generation for Securing Wireless Body Area Networks.

    Science.gov (United States)

    Zheng, Guanglou; Fang, Gengfa; Shankaran, Rajan; Orgun, Mehmet A; Zhou, Jie; Qiao, Li; Saleem, Kashif

    2017-05-01

    Generating random binary sequences (BSes) is a fundamental requirement in cryptography. A BS is a sequence of N bits, and each bit has a value of 0 or 1. For securing sensors within wireless body area networks (WBANs), electrocardiogram (ECG)-based BS generation methods have been widely investigated in which interpulse intervals (IPIs) from each heartbeat cycle are processed to produce BSes. Using these IPI-based methods to generate a 128-bit BS in real time normally takes around half a minute. In order to improve the time efficiency of such methods, this paper presents an ECG multiple fiducial-points based binary sequence generation (MFBSG) algorithm. The technique of discrete wavelet transforms is employed to detect arrival time of these fiducial points, such as P, Q, R, S, and T peaks. Time intervals between them, including RR, RQ, RS, RP, and RT intervals, are then calculated based on this arrival time, and are used as ECG features to generate random BSes with low latency. According to our analysis on real ECG data, these ECG feature values exhibit the property of randomness and, thus, can be utilized to generate random BSes. Compared with the schemes that solely rely on IPIs to generate BSes, this MFBSG algorithm uses five feature values from one heart beat cycle, and can be up to five times faster than the solely IPI-based methods. So, it achieves a design goal of low latency. According to our analysis, the complexity of the algorithm is comparable to that of fast Fourier transforms. These randomly generated ECG BSes can be used as security keys for encryption or authentication in a WBAN system.

  8. DNA sequence of 15 base pairs is sufficient to mediate both glucocorticoid and progesterone induction of gene expression

    International Nuclear Information System (INIS)

    Straehle, U.; Klock, G.; Schuetz, G.

    1987-01-01

    To define the recognition sequence of the glucocorticoid receptor and its relationship with that of the progesterone receptor, oligonucleotides derived from the glucocorticoid response element of the tyrosine aminotransferase gene were tested upstream of a heterologous promoter for their capacity to mediate effects of these two steroids. The authors show that a 15-base-pair sequence with partial symmetry is sufficient to confer glucocorticoid inducibility on the promoter of the herpes simplex virus thymidine kinase gene. The same 15-base-pair sequence mediates induction by progesterone. Point mutations in the recognition sequence affect inducibility by glucocorticoids and progesterone similarly. Together with the strong conservation of the sequence of the DNA-binding domain of the two receptors, these data suggest that both proteins recognize a sequence that is similar, if not the same

  9. A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation.

    Directory of Open Access Journals (Sweden)

    Rosemary M McCloskey

    2017-11-01

    Full Text Available Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A diverse number of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale with large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis-where individuals are sampled sooner post-infection-rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP, which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85% and specificity (91% than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method categorized about half (46% as many individuals to clusters compared to the other methods. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not by other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer. This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where

  10. Sequence-Based Introgression Mapping Identifies Candidate White Mold Tolerance Genes in Common Bean

    Directory of Open Access Journals (Sweden)

    Sujan Mamidi

    2016-07-01

    Full Text Available White mold, caused by the necrotrophic fungus (Lib. de Bary, is a major disease of common bean ( L.. WM7.1 and WM8.3 are two quantitative trait loci (QTL with major effects on tolerance to the pathogen. Advanced backcross populations segregating individually for either of the two QTL, and a recombinant inbred (RI population segregating for both QTL were used to fine map and confirm the genetic location of the QTL. The QTL intervals were physically mapped using the reference common bean genome sequence, and the physical intervals for each QTL were further confirmed by sequence-based introgression mapping. Using whole-genome sequence data from susceptible and tolerant DNA pools, introgressed regions were identified as those with significantly higher numbers of single-nucleotide polymorphisms (SNPs relative to the whole genome. By combining the QTL and SNP data, WM7.1 was located to a 660-kb region that contained 41 gene models on the proximal end of chromosome Pv07, while the WM8.3 introgression was narrowed to a 1.36-Mb region containing 70 gene models. The most polymorphic candidate gene in the WM7.1 region encodes a BEACH-domain protein associated with apoptosis. Within the WM8.3 interval, a receptor-like protein with the potential to recognize pathogen effectors was the most polymorphic gene. The use of gene and sequence-based mapping identified two candidate genes whose putative functions are consistent with the current model of pathogenicity.

  11. Molecular Phylogeny of Triticum and Aegilops Genera Based on ITS and MATK Sequence Data

    International Nuclear Information System (INIS)

    Dizkirici, A.; Kansu, C.; Onde, S.

    2016-01-01

    Understanding the phylogenetic relationship between Triticum and Aegilops species, which form a vast gene pool of wheat, is very important for breeding new cultivated wheat varieties. In the present study, phylogenetic relationships between Triticum (12 samples from 4 species) and Aegilops (24 samples from 8 species) were investigated using sequences of the nuclear ITS rDNA gene and partial sequences of the matK gene of chloroplast genome. The phylogenetic relationships among species were reconstructed using Maximum Likelihood method. The constructed tree based on the sequences of the nuclear component (ITS) displayed a close relationship between polyploid wheats and Aegilops speltoides species which provided new evidence for the source of the enigmatic B genome donor as Ae. speltoides. Concurrent clustering of Ae. cylindrica and Ae. tauschii and their close positioning to polyploid wheats pointed the source of the D genome as one of these species. As reported before, diploid Triticum species (i.e. T. urartu) were identified as the A genome donors and the positioning of these diploid wheats on the constructed tree are meaningful. The constructed tree based on the chloroplastic matK sequences displayed same relationship between polyploid wheats and Ae. speltoides species providing evidence for the later species being the chloroplast donors for polyploid wheats. Therefore, our results supported the idea of coinheritance of nuclear and chloroplast genomes where Ae. speltoides was the maternal donor. For both trees the remaining Aegilops species produced a distinct cluster whereas with the exception of T. urartu, diploid Triticum species displayed a monophyletic structure. (author)

  12. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    Full Text Available One of the key mechanisms of transcriptional control are the specific connections between transcription factors (TF and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1 discriminates TFs from other proteins, (2 determines the structural superclass of TFs, (3 identifies the DNA-binding domains of TFs and (4 predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  13. A rule of seven in Watson-Crick base-pairing of mismatched sequences.

    Science.gov (United States)

    Cisse, Ibrahim I; Kim, Hajin; Ha, Taekjip

    2012-05-13

    Sequence recognition through base-pairing is essential for DNA repair and gene regulation, but the basic rules governing this process remain elusive. In particular, the kinetics of annealing between two imperfectly matched strands is not well characterized, despite its potential importance in nucleic acid-based biotechnologies and gene silencing. Here we use single-molecule fluorescence to visualize the multiple annealing and melting reactions of two untethered strands inside a porous vesicle, allowing us to precisely quantify the annealing and melting rates. The data as a function of mismatch position suggest that seven contiguous base pairs are needed for rapid annealing of DNA and RNA. This phenomenological rule of seven may underlie the requirement for seven nucleotides of complementarity to seed gene silencing by small noncoding RNA and may help guide performance improvement in DNA- and RNA-based bio- and nanotechnologies, in which off-target effects can be detrimental.

  14. A sampling and metagenomic sequencing-based methodology for monitoring antimicrobial resistance in swine herds

    DEFF Research Database (Denmark)

    Munk, Patrick; Dalhoff Andersen, Vibe; de Knegt, Leonardo

    2016-01-01

    Objectives Reliable methods for monitoring antimicrobial resistance (AMR) in livestock and other reservoirs are essential to understand the trends, transmission and importance of agricultural resistance. Quantification of AMR is mostly done using culture-based techniques, but metagenomic read...... mapping shows promise for quantitative resistance monitoring. Methods We evaluated the ability of: (i) MIC determination for Escherichia coli; (ii) cfu counting of E. coli; (iii) cfu counting of aerobic bacteria; and (iv) metagenomic shotgun sequencing to predict expected tetracycline resistance based...... cultivation-based techniques in terms of predicting expected tetracycline resistance based on antimicrobial consumption. Our metagenomic approach had sufficient resolution to detect antimicrobial-induced changes to individual resistance gene abundances. Pen floor manure samples were found to represent rectal...

  15. Efficient DNA fingerprinting based on the targeted sequencing of active retrotransposon insertion sites using a bench-top high-throughput sequencing platform.

    Science.gov (United States)

    Monden, Yuki; Yamamoto, Ayaka; Shindo, Akiko; Tahara, Makoto

    2014-10-01

    In many crop species, DNA fingerprinting is required for the precise identification of cultivars to protect the rights of breeders. Many families of retrotransposons have multiple copies throughout the eukaryotic genome and their integrated copies are inherited genetically. Thus, their insertion polymorphisms among cultivars are useful for DNA fingerprinting. In this study, we conducted a DNA fingerprinting based on the insertion polymorphisms of active retrotransposon families (Rtsp-1 and LIb) in sweet potato. Using 38 cultivars, we identified 2,024 insertion sites in the two families with an Illumina MiSeq sequencing platform. Of these insertion sites, 91.4% appeared to be polymorphic among the cultivars and 376 cultivar-specific insertion sites were identified, which were converted directly into cultivar-specific sequence-characterized amplified region (SCAR) markers. A phylogenetic tree was constructed using these insertion sites, which corresponded well with known pedigree information, thereby indicating their suitability for genetic diversity studies. Thus, the genome-wide comparative analysis of active retrotransposon insertion sites using the bench-top MiSeq sequencing platform is highly effective for DNA fingerprinting without any requirement for whole genome sequence information. This approach may facilitate the development of practical polymerase chain reaction-based cultivar diagnostic system and could also be applied to the determination of genetic relationships. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  16. Sonication-based isolation and enrichment of Chlorella protothecoides chloroplasts for illumina genome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Angelova, Angelina [University of Arizona; Park, Sang-Hycuk [University of Arizona; Kyndt, John [Bellevue University; Fitzsimmons, Kevin [University of Arizona; Brown, Judith K [University of Arizona

    2013-09-01

    With the increasing world demand for biofuel, a number of oleaginous algal species are being considered as renewable sources of oil. Chlorella protothecoides Krüger synthesizes triacylglycerols (TAGs) as storage compounds that can be converted into renewable fuel utilizing an anabolic pathway that is poorly understood. The paucity of algal chloroplast genome sequences has been an important constraint to chloroplast transformation and for studying gene expression in TAGs pathways. In this study, the intact chloroplasts were released from algal cells using sonication followed by sucrose gradient centrifugation, resulting in a 2.36-fold enrichment of chloroplasts from C. protothecoides, based on qPCR analysis. The C. protothecoides chloroplast genome (cpDNA) was determined using the Illumina HiSeq 2000 sequencing platform and found to be 84,576 Kb in size (8.57 Kb) in size, with a GC content of 30.8 %. This is the first report of an optimized protocol that uses a sonication step, followed by sucrose gradient centrifugation, to release and enrich intact chloroplasts from a microalga (C. prototheocoides) of sufficient quality to permit chloroplast genome sequencing with high coverage, while minimizing nuclear genome contamination. The approach is expected to guide chloroplast isolation from other oleaginous algal species for a variety of uses that benefit from enrichment of chloroplasts, ranging from biochemical analysis to genomics studies.

  17. The Teaching of Biochemistry: An Innovative Course Sequence Based on the Logic of Chemistry

    Science.gov (United States)

    Jakubowski, Henry V.; Owen, Whyte G.

    1998-06-01

    An innovative course sequence for the teaching of biochemistry is offered, which more truly reflects the common philosophy found in biochemistry texts: that the foundation of biological phenomena can best be understood through the logic of chemistry. Topic order is chosen to develop an emerging understanding that is based on chemical principles. Preeminent biological questions serve as a framework for the course. Lipid and lipid-aggregate structures are introduced first, since it is more logical to discuss the intermolecular association of simple amphiphiles to form micelle and bilayer formations than to discuss the complexities of protein structure/folding. Protein, nucleic acid, and carbohydrate structures are studied next. Binding, a noncovalent process and the simplest expression of macromolecular function, follows. The physical (noncovalent) transport of solute molecules across a biological membrane is studied next, followed by the chemical transformation of substrates by enzymes. These are logical extensions of the expression of molecular function, first involving a simpler (physical transport) and second, a more complex (covalent transformation) process. The final sequence involves energy and signal transduction. This unique course sequence emerges naturally when chemical logic is used as an organizing paradigm for structuring a biochemistry course. Traditional order, which seems to reflect historic trends in research, or even an order derived from the central dogma of biology can not provide this logical framework.

  18. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

    Science.gov (United States)

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

    2018-01-04

    Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. A new feedback image encryption scheme based on perturbation with dynamical compound chaotic sequence cipher generator

    Science.gov (United States)

    Tong, Xiaojun; Cui, Minggen; Wang, Zhu

    2009-07-01

    The design of the new compound two-dimensional chaotic function is presented by exploiting two one-dimensional chaotic functions which switch randomly, and the design is used as a chaotic sequence generator which is proved by Devaney's definition proof of chaos. The properties of compound chaotic functions are also proved rigorously. In order to improve the robustness against difference cryptanalysis and produce avalanche effect, a new feedback image encryption scheme is proposed using the new compound chaos by selecting one of the two one-dimensional chaotic functions randomly and a new image pixels method of permutation and substitution is designed in detail by array row and column random controlling based on the compound chaos. The results from entropy analysis, difference analysis, statistical analysis, sequence randomness analysis, cipher sensitivity analysis depending on key and plaintext have proven that the compound chaotic sequence cipher can resist cryptanalytic, statistical and brute-force attacks, and especially it accelerates encryption speed, and achieves higher level of security. By the dynamical compound chaos and perturbation technology, the paper solves the problem of computer low precision of one-dimensional chaotic function.

  20. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    Science.gov (United States)

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine based web server called SVM-PB-Pred, to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from DSSP method or predicted secondary structures from NPS@ and GOR4 methods. There were three combined input features PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4) used to test and train the SVM models. Similarly, four datasets RS90, DB433, LI1264 and SP1577 were used to develop the SVM models. These four SVM models developed were tested using three different benchmarking tests namely; (i) self consistency, (ii) seven fold cross validation test and (iii) independent case test. The maximum possible prediction accuracy of ~70% was observed in self consistency test for the SVM models of both LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features was used to test. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in independent case test, for the SVM models of above two same datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from NPS@ server were used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  1. Feasibility of a RARE-based sequence for quantitative diffusion-weighted MRI of the spine

    International Nuclear Information System (INIS)

    Raya, J.G.; Dietrich, O.; Sommer, J.; Reiser, M.F.; Baur-Melnyk, A.; Birkenmaier, C.

    2007-01-01

    The feasibility of a diffusion-weighted single-shot fast-spin-echo sequence for the diagnostic work-up of bone marrow diseases was assessed. Twenty healthy controls and 16 patients with various bone marrow pathologies of the spine (bone marrow edema, tumor and inflammation) were examined with a diffusion-weighted single-shot sequence based on a modified rapid acquisition with relaxation enhancement (mRARE) technique; four diffusion weightings (b-values: 50, 250, 500 and 750 s/mm 2 ) in three orthogonal orientations were applied. Apparent diffusion coefficients (ADCs) were determined in the bone marrow and in the intervertebral discs of healthy volunteers and in diseased bone marrow. Ten of the 20 volunteers were repeatedly scanned within 30 min to examine short-time reproducibility. Spatial reproducibility was assessed by measuring ADCs in two different slices including the same lesion in 12 patients. The ADCs of the lesions exhibited significantly higher values, (1.27 ± 0.32) x 10 -3 mm 2 /s, compared with healthy bone marrow, (0.21 ± 0.10) x 10 -3 mm 2 /s. Short-time and spatial reproducibility had a mean coefficient of variation of 2.1% and 6.4%, respectively. The diffusion-weighted mRARE sequence provides a reliable tool for determining quantitative ADCs in vertebral bone marrow with adequate image quality. (orig.)

  2. Genotyping of B. licheniformis based on a novel multi-locus sequence typing (MLST scheme

    Directory of Open Access Journals (Sweden)

    Madslien Elisabeth H

    2012-10-01

    Full Text Available Abstract Background Bacillus licheniformis has for many years been used in the industrial production of enzymes, antibiotics and detergents. However, as a producer of dormant heat-resistant endospores B. licheniformis might contaminate semi-preserved foods. The aim of this study was to establish a robust and novel genotyping scheme for B. licheniformis in order to reveal the evolutionary history of 53 strains of this species. Furthermore, the genotyping scheme was also investigated for its use to detect food-contaminating strains. Results A multi-locus sequence typing (MLST scheme, based on the sequence of six house-keeping genes (adk, ccpA, recF, rpoB, spo0A and sucC of 53 B. licheniformis strains from different sources was established. The result of the MLST analysis supported previous findings of two different subgroups (lineages within this species, named “A” and “B” Statistical analysis of the MLST data indicated a higher rate of recombination within group “A”. Food isolates were widely dispersed in the MLST tree and could not be distinguished from the other strains. However, the food contaminating strain B. licheniformis NVH1032, represented by a unique sequence type (ST8, was distantly related to all other strains. Conclusions In this study, a novel and robust genotyping scheme for B. licheniformis was established, separating the species into two subgroups. This scheme could be used for further studies of evolution and population genetics in B. licheniformis.

  3. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    Science.gov (United States)

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. [Sequence-based typing of enviromental Legionella pneumophila isolates in Guangzhou].

    Science.gov (United States)

    Zhang, Ying; Qu, Pinghua; Zhang, Jian; Chen, Shouyi

    2011-03-01

    To characterize the genes of Legionella pneumophila isolated from different water source in Guangzhou from 2006 to 2009. To genotype the strains by using sequence-based typing (SBT) scheme. In total 44 L. pneumophila strains were identified by SBT with 7 diversifying genes of flaA, asd, mip, pilE, mompS, proA and neuA. Analysis of the amplicons sequence was taken in the European Working Group for Legionella Infections (EWGLI) international SBT database to obtain the allelic profiles and sequence types (STs). Serogroups were typed by latex agglutination test. Data from SBT revealed a high diversity among the strains and ST01 accounts for 30% (13/ 44). Fifteen new STs were discovered from 20 STs and 2 of them were newly assigned (ST887 and ST888) by EWGLI. SBT Phylogenetic tree was generated by SplitsTree and BURST programs. High diversity and specificity were observed of the L. pneumophila strains in Guangzhou. SBT is useful for L. pneumophila genomic study and epidemiological surveillance.

  5. Moving target detection based on temporal-spatial information fusion for infrared image sequences

    Science.gov (United States)

    Toing, Wu-qin; Xiong, Jin-yu; Zeng, An-jun; Wu, Xiao-ping; Xu, Hao-peng

    2009-07-01

    Moving target detection and localization is one of the most fundamental tasks in visual surveillance. In this paper, through analyzing the advantages and disadvantages of the traditional approaches about moving target detection, a novel approach based on temporal-spatial information fusion is proposed for moving target detection. The proposed method combines the spatial feature in single frame and the temporal properties within multiple frames of an image sequence of moving target. First, the method uses the spatial image segmentation for target separation from background and uses the local temporal variance for extracting targets and wiping off the trail artifact. Second, the logical "and" operator is used to fuse the temporal and spatial information. In the end, to the fusion image sequence, the morphological filtering and blob analysis are used to acquire exact moving target. The algorithm not only requires minimal computation and memory but also quickly adapts to the change of background and environment. Comparing with other methods, such as the KDE, the Mixture of K Gaussians, etc., the simulation results show the proposed method has better validity and higher adaptive for moving target detection, especially in infrared image sequences with complex illumination change, noise change, and so on.

  6. "Polymeromics": Mass spectrometry based strategies in polymer science toward complete sequencing approaches: a review.

    Science.gov (United States)

    Altuntaş, Esra; Schubert, Ulrich S

    2014-01-15

    Mass spectrometry (MS) is the most versatile and comprehensive method in "OMICS" sciences (i.e. in proteomics, genomics, metabolomics and lipidomics). The applications of MS and tandem MS (MS/MS or MS(n)) provide sequence information of the full complement of biological samples in order to understand the importance of the sequences on their precise and specific functions. Nowadays, the control of polymer sequences and their accurate characterization is one of the significant challenges of current polymer science. Therefore, a similar approach can be very beneficial for characterizing and understanding the complex structures of synthetic macromolecules. MS-based strategies allow a relatively precise examination of polymeric structures (e.g. their molar mass distributions, monomer units, side chain substituents, end-group functionalities, and copolymer compositions). Moreover, tandem MS offer accurate structural information from intricate macromolecular structures; however, it produces vast amount of data to interpret. In "OMICS" sciences, the software application to interpret the obtained data has developed satisfyingly (e.g. in proteomics), because it is not possible to handle the amount of data acquired via (tandem) MS studies on the biological samples manually. It can be expected that special software tools will improve the interpretation of (tandem) MS output from the investigations of synthetic polymers as well. Eventually, the MS/MS field will also open up for polymer scientists who are not MS-specialists. In this review, we dissect the overall framework of the MS and MS/MS analysis of synthetic polymers into its key components. We discuss the fundamentals of polymer analyses as well as recent advances in the areas of tandem mass spectrometry, software developments, and the overall future perspectives on the way to polymer sequencing, one of the last Holy Grail in polymer science. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Origin and spread of photosynthesis based upon conserved sequence features in key bacteriochlorophyll biosynthesis proteins.

    Science.gov (United States)

    Gupta, Radhey S

    2012-11-01

    The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights in these regards. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. More ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contain two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that the Bchl-based anoxygenic photosynthesis originated prior to the chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to all other photosynthetic lineages. Several other identified CSIs in the BchN homologs are commonly shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs in conjunction with the results of phylogenetic analyses and pair-wise sequence similarity on the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited close relationship, provide strong evidence that these two groups have incurred lateral gene transfers. Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been

  8. Sequence homolog-based molecular engineering for shifting the enzymatic pH optimum

    Directory of Open Access Journals (Sweden)

    Fuqiang Ma

    2016-09-01

    Full Text Available Cell-free synthetic biology system organizes multiple enzymes (parts from different sources to implement unnatural catalytic functions. Highly adaption between the catalytic parts is crucial for building up efficient artificial biosynthetic systems. Protein engineering is a powerful technology to tailor various enzymatic properties including catalytic efficiency, substrate specificity, temperature adaptation and even achieve new catalytic functions. However, altering enzymatic pH optimum still remains a challenging task. In this study, we proposed a novel sequence homolog-based protein engineering strategy for shifting the enzymatic pH optimum based on statistical analyses of sequence-function relationship data of enzyme family. By two statistical procedures, artificial neural networks (ANNs and least absolute shrinkage and selection operator (Lasso, five amino acids in GH11 xylanase family were identified to be related to the evolution of enzymatic pH optimum. Site-directed mutagenesis of a thermophilic xylanase from Caldicellulosiruptor bescii revealed that four out of five mutations could alter the enzymatic pH optima toward acidic condition without compromising the catalytic activity and thermostability. Combination of the positive mutants resulted in the best mutant M31 that decreased its pH optimum for 1.5 units and showed increased catalytic activity at pH < 5.0 compared to the wild-type enzyme. Structure analysis revealed that all the mutations are distant from the active center, which may be difficult to be identified by conventional rational design strategy. Interestingly, the four mutation sites are clustered at a certain region of the enzyme, suggesting a potential “hot zone” for regulating the pH optima of xylanases. This study provides an efficient method of modulating enzymatic pH optima based on statistical sequence analyses, which can facilitate the design and optimization of suitable catalytic parts for the construction

  9. Detection and quantification of Plasmodium falciparum in blood samples using quantitative nucleic acid sequence-based amplification

    NARCIS (Netherlands)

    Schoone, G. J.; Oskam, L.; Kroon, N. C.; Schallig, H. D.; Omar, S. A.

    2000-01-01

    A quantitative nucleic acid sequence-based amplification (QT-NASBA) assay for the detection of Plasmodium parasites has been developed. Primers and probes were selected on the basis of the sequence of the small-subunit rRNA gene. Quantification was achieved by coamplification of the RNA in the

  10. Armillaria phylogeny based on tef-1α sequences suggests ongoing divergent speciation within the boreal floristic kingdom

    Science.gov (United States)

    Ned B. Klopfenstein; John W. Hanna; Amy L. Ross-Davis; Jane E. Stewart; Yuko Ota; Rosario Medel-Ortiz; Miguel Armando Lopez-Ramirez; Ruben Damian Elias-Roman; Dionicio Alvarado-Rosales; Mee-Sook Kim

    2013-01-01

    Armillaria plays diverse ecological roles in forests worldwide, which has inspired interest in understanding phylogenetic relationships within and among species of this genus. Previous rDNA sequence-based phylogenetic analyses of Armillaria have shown general relationships among widely divergent taxa, but rDNA sequences were not reliable for separating closely related...

  11. Next Generation Sequencing-Based Analysis of Repetitive DNA in the Model Dioceous Plant Silene latifolia

    Czech Academy of Sciences Publication Activity Database

    Macas, Jiří; Kejnovský, Eduard; Neumann, Pavel; Novák, Petr; Koblížková, Andrea; Vyskot, Boris

    2011-01-01

    Roč. 6, č. 11 (2011), e27335 E-ISSN 1932-6203 R&D Projects: GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004; GA MŠk(CZ) LH11058; GA ČR(CZ) GAP501/10/0102; GA ČR(CZ) GAP305/10/0930 Institutional research plan: CEZ:AV0Z50510513; CEZ:AV0Z50040702 Keywords : Plant genome * Sequencing-Based Analyses * Repetitive DNA * Silene latifolia Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 4.092, year: 2011

  12. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  13. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng; Hu, ShanShan; Zhang, Jun; Gao, Xin; Li, Jinyan; Xia, Junfeng; Wang, Bing

    2015-01-01

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  14. MARTA: a suite of Java-based tools for assigning taxonomic status to DNA sequences.

    Science.gov (United States)

    Horton, Matthew; Bodenhausen, Natacha; Bergelson, Joy

    2010-02-15

    We have created a suite of Java-based software to better provide taxonomic assignments to DNA sequences. We anticipate that the program will be useful for protistologists, virologists, mycologists and other microbial ecologists. The program relies on NCBI utilities including the BLAST software and Taxonomy database and is easily manipulated at the command-line to specify a BLAST candidate's query-coverage or percent identity requirements; other options include the ability to set minimal consensus requirements (%) for each of the eight major taxonomic ranks (Domain, Kingdom, Phylum, ...) and whether to consider lower scoring candidates when the top-hit lacks taxonomic classification.

  15. Fluorescence turn-on detection of target sequence DNA based on silicon nanodot-mediated quenching.

    Science.gov (United States)

    Zhang, Yanan; Ning, Xinping; Mao, Guobin; Ji, Xinghu; He, Zhike

    2018-05-01

    We have developed a new enzyme-free method for target sequence DNA detection based on the dynamic quenching of fluorescent silicon nanodots (SiNDs) toward Cy5-tagged DNA probe. Fascinatingly, the water-soluble SiNDs can quench the fluorescence of cyanine (Cy5) in Cy5-tagged DNA probe in homogeneous solution, and the fluorescence of Cy5-tagged DNA probe can be restored in the presence of target sequence DNA (the synthetic target miRNA-27a). Based on this phenomenon, a SiND-featured fluorescent sensor has been constructed for "turn-on" detection of the synthetic target miRNA-27a for the first time. This newly developed approach possesses the merits of low cost, simple design, and convenient operation since no enzymatic reaction, toxic reagents, or separation procedures are involved. The established method achieves a detection limit of 0.16 nM, and the relative standard deviation of this method is 9% (1 nM, n = 5). The linear range is 0.5-20 nM, and the recoveries in spiked human fluids are in the range of 90-122%. This protocol provides a new tactic in the development of the nonenzymic miRNA biosensors and opens a promising avenue for early diagnosis of miRNA-associated disease. Graphical abstract The SiND-based fluorescent sensor for detection of S-miR-27a.

  16. Molecular phylogeny of Toxoplasmatinae: comparison between inferences based on mitochondrial and apicoplast genetic sequences

    Directory of Open Access Journals (Sweden)

    Michelle Klein Sercundes

    2016-03-01

    Full Text Available Abstract Phylogenies within Toxoplasmatinae have been widely investigated with different molecular markers. Here, we studied molecular phylogenies of the Toxoplasmatinae subfamily based on apicoplast and mitochondrial genes. Partial sequences of apicoplast genes coding for caseinolytic protease (clpC and beta subunit of RNA polymerase (rpoB, and mitochondrial gene coding for cytochrome B (cytB were analyzed. Laboratory-adapted strains of the closely related parasites Sarcocystis falcatula and Sarcocystis neurona were investigated, along with Neospora caninum, Neospora hughesi, Toxoplasma gondii (strains RH, CTG and PTG, Besnoitia akodoni, Hammondia hammondiand two genetically divergent lineages of Hammondia heydorni. The molecular analysis based on organellar genes did not clearly differentiate between N. caninum and N. hughesi, but the two lineages of H. heydorni were confirmed. Slight differences between the strains of S. falcatula and S. neurona were encountered in all markers. In conclusion, congruent phylogenies were inferred from the three different genes and they might be used for screening undescribed sarcocystid parasites in order to ascertain their phylogenetic relationships with organisms of the family Sarcocystidae. The evolutionary studies based on organelar genes confirm that the genusHammondia is paraphyletic. The primers used for amplification of clpC and rpoB were able to amplify genetic sequences of organisms of the genus Sarcocystisand organisms of the subfamily Toxoplasmatinae as well.

  17. Spatiotemporal Super-Resolution Reconstruction Based on Robust Optical Flow and Zernike Moment for Video Sequences

    Directory of Open Access Journals (Sweden)

    Meiyu Liang

    2013-01-01

    Full Text Available In order to improve the spatiotemporal resolution of the video sequences, a novel spatiotemporal super-resolution reconstruction model (STSR based on robust optical flow and Zernike moment is proposed in this paper, which integrates the spatial resolution reconstruction and temporal resolution reconstruction into a unified framework. The model does not rely on accurate estimation of subpixel motion and is robust to noise and rotation. Moreover, it can effectively overcome the problems of hole and block artifacts. First we propose an efficient robust optical flow motion estimation model based on motion details preserving, then we introduce the biweighted fusion strategy to implement the spatiotemporal motion compensation. Next, combining the self-adaptive region correlation judgment strategy, we construct a fast fuzzy registration scheme based on Zernike moment for better STSR with higher efficiency, and then the final video sequences with high spatiotemporal resolution can be obtained by fusion of the complementary and redundant information with nonlocal self-similarity between the adjacent video frames. Experimental results demonstrate that the proposed method outperforms the existing methods in terms of both subjective visual and objective quantitative evaluations.

  18. An exponential combination procedure for set-based association tests in sequencing studies.

    Science.gov (United States)

    Chen, Lin S; Hsu, Li; Gamazon, Eric R; Cox, Nancy J; Nicolae, Dan L

    2012-12-07

    State-of-the-art next-generation-sequencing technologies can facilitate in-depth explorations of the human genome by investigating both common and rare variants. For the identification of genetic factors that are associated with disease risk or other complex phenotypes, methods have been proposed for jointly analyzing variants in a set (e.g., all coding SNPs in a gene). Variants in a properly defined set could be associated with risk or phenotype in a concerted fashion, and by accumulating information from them, one can improve power to detect genetic risk factors. Many set-based methods in the literature are based on statistics that can be written as the summation of variant statistics. Here, we propose taking the summation of the exponential of variant statistics as the set summary for association testing. From both Bayesian and frequentist perspectives, we provide theoretical justification for taking the sum of the exponential of variant statistics because it is particularly powerful for sparse alternatives-that is, compared with the large number of variants being tested in a set, only relatively few variants are associated with disease risk-a distinctive feature of genetic data. We applied the exponential combination gene-based test to a sequencing study in anticancer pharmacogenomics and uncovered mechanistic insights into genes and pathways related to chemotherapeutic susceptibility for an important class of oncologic drugs. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  19. Sequence-based analysis of the microbial composition of water kefir from multiple sources.

    Science.gov (United States)

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2013-11-01

    Water kefir is a water-sucrose-based beverage, fermented by a symbiosis of bacteria and yeast to produce a final product that is lightly carbonated, acidic and that has a low alcohol percentage. The microorganisms present in water kefir are introduced via water kefir grains, which consist of a polysaccharide matrix in which the microorganisms are embedded. We aimed to provide a comprehensive sequencing-based analysis of the bacterial population of water kefir beverages and grains, while providing an initial insight into the corresponding fungal population. To facilitate this objective, four water kefirs were sourced from the UK, Canada and the United States. Culture-independent, high-throughput, sequencing-based analyses revealed that the bacterial fraction of each water kefir and grain was dominated by Zymomonas, an ethanol-producing bacterium, which has not previously been detected at such a scale. The other genera detected were representatives of the lactic acid bacteria and acetic acid bacteria. Our analysis of the fungal component established that it was comprised of the genera Dekkera, Hanseniaspora, Saccharomyces, Zygosaccharomyces, Torulaspora and Lachancea. This information will assist in the ultimate identification of the microorganisms responsible for the potentially health-promoting attributes of these beverages. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  20. Aviram–Ratner rectifying mechanism for DNA base-pair sequencing through graphene nanogaps

    International Nuclear Information System (INIS)

    Agapito, Luis A; Gayles, Jacob; Wolowiec, Christian; Kioussis, Nicholas

    2012-01-01

    We demonstrate that biological molecules such as Watson–Crick DNA base pairs can behave as biological Aviram–Ratner electrical rectifiers because of the spatial separation and weak hydrogen bonding between the nucleobases. We have performed a parallel computational implementation of the ab initio non-equilibrium Green’s function (NEGF) theory to determine the electrical response of graphene—base-pair—graphene junctions. The results show an asymmetric (rectifying) current–voltage response for the cytosine–guanine base pair adsorbed on a graphene nanogap. In sharp contrast we find a symmetric response for the thymine–adenine case. We propose applying the asymmetry of the current–voltage response as a sensing criterion to the technological challenge of rapid DNA sequencing via graphene nanogaps. (paper)

  1. Automated Clustering Analysis of Immunoglobulin Sequences in Chronic Lymphocytic Leukemia Based on 3D Structural Descriptors

    DEFF Research Database (Denmark)

    Marcatili, Paolo; Mochament, Konstantinos; Agathangelidis, Andreas

    2016-01-01

    study, we used the structure prediction tools PIGS and I-TASSER for creating the 3D models and the TM-align algorithm to superpose them. The innovation of the current methodology resides in the usage of methods adapted from 3D content-based search methodologies to determine the local structural...... determine it are extremely laborious and demanding. Hence, the ability to gain insight into the structure of Igs at large relies on the availability of tools and algorithms for producing accurate Ig structural models based on their primary sequence alone. These models can then be used to determine...... to achieve an optimal solution to this task yet their results were hindered mainly due to the lack of efficient clustering methods based on the similarity of 3D structure descriptors. Here, we present a novel workflow for robust Ig 3D modeling and automated clustering. We validated our protocol in chronic...

  2. Improved protection system for phase faults on marine vessels based on ratio between negative sequence and positive sequence of the fault current

    DEFF Research Database (Denmark)

    Ciontea, Catalin-Iosif; Hong, Qiteng; Booth, Campbell

    2018-01-01

    algorithm is implemented in a programmable digital relay embedded in a hardware-in-the-loop (HIL) test set-up that emulates a typical maritime feeder using a real-time digital simulator. The HIL set-up allows testing of the new protection method under a wide range of faults and network conditions......This study presents a new method to protect the radial feeders on marine vessels. The proposed protection method is effective against phase–phase (PP) faults and is based on evaluation of the ratio between the negative sequence and positive sequence of the fault currents. It is shown...... that the magnitude of the introduced ratio increases significantly during the PP fault, hence indicating the fault presence in an electric network. Here, the theoretical background of the new method of protection is firstly discussed, based on which the new protection algorithm is described afterwards. The proposed...

  3. Cytochrome oxidase-I sequence based studies of commercially available Pangasius hypophthalmus in Italy

    Directory of Open Access Journals (Sweden)

    Federica Bellagamba

    2015-09-01

    Full Text Available Pangasius hypophthalmus is one of the fish consumed in the Italian diet. It is farmed and imported from Mekong delta region of Vietnam. Among several types of Pangasius, Tra (Pangasius hypophthalmus is permitted for sales by the European Union. Since these fish species are often allegedly substituted with other morphologically similar fish due to commercial benefits, authentication of the products in the international markets become often necessary to prevent fraud and safety issues. In addition, this fish is imported as fillets without skin and bone, thus leaving the consumer’s at the risk of buying a substandard nutritional food. In this article we present the molecular approach we developed to identify Pangasius hypophthalmus from other closely related species based on cytochrome oxidase-I (COI mitochondrial barcoding gene and further described the variants in the studied population genetic of this species. Fifty-one samples of Pangasius hypophthalmus fillets labelled as Pangasio were obtained from various markets around Milan and their COI mitochondrial barcoding gene was sequenced and studied in our bioinformatics pipeline. All samples were successfully amplified and Basic Local Alignment Search Tool results of the amplified region confirmed that all sequences analysed belonged to Pangasius hypophthalmus. Based on the variations in their barcoding region single nucleotide polymorphisms were identified and delineative statistics was calculated on the sequences. Although Pangasius hypophthalmus is considered as a monophyly, seven polymorphisms were identified. The neighbour-joining tree and the Median-joining network of haplotypes showed for all the identified haplotypes a unique cluster, with the exception of one sample.

  4. A new trilocus sequence-based multiplex-PCR to detect major Acinetobacter baumannii clones.

    Science.gov (United States)

    Martins, Natacha; Picão, Renata Cristina; Cerqueira-Alves, Morgana; Uehara, Aline; Barbosa, Lívia Carvalho; Riley, Lee W; Moreira, Beatriz Meurer

    2016-08-01

    A collection of 163 Acinetobacter baumannii isolates detected in a large Brazilian hospital, was potentially related with the dissemination of four clonal complexes (CC): 113/79, 103/15, 109/1 and 110/25, defined by University of Oxford/Institut Pasteur multilocus sequence typing (MLST) schemes. The urge of a simple multiplex-PCR scheme to specify these clones has motivated the present study. The established trilocus sequence-based typing (3LST, for ompA, csuE and blaOXA-51-like genes) multiplex-PCR rapidly identifies international clones I (CC109/1), II (CC118/2) and III (CC187/3). Thus, the system detects only one (CC109/1) out of four main CC in Brazil. We aimed to develop an alternative multiplex-PCR scheme to detect these clones, known to be present additionally in Africa, Asia, Europe, USA and South America. MLST, performed in the present study to complement typing our whole collection of isolates, confirmed that all isolates belonged to the same four CC detected previously. When typed by 3LST-based multiplex-PCR, only 12% of the 163 isolates were classified into groups. By comparative sequence analysis of ompA, csuE and blaOXA-51-like genes, a set of eight primers was designed for an alternative multiplex-PCR to distinguish the five CC 113/79, 103/15, 109/1, 110/25 and 118/2. Study isolates and one CC118/2 isolate were blind-tested with the new alternative PCR scheme; all were correctly clustered in groups of the corresponding CC. The new multiplex-PCR, with the advantage of fitting in a single reaction, detects five leading A. baumannii clones and could help preventing the spread in healthcare settings. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Amplicon-based semiconductor sequencing of human exomes: performance evaluation and optimization strategies.

    Science.gov (United States)

    Damiati, E; Borsani, G; Giacopuzzi, Edoardo

    2016-05-01

    The Ion Proton platform allows to perform whole exome sequencing (WES) at low cost, providing rapid turnaround time and great flexibility. Products for WES on Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from GIAB consortium to assess the performances in variants identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94 %) in human CDS, ClinVar genes and ACMG genes, but with 2,041 (7 %), 449 (13 %) and 11 (19 %) genes not fully represented, respectively. Overall, 515 protein coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Performance in variants detection was maximum at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. WES using HiQ chemistry showed ~71/97.5 % sensitivity, ~37/2 % FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low, medium or high-stringency filters reduced the amount of false positives by 10.2, 21.2 and 40.4 % for indels and 21.2, 41.9 and 68.2 % for SNP, respectively. Amplicon-based WES on Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variants identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.

  6. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    Science.gov (United States)

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To meet this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' was constructed. Firstly, tree analysis module analyzes the structure of tree and calculates the all possible subgroups in each node. Sequence similarity analysis module calculates average sequence similarity for all subgroups in each node. Representative sequence generation module finds a representative sequence using profile analysis and self-scoring for each subgroup. For all nodes, average sequence similarities are calculated and 'Subgrouping Automata' searches a node showing statistically maximum sequence similarity increase using Student's t-value. A node showing the maximum t-value, which gives the most significant differences in average sequence similarity between two adjacent nodes, is determined as an optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.

  7. Changes in DNA base sequence induced by gamma-ray mutagenesis of lambda phage and prophage

    Energy Technology Data Exchange (ETDEWEB)

    Tindall, K.R.; Stein, J.; Hutchinson, F.

    1988-04-01

    Mutations in the cI (repressor) gene were induced by gamma-ray irradiation of lambda phage and of prophage, and 121 mutations were sequenced. Two-thirds of the mutations in irradiated phage assayed in recA host cells (no induction of the SOS response) were G:C to A:T transitions; it is hypothesized that these may arise during DNA replication from adenine mispairing with a cytosine product deaminated by irradiation. For irradiated phage assayed in host cells in which the SOS response had been induced, 85% of the mutations were base substitutions, and in 40 of the 41 base changes, a preexisting base pair had been replaced by an A:T pair; these might come from damaged bases acting as AP (apurinic or apyrimidinic) sites. The remaining mutations were 1 and 2 base deletions. In irradiated prophage, base change mutations involved the substitution of both A:T and of G:C pairs for the preexisting pairs; the substitution of G:C pairs shows that some base substitution mechanism acts on the cell genome but not on the phage. In the irradiated prophage, frameshifts and a significant number of gross rearrangements were also found.

  8. Evaluation of the Terminal Sequencing and Spacing System for Performance Based Navigation Arrivals

    Science.gov (United States)

    Thipphavong, Jane; Jung, Jaewoo; Swenson, Harry N.; Martin, Lynne; Lin, Melody; Nguyen, Jimmy

    2013-01-01

    NASA has developed the Terminal Sequencing and Spacing (TSS) system, a suite of advanced arrival management technologies combining timebased scheduling and controller precision spacing tools. TSS is a ground-based controller automation tool that facilitates sequencing and merging arrivals that have both current standard ATC routes and terminal Performance-Based Navigation (PBN) routes, especially during highly congested demand periods. In collaboration with the FAA and MITRE's Center for Advanced Aviation System Development (CAASD), TSS system performance was evaluated in human-in-the-loop (HITL) simulations with currently active controllers as participants. Traffic scenarios had mixed Area Navigation (RNAV) and Required Navigation Performance (RNP) equipage, where the more advanced RNP-equipped aircraft had preferential treatment with a shorter approach option. Simulation results indicate the TSS system achieved benefits by enabling PBN, while maintaining high throughput rates-10% above baseline demand levels. Flight path predictability improved, where path deviation was reduced by 2 NM on average and variance in the downwind leg length was 75% less. Arrivals flew more fuel-efficient descents for longer, spending an average of 39 seconds less in step-down level altitude segments. Self-reported controller workload was reduced, with statistically significant differences at the p less than 0.01 level. The RNP-equipped arrivals were also able to more frequently capitalize on the benefits of being "Best-Equipped, Best- Served" (BEBS), where less vectoring was needed and nearly all RNP approaches were conducted without interruption.

  9. Application of Sequence-based Methods in Human MicrobialEcology

    Energy Technology Data Exchange (ETDEWEB)

    Weng, Li; Rubin, Edward M.; Bristow, James

    2005-08-29

    Ecologists studying microbial life in the environment have recognized the enormous complexity of microbial diversity for many years, and the development of a variety of culture-independent methods, many of them coupled with high-throughput DNA sequencing, has allowed this diversity to be explored in ever greater detail. Despite the widespread application of these new techniques to the characterization of uncultivated microbes and microbial communities in the environment, their application to human health and disease has lagged behind. Because DNA based-techniques for defining uncultured microbes allow not only cataloging of microbial diversity, but also insight into microbial functions, investigators are beginning to apply these tools to the microbial communities that abound on and within us, in what has aptly been called the second Human Genome Project. In this review we discuss the sequence-based methods for microbial analysis that are currently available and their application to identify novel human pathogens, improve diagnosis of known infectious diseases, and to advance understanding of our relationship with microbial communities that normally reside in and on the human body.

  10. Team-based learning to improve learning outcomes in a therapeutics course sequence.

    Science.gov (United States)

    Bleske, Barry E; Remington, Tami L; Wells, Trisha D; Dorsch, Michael P; Guthrie, Sally K; Stumpf, Janice L; Alaniz, Marissa C; Ellingrod, Vicki L; Tingen, Jeffrey M

    2014-02-12

    To compare the effectiveness of team-based learning (TBL) to that of traditional lectures on learning outcomes in a therapeutics course sequence. A revised TBL curriculum was implemented in a therapeutic course sequence. Multiple choice and essay questions identical to those used to test third-year students (P3) taught using a traditional lecture format were administered to the second-year pharmacy students (P2) taught using the new TBL format. One hundred thirty-one multiple-choice questions were evaluated; 79 tested recall of knowledge and 52 tested higher level, application of knowledge. For the recall questions, students taught through traditional lectures scored significantly higher compared to the TBL students (88%±12% vs. 82%±16%, p=0.01). For the questions assessing application of knowledge, no differences were seen between teaching pedagogies (81%±16% vs. 77%±20%, p=0.24). Scores on essay questions and the number of students who achieved 100% were also similar between groups. Transition to a TBL format from a traditional lecture-based pedagogy allowed P2 students to perform at a similar level as students with an additional year of pharmacy education on application of knowledge type questions. However, P3 students outperformed P2 students regarding recall type questions and overall. Further assessment of long-term learning outcomes is needed to determine if TBL produces more persistent learning and improved application in clinical settings.

  11. EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

    Science.gov (United States)

    Lian, Yao; Ge, Meng; Pan, Xian-Ming

    2014-12-19

    B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using the multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .

  12. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  13. Sequence-based prediction of protein protein interaction using a deep-learning algorithm.

    Science.gov (United States)

    Sun, Tanlin; Zhou, Bo; Lai, Luhua; Pei, Jianfeng

    2017-05-25

    Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPI to better understand protein function, disease occurrence, and therapy design. Though various computational methods for predicting PPI have been developed, their robustness for prediction with external datasets is unknown. Deep-learning algorithms have achieved successful results in diverse areas, but their effectiveness for PPI prediction has not been tested. We used a stacked autoencoder, a type of deep-learning algorithm, to study the sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% with 10-fold cross-validation. The prediction accuracies for various external datasets ranged from 87.99% to 99.21%, which are superior to those achieved with previous methods. To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field.

  14. Reference voltage calculation method based on zero-sequence component optimisation for a regional compensation DVR

    Science.gov (United States)

    Jian, Le; Cao, Wang; Jintao, Yang; Yinge, Wang

    2018-04-01

    This paper describes the design of a dynamic voltage restorer (DVR) that can simultaneously protect several sensitive loads from voltage sags in a region of an MV distribution network. A novel reference voltage calculation method based on zero-sequence voltage optimisation is proposed for this DVR to optimise cost-effectiveness in compensation of voltage sags with different characteristics in an ungrounded neutral system. Based on a detailed analysis of the characteristics of voltage sags caused by different types of faults and the effect of the wiring mode of the transformer on these characteristics, the optimisation target of the reference voltage calculation is presented with several constraints. The reference voltages under all types of voltage sags are calculated by optimising the zero-sequence component, which can reduce the degree of swell in the phase-to-ground voltage after compensation to the maximum extent and can improve the symmetry degree of the output voltages of the DVR, thereby effectively increasing the compensation ability. The validity and effectiveness of the proposed method are verified by simulation and experimental results.

  15. A phylogenetic analysis of Diurideae (Orchidaceae) based on plastid DNA sequence data.

    Science.gov (United States)

    Kores, P J; Molvray, M; Weston, P H; Hopper, S D; Brown, A P; Cameron, K M; Chase, M W

    2001-10-01

    DNA sequence data from plastid matK and trnL-F regions were used in phylogenetic analyses of Diurideae, which indicate that Diurideae are not monophyletic as currently delimited. However, if Chloraeinae and Pterostylidinae are excluded from Diurideae, the remaining subtribes form a well-supported, monophyletic group that is sister to a "spiranthid" clade. Chloraea, Gavilea, and Megastylis pro parte (Chloraeinae) are all placed among the spiranthid orchids and form a grade with Pterostylis leading to a monophyletic Cranichideae. Codonorchis, previously included among Chloraeinae, is sister to Orchideae. Within the more narrowly delimited Diurideae two major lineages are apparent. One includes Diuridinae, Cryptostylidinae, Thelymitrinae, and an expanded Drakaeinae; the other includes Caladeniinae s.s., Prasophyllinae, and Acianthinae. The achlorophyllous subtribe Rhizanthellinae is a member of Diurideae, but its placement is otherwise uncertain. The sequence-based trees indicate that some morphological characters used in previous classifications, such as subterranean storage organs, anther position, growth habit, fungal symbionts, and pollination syndromes have more complex evolutionary histories than previously hypothesized. Treatments based upon these characters have produced conflicting classifications, and molecular data offer a tool for reevaluating these phylogenetic hypotheses.

  16. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

    Directory of Open Access Journals (Sweden)

    Manzini Giovanni

    2007-07-01

    Full Text Available Abstract Background Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity, NCD (Normalized Compression Dissimilarity and CD (Compression Dissimilarity. Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC

  17. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

    Science.gov (United States)

    Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

    2007-07-13

    Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC (Receiver Operating Curve) analysis, aims at

  18. Study on multiple-hops performance of MOOC sequences-based optical labels for OPS networks

    Science.gov (United States)

    Zhang, Chongfu; Qiu, Kun; Ma, Chunli

    2009-11-01

    In this paper, we utilize a new study method that is under independent case of multiple optical orthogonal codes to derive the probability function of MOOCS-OPS networks, discuss the performance characteristics for a variety of parameters, and compare some characteristics of the system employed by single optical orthogonal code or multiple optical orthogonal codes sequences-based optical labels. The performance of the system is also calculated, and our results verify that the method is effective. Additionally it is found that performance of MOOCS-OPS networks would, negatively, be worsened, compared with single optical orthogonal code-based optical label for optical packet switching (SOOC-OPS); however, MOOCS-OPS networks can greatly enlarge the scalability of optical packet switching networks.

  19. Retrospective Evaluations of Sequences: Testing the Predictions of a Memory-Based Analysis.

    Science.gov (United States)

    Aldrovandi, Silvio; Poirier, Marie; Kusev, Petko; Ayton, Peter

    2015-01-01

    Retrospective evaluation (RE) of event sequences is known to be biased in various ways. The present paper presents a series of studies that examined the suggestion that the moments that are the most accessible in memory at the point of RE contribute to these biases. As predicted by this memory-based analysis, Experiment 1 showed that pleasantness ratings of word lists were biased by the presentation position of a negative item and by how easy the negative information was to retrieve. Experiment 2 ruled out the hypothesis that these findings were due to the dual nature of the task called upon. Experiment 3 further manipulated the memorability of the negative items--and corresponding changes in RE were as predicted. Finally, Experiment 4 extended the findings to more complex stimuli involving event narratives. Overall, the results suggest that assessments were adjusted based on the retrieval of the most readily available information.

  20. Development and Characterization of Simple Sequence Repeat (SSR) Markers Based on RNA-Sequencing of Medicago sativa and In silico Mapping onto the M. truncatula Genome

    Science.gov (United States)

    Wang, Zan; Yu, Guohui; Shi, Binbin; Wang, Xuemin; Qiang, Haiping; Gao, Hongwen

    2014-01-01

    Sufficient codominant genetic markers are needed for various genetic investigations in alfalfa since the species is an outcrossing autotetraploid. With the newly developed next generation sequencing technology, a large amount of transcribed sequences of alfalfa have been generated and are available for identifying SSR markers by data mining. A total of 54,278 alfalfa non-redundant unigenes were assembled through the Illumina HiSeqTM 2000 sequencing technology. Based on 3,903 unigene sequences, 4,493 SSRs were identified. Tri-nucleotide repeats (56.71%) were the most abundant motif class while AG/CT (21.7%), AGG/CCT (19.8%), AAC/GTT (10.3%), ATC/ATG (8.8%), and ACC/GGT (6.3%) were the subsequent top five nucleotide repeat motifs. Eight hundred and thirty- seven EST-SSR primer pairs were successfully designed. Of these, 527 (63%) primer pairs yielded clear and scored PCR products and 372 (70.6%) exhibited polymorphisms. High transferability was observed for ssp falcata at 99.2% (523) and 71.7% (378) in M. truncatula. In addition, 313 of 527 SSR marker sequences were in silico mapped onto the eight M. truncatula chromosomes. Thirty-six polymorphic SSR primer pairs were used in the genetic relatedness analysis of 30 Chinese alfalfa cultivated accessions generating a total of 199 scored alleles. The mean observed heterozygosity and polymorphic information content were 0.767 and 0.635, respectively. The codominant markers not only enriched the current resources of molecular markers in alfalfa, but also would facilitate targeted investigations in marker-trait association, QTL mapping, and genetic diversity analysis in alfalfa. PMID:24642969

  1. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Directory of Open Access Journals (Sweden)

    Wang Yong

    2011-07-01

    Full Text Available Abstract Background Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However comparative assessment of these features and furthermore effective and accurate methods are still in pressing need. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complexes formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in Ab+ dataset (all complexes. While in Ab- dataset (antigen-antibody complexes are excluded, there are significant differences in two features between hot pots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs. The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion Experimental results show that support vector machine

  2. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Science.gov (United States)

    2011-01-01

    Background Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However comparative assessment of these features and furthermore effective and accurate methods are still in pressing need. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complexes formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in Ab+ dataset (all complexes). While in Ab- dataset (antigen-antibody complexes are excluded), there are significant differences in two features between hot pots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion Experimental results show that support vector machine classifiers are quite

  3. Adaptation of Shift Sequence Based Method for High Number in Shifts Rostering Problem for Health Care Workers

    Directory of Open Access Journals (Sweden)

    Mindaugas Liogys

    2013-08-01

    Full Text Available Purpose—is to investigate a shift sequence-based approach efficiency then problem consisting of a high number of shifts.Research objectives:• Solve health care workers rostering problem using a shift sequence based method.• Measure its efficiency then number of shifts increases.Design/methodology/approach—Usually rostering problems are highly constrained. Constraints are classified to soft and hard constraints. Soft and hard constraints of the problem are additionally classified to: sequence constraints, schedule constraints and roster constraints. Sequence constraints are considered when constructing shift sequences. Schedule constraints are considered when constructing a schedule. Roster constraints are applied, then constructing overall solution, i.e. combining all schedules.Shift sequence based approach consists of two stages:• Shift sequences construction,• The construction of schedules.In the shift sequences construction stage, the shift sequences are constructed for each set of health care workers of different skill, considering sequence constraints. Shifts sequences are ranked by their penalties for easier retrieval in later stage.In schedules construction stage, schedules for each health care worker are constructed iteratively, using the shift sequences produced in stage 1.Shift sequence based method is an adaptive iterative method where health care workers who received the highest schedule penalties in the last iteration are scheduled first at the current iteration.During the roster construction, and after a schedule has been generated for the current health care worker, an improvement method based on an efficient greedy local search is carried out on the partial roster. It simply swaps any pair of shifts between two health care workers in the (partial roster, as long as the swaps satisfy hard constraints and decrease the roster penalty.Findings—Using shift sequence method for solving health care workers rostering problem

  4. Adaptation of Shift Sequence Based Method for High Number in Shifts Rostering Problem for Health Care Workers

    Directory of Open Access Journals (Sweden)

    Mindaugas Liogys

    2011-08-01

    Full Text Available Purpose—is to investigate a shift sequence-based approach efficiency then problem consisting of a high number of shifts. Research objectives:• Solve health care workers rostering problem using a shift sequence based method.• Measure its efficiency then number of shifts increases. Design/methodology/approach—Usually rostering problems are highly constrained.Constraints are classified to soft and hard constraints. Soft and hard constraints of the problem are additionally classified to: sequence constraints, schedule constraints and roster constraints. Sequence constraints are considered when constructing shift sequences. Schedule constraints are considered when constructing a schedule. Roster constraints are applied, then constructing overall solution, i.e. combining all schedules.Shift sequence based approach consists of two stages:• Shift sequences construction,• The construction of schedules.In the shift sequences construction stage, the shift sequences are constructed for each set of health care workers of different skill, considering sequence constraints. Shifts sequences are ranked by their penalties for easier retrieval in later stage.In schedules construction stage, schedules for each health care worker are constructed iteratively, using the shift sequences produced in stage 1. Shift sequence based method is an adaptive iterative method where health care workers who received the highest schedule penalties in the last iteration are scheduled first at the current iteration. During the roster construction, and after a schedule has been generated for the current health care worker, an improvement method based on an efficient greedy local search is carried out on the partial roster. It simply swaps any pair of shifts between two health care workers in the (partial roster, as long as the swaps satisfy hard constraints and decrease the roster penalty.Findings—Using shift sequence method for solving health care workers rostering

  5. Digital Sequences and a Time Reversal-Based Impact Region Imaging and Localization Method

    Science.gov (United States)

    Qiu, Lei; Yuan, Shenfang; Mei, Hanfei; Qian, Weifeng

    2013-01-01

    To reduce time and cost of damage inspection, on-line impact monitoring of aircraft composite structures is needed. A digital monitor based on an array of piezoelectric transducers (PZTs) is developed to record the impact region of impacts on-line. It is small in size, lightweight and has low power consumption, but there are two problems with the impact alarm region localization method of the digital monitor at the current stage. The first one is that the accuracy rate of the impact alarm region localization is low, especially on complex composite structures. The second problem is that the area of impact alarm region is large when a large scale structure is monitored and the number of PZTs is limited which increases the time and cost of damage inspections. To solve the two problems, an impact alarm region imaging and localization method based on digital sequences and time reversal is proposed. In this method, the frequency band of impact response signals is estimated based on the digital sequences first. Then, characteristic signals of impact response signals are constructed by sinusoidal modulation signals. Finally, the phase synthesis time reversal impact imaging method is adopted to obtain the impact region image. Depending on the image, an error ellipse is generated to give out the final impact alarm region. A validation experiment is implemented on a complex composite wing box of a real aircraft. The validation results show that the accuracy rate of impact alarm region localization is approximately 100%. The area of impact alarm region can be reduced and the number of PZTs needed to cover the same impact monitoring region is reduced by more than a half. PMID:24084123

  6. Prevalence and Sequence-Based Identity of Rumen Fluke in Cattle and Deer in New Caledonia.

    Directory of Open Access Journals (Sweden)

    Laura Cauquil

    Full Text Available An abattoir survey was performed in the French Melanesian archipelago of New Caledonia to determine the prevalence of paramphistomes in cattle and deer and to generate material for molecular typing at species and subspecies level. Prevalence in adult cattle was high at animal level (70% of 387 adult cattle and batch level (81%. Prevalence was lower in calves at both levels (33% of 484 calves, 51% at batch level. Animals from 2 of 7 deer farms were positive for rumen fluke, with animal-level prevalence of 41.4% (29/70 and 47.1% (33/70, respectively. Using ITS-2 sequencing, 3 species of paramphistomes were identified, i.e. Calicophoron calicophorum, Fischoederius elongatus and Orthocoelium streptocoelium. All three species were detected in cattle as well as deer, suggesting the possibility of rumen fluke transmission between the two host species. Based on heterogeneity in ITS-2 sequences, the C. calicophorum population comprises two clades, both of which occur in cattle as well as deer. The results suggest two distinct routes of rumen fluke introduction into this area. This approach has wider applicability for investigations of the origin of rumen fluke infections and for the possibility of parasite transmission at the livestock-wildlife interface.

  7. Identifications of Putative PKA Substrates with Quantitative Phosphoproteomics and Primary-Sequence-Based Scoring.

    Science.gov (United States)

    Imamura, Haruna; Wagih, Omar; Niinae, Tomoya; Sugiyama, Naoyuki; Beltrao, Pedro; Ishihama, Yasushi

    2017-04-07

    Protein kinase A (PKA or cAMP-dependent protein kinase) is a serine/threonine kinase that plays essential roles in the regulation of proliferation, differentiation, and apoptosis. To better understand the functions of PKA, it is necessary to elucidate the direct interplay between PKA and their substrates in living human cells. To identify kinase target substrates in a high-throughput manner, we first quantified the change of phosphoproteome in the cells of which PKA activity was perturbed by drug stimulations. LC-MS/MS analyses identified 2755 and 3191 phosphopeptides from experiments with activator or inhibitor of PKA. To exclude potential indirect targets of PKA, we built a computational model to characterize the kinase sequence specificity toward the substrate target site based on known kinase-substrate relationships. Finally, by combining the sequence recognition model with the quantitative changes in phosphorylation measured in the two drug perturbation experiments, we identified 29 reliable candidates of PKA targeting residues in living cells including 8 previously known substrates. Moreover, 18 of these sites were confirmed to be site-specifically phosphorylated in vitro. Altogether this study proposed a confident list of PKA substrate candidates, expanding our knowledge of PKA signaling network.

  8. Internal Transcribed Spacer 1 (ITS1 based sequence typing reveals phylogenetically distinct Ascaris population

    Directory of Open Access Journals (Sweden)

    Koushik Das

    2015-01-01

    Full Text Available Taxonomic differentiation among morphologically identical Ascaris species is a debatable scientific issue in the context of Ascariasis epidemiology. To explain the disease epidemiology and also the taxonomic position of different Ascaris species, genome information of infecting strains from endemic areas throughout the world is certainly crucial. Ascaris population from human has been genetically characterized based on the widely used genetic marker, internal transcribed spacer1 (ITS1. Along with previously reported and prevalent genotype G1, 8 new sequence variants of ITS1 have been identified. Genotype G1 was significantly present among female patients aged between 10 to 15 years. Intragenic linkage disequilibrium (LD analysis at target locus within our study population has identified an incomplete LD value with potential recombination events. A separate cluster of Indian isolates with high bootstrap value indicate their distinct phylogenetic position in comparison to the global Ascaris population. Genetic shuffling through recombination could be a possible reason for high population diversity and frequent emergence of new sequence variants, identified in present and other previous studies. This study explores the genetic organization of Indian Ascaris population for the first time which certainly includes some fundamental information on the molecular epidemiology of Ascariasis.

  9. Time-stretch microscopy based on time-wavelength sequence reconstruction from wideband incoherent source

    International Nuclear Information System (INIS)

    Zhang, Chi; Xu, Yiqing; Wei, Xiaoming; Tsia, Kevin K.; Wong, Kenneth K. Y.

    2014-01-01

    Time-stretch microscopy has emerged as an ultrafast optical imaging concept offering the unprecedented combination of the imaging speed and sensitivity. However, dedicated wideband and coherence optical pulse source with high shot-to-shot stability has been mandated for time-wavelength mapping—the enabling process for ultrahigh speed wavelength-encoded image retrieval. From the practical point of view, exploiting methods to relax the stringent requirements (e.g., temporal stability and coherence) for the source of time-stretch microscopy is thus of great value. In this paper, we demonstrated time-stretch microscopy by reconstructing the time-wavelength mapping sequence from a wideband incoherent source. Utilizing the time-lens focusing mechanism mediated by a narrow-band pulse source, this approach allows generation of a wideband incoherent source, with the spectral efficiency enhanced by a factor of 18. As a proof-of-principle demonstration, time-stretch imaging with the scan rate as high as MHz and diffraction-limited resolution is achieved based on the wideband incoherent source. We note that the concept of time-wavelength sequence reconstruction from wideband incoherent source can also be generalized to any high-speed optical real-time measurements, where wavelength is acted as the information carrier

  10. Phylogenetic relationships of Palaearctic Formica species (Hymenoptera, Formicidae based on mitochondrial cytochrome B sequences.

    Directory of Open Access Journals (Sweden)

    Anna V Goropashnaya

    Full Text Available Ants of genus Formica demonstrate variation in social organization and represent model species for ecological, behavioral, evolutionary studies and testing theoretical implications of the kin selection theory. Subgeneric division of the Formica ants based on morphology has been questioned and remained unclear after an allozyme study on genetic differentiation between 13 species representing all subgenera was conducted. In the present study, the phylogenetic relationships within the genus were examined using mitochondrial DNA sequences of the cytochrome b and a part of the NADH dehydrogenase subunit 6. All 23 Formica species sampled in the Palaearctic clustered according to the subgeneric affiliation except F. uralensis that formed a separate phylogenetic group. Unlike Coptoformica and Formica s. str., the subgenus Serviformica did not form a tight cluster but more likely consisted of a few small clades. The genetic distances between the subgenera were around 10%, implying approximate divergence time of 5 Myr if we used the conventional insect divergence rate of 2% per Myr. Within-subgenus divergence estimates were 6.69% in Serviformica, 3.61% in Coptoformica, 1.18% in Formica s. str., which supported our previous results on relatively rapid speciation in the latter subgenus. The phylogeny inferred from DNA sequences provides a necessary framework against which the evolution of social traits can be compared. We discuss implications of inferred phylogeny for the evolution of social traits.

  11. DEEPre: sequence-based enzyme EC number prediction by deep learning

    KAUST Repository

    Li, Yu

    2017-10-20

    Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number.We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manuallycrafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre\\'s ability to capture the functional difference of enzyme isoforms.The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.

  12. REMap: Operon map of M. tuberculosis based on RNA sequence data.

    Science.gov (United States)

    Pelly, Shaaretha; Winglee, Kathryn; Xia, Fang Fang; Stevens, Rick L; Bishai, William R; Lamichhane, Gyanu

    2016-07-01

    A map of the transcriptional organization of genes of an organism is a basic tool that is necessary to understand and facilitate a more accurate genetic manipulation of the organism. Operon maps are largely generated by computational prediction programs that rely on gene conservation and genome architecture and may not be physiologically relevant. With the widespread use of RNA sequencing (RNAseq), the prediction of operons based on actual transcriptome sequencing rather than computational genomics alone is much needed. Here, we report a validated operon map of Mycobacterium tuberculosis, developed using RNAseq data from both the exponential and stationary phases of growth. At least 58.4% of M. tuberculosis genes are organized into 749 operons. Our prediction algorithm, REMap (RNA Expression Mapping of operons), considers the many cases of transcription coverage of intergenic regions, and avoids dependencies on functional annotation and arbitrary assumptions about gene structure. As a result, we demonstrate that REMap is able to more accurately predict operons, especially those that contain long intergenic regions or functionally unrelated genes, than previous operon prediction programs. The REMap algorithm is publicly available as a user-friendly tool that can be readily modified to predict operons in other bacteria. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Micro-motion Recognition of Spatial Cone Target Based on ISAR Image Sequences

    Directory of Open Access Journals (Sweden)

    Changyong Shu

    2016-04-01

    Full Text Available The accurate micro-motions recognition of spatial cone target is the foundation of the characteristic parameter acquisition. For this reason, a micro-motion recognition method based on the distinguishing characteristics extracted from the Inverse Synthetic Aperture Radar (ISAR sequences is proposed in this paper. The projection trajectory formula of cone node strong scattering source and cone bottom slip-type strong scattering sources, which are located on the spatial cone target, are deduced under three micro-motion types including nutation, precession, and spinning, and the correctness is verified by the electromagnetic simulation. By comparison, differences are found among the projection of the scattering sources with different micro-motions, the coordinate information of the scattering sources in the Inverse Synthetic Aperture Radar sequences is extracted by the CLEAN algorithm, and the spinning is recognized by setting the threshold value of Doppler. The double observation points Interacting Multiple Model Kalman Filter is used to separate the scattering sources projection of the nutation target or precession target, and the cross point number of each scattering source’s projection track is used to classify the nutation or precession. Finally, the electromagnetic simulation data are used to verify the effectiveness of the micro-motion recognition method.

  14. A time series based sequence prediction algorithm to detect activities of daily living in smart home.

    Science.gov (United States)

    Marufuzzaman, M; Reaz, M B I; Ali, M A M; Rahman, L F

    2015-01-01

    The goal of smart homes is to create an intelligent environment adapting the inhabitants need and assisting the person who needs special care and safety in their daily life. This can be reached by collecting the ADL (activities of daily living) data and further analysis within existing computing elements. In this research, a very recent algorithm named sequence prediction via enhanced episode discovery (SPEED) is modified and in order to improve accuracy time component is included. The modified SPEED or M-SPEED is a sequence prediction algorithm, which modified the previous SPEED algorithm by using time duration of appliance's ON-OFF states to decide the next state. M-SPEED discovered periodic episodes of inhabitant behavior, trained it with learned episodes, and made decisions based on the obtained knowledge. The results showed that M-SPEED achieves 96.8% prediction accuracy, which is better than other time prediction algorithms like PUBS, ALZ with temporal rules and the previous SPEED. Since human behavior shows natural temporal patterns, duration times can be used to predict future events more accurately. This inhabitant activity prediction system will certainly improve the smart homes by ensuring safety and better care for elderly and handicapped people.

  15. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    Science.gov (United States)

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  16. DEEPre: sequence-based enzyme EC number prediction by deep learning

    KAUST Repository

    Li, Yu; Wang, Sheng; Umarov, Ramzan; Xie, Bingqing; Fan, Ming; Li, Lihua; Gao, Xin

    2017-01-01

    Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number.We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manuallycrafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms.The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.

  17. Population-based statistical inference for temporal sequence of somatic mutations in cancer genomes.

    Science.gov (United States)

    Rhee, Je-Keun; Kim, Tae-Min

    2018-04-20

    It is well recognized that accumulation of somatic mutations in cancer genomes plays a role in carcinogenesis; however, the temporal sequence and evolutionary relationship of somatic mutations remain largely unknown. In this study, we built a population-based statistical framework to infer the temporal sequence of acquisition of somatic mutations. Using the model, we analyzed the mutation profiles of 1954 tumor specimens across eight tumor types. As a result, we identified tumor type-specific directed networks composed of 2-15 cancer-related genes (nodes) and their mutational orders (edges). The most common ancestors identified in pairwise comparison of somatic mutations were TP53 mutations in breast, head/neck, and lung cancers. The known relationship of KRAS to TP53 mutations in colorectal cancers was identified, as well as potential ancestors of TP53 mutation such as NOTCH1, EGFR, and PTEN mutations in head/neck, lung and endometrial cancers, respectively. We also identified apoptosis-related genes enriched with ancestor mutations in lung cancers and a relationship between APC hotspot mutations and TP53 mutations in colorectal cancers. While evolutionary analysis of cancers has focused on clonal versus subclonal mutations identified in individual genomes, our analysis aims to further discriminate ancestor versus descendant mutations in population-scale mutation profiles that may help select cancer drivers with clinical relevance.

  18. Identification of somatic mutations in cancer through Bayesian-based analysis of sequenced genome pairs.

    Science.gov (United States)

    Christoforides, Alexis; Carpten, John D; Weiss, Glen J; Demeure, Michael J; Von Hoff, Daniel D; Craig, David W

    2013-05-04

    The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations--changes specific to a tumor and not within an individual's germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.

  19. Development of SSR markers for a Tibetan medicinal plant, Lancea tibetica (Phrymaceae), based on RAD sequencing.

    Science.gov (United States)

    Tian, Zunzhe; Zhang, Faqi; Liu, Hairui; Gao, Qingbo; Chen, Shilong

    2016-11-01

    Lancea tibetica (Phrymaceae), a Tibetan medicinal plant, is endemic to the Qinghai-Tibet Plateau. The over-exploitation of wild L. tibetica has led to the destruction of many populations. To enhance protection and management, biological research, especially population genetic studies, should be carried out on L. tibetica . Simple sequence repeat (SSR) markers of L. tibetica were developed to analyze population diversity. Four thousand four hundred and forty-one SSR loci were identified for L. tibetica based on restriction-site associated DNA (RAD) sequencing on the Illumina HiSeq platform. One hundred SSR loci were arbitrarily selected for primer design, and 38 of them were successfully amplified. These markers were tested on 56 individuals from three populations of L. tibetica , and 10 markers displayed polymorphisms. The total number of alleles per locus ranged from three to eight, and observed and expected heterozygosities ranged from 0.200 to 1.000 and 0.683 to 0.879, respectively. We tested for cross-amplification of these 10 markers in the related species L. hirsuta and found that nine could be successfully amplified. The SSR markers characterized here are the first to be developed and tested in L. tibetica . They will be useful for future population genetic studies on L. tibetica and closely related species.

  20. Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Gclust Server Cluster based on sequence comparison of homologous proteins of 95 organism spe...cies Data detail Data name Cluster based on sequence comparison of homologous proteins of 95 organism specie...istory of This Database Site Policy | Contact Us Cluster based on sequence compariso

  1. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    Directory of Open Access Journals (Sweden)

    Oren E Livne

    2015-03-01

    Full Text Available Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm, a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs, from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

  2. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    Science.gov (United States)

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

  3. Automatic start-up system of nuclear reactor based on sequence control technology

    International Nuclear Information System (INIS)

    Zhang Yao; Zhang Dafa; Peng Huaqing

    2009-01-01

    A conceptive design of an automatic start-up system based on the sequence control for the nuclear reactors is given in this paper, so as to solve the problems during the start-up process, such as the long operation time, low automatic control level and high accident rate. The start-up process and its requirements are analyzed in detail at first. Then,the principle, the architecture, the key technologies of the automatic start-up system of nuclear reactors are designed and discussed. With the designed system, the automatic start-up of the nuclear reactor can be realized,the work load of the operator can be reduced,and the safety and efficiency of the nuclear power plant during its start-up can be improved. (authors)

  4. Xylariaceae diversity in Thailand and Philippines, based on rDNA sequencing

    Directory of Open Access Journals (Sweden)

    Natarajan Velmurugan

    2013-05-01

    Full Text Available Twenty three different Xylariaceae Tul. & C. Tul were isolatedfrom samples collected from forest zones of Thailand and Philippines.The fungal samples were characterized based on morphological characteristics and nuclear ITS1-5.8S rDNA-ITS2 region sequences. Ten species of Xylaria, two species of Hypoxylon, Biscogniauxia, Rosellinia and one species of Annulohypoxylon and Entonaema were found. Entonaema the distinctive genus of Xylariaceae, isolated in the study from Thailand samples showed a close relationship with Xylaria in phylogenetic tree. Xylariaceous species identified at molecular level showed significant similarity of the morphological characters, such as stromal structure, ascal apex and the germ slit of ascospores. In addition, three species of Arthrinium, two species of Pestalotiopsis were also isolated and characterized in the study. A phylogenetic affinity of Pestalotiopsis with Xylariaceae was found.

  5. Xylariaceae diversity in Thailand and Philippines, based on rDNA sequencing

    Directory of Open Access Journals (Sweden)

    Natarajan Velmurugan

    2013-07-01

    Full Text Available Twenty three different Xylariaceae Tul. & C. Tul were isolated from samples collected from forest zones of Thailand and Philippines. The fungal samples were characterized based on morphological characteristics and nuclear ITS1-5.8S rDNA-ITS2 region sequences. Ten species of Xylaria, two species of Hypoxylon, Biscogniauxia, Rosellinia and one species of Annulohypoxylon and Entonaema were found. Entonaema the distinctive genus of Xylariaceae, isolated in the study from Thailand samples showed a close relationship withXylaria in phylogenetic tree. Xylariaceous species identified at molecular level showed significant similarity of the morphological characters, such as stromal structure, ascal apex and the germ slit of ascospores. In addition, three species of Arthrinium, two species of Pestalotiopsis were also isolated and characterized in the study. A phylogenetic affinity of Pestalotiopsis with Xylariaceae was found.

  6. Personal sleep pattern visualization using sequence-based kernel self-organizing map on sound data.

    Science.gov (United States)

    Wu, Hongle; Kato, Takafumi; Yamada, Tomomi; Numao, Masayuki; Fukui, Ken-Ichi

    2017-07-01

    We propose a method to discover sleep patterns via clustering of sound events recorded during sleep. The proposed method extends the conventional self-organizing map algorithm by kernelization and sequence-based technologies to obtain a fine-grained map that visualizes the distribution and changes of sleep-related events. We introduced features widely applied in sound processing and popular kernel functions to the proposed method to evaluate and compare performance. The proposed method provides a new aspect of sleep monitoring because the results demonstrate that sound events can be directly correlated to an individual's sleep patterns. In addition, by visualizing the transition of cluster dynamics, sleep-related sound events were found to relate to the various stages of sleep. Therefore, these results empirically warrant future study into the assessment of personal sleep quality using sound data. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Bayesian prediction of bacterial growth temperature range based on genome sequences

    DEFF Research Database (Denmark)

    Jensen, Dan Børge; Vesth, Tammi Camilla; Hallin, Peter Fischer

    2012-01-01

    Background: The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this based...... on a genomic sequence, would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinction between three thermophilicity classes (thermophiles, mesophiles and psychrophiles...... that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naive Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between thermophilic, mesophilic...

  8. Geographic Distribution of Leishmania Species in Ecuador Based on the Cytochrome B Gene Sequence Analysis

    Science.gov (United States)

    Kato, Hirotomo; Gomez, Eduardo A.; Martini-Robles, Luiggi; Muzzio, Jenny; Velez, Lenin; Calvopiña, Manuel; Romero-Alvarez, Daniel; Mimori, Tatsuyuki; Uezato, Hiroshi; Hashiguchi, Yoshihisa

    2016-01-01

    A countrywide epidemiological study was performed to elucidate the current geographic distribution of causative species of cutaneous leishmaniasis (CL) in Ecuador by using FTA card-spotted samples and smear slides as DNA sources. Putative Leishmania in 165 samples collected from patients with CL in 16 provinces of Ecuador were examined at the species level based on the cytochrome b gene sequence analysis. Of these, 125 samples were successfully identified as Leishmania (Viannia) guyanensis, L. (V.) braziliensis, L. (V.) naiffi, L. (V.) lainsoni, and L. (Leishmania) mexicana. Two dominant species, L. (V.) guyanensis and L. (V.) braziliensis, were widely distributed in Pacific coast subtropical and Amazonian tropical areas, respectively. Recently reported L. (V.) naiffi and L. (V.) lainsoni were identified in Amazonian areas, and L. (L.) mexicana was identified in an Andean highland area. Importantly, the present study demonstrated that cases of L. (V.) braziliensis infection are increasing in Pacific coast areas. PMID:27410039

  9. Geographic Distribution of Leishmania Species in Ecuador Based on the Cytochrome B Gene Sequence Analysis.

    Science.gov (United States)

    Kato, Hirotomo; Gomez, Eduardo A; Martini-Robles, Luiggi; Muzzio, Jenny; Velez, Lenin; Calvopiña, Manuel; Romero-Alvarez, Daniel; Mimori, Tatsuyuki; Uezato, Hiroshi; Hashiguchi, Yoshihisa

    2016-07-01

    A countrywide epidemiological study was performed to elucidate the current geographic distribution of causative species of cutaneous leishmaniasis (CL) in Ecuador by using FTA card-spotted samples and smear slides as DNA sources. Putative Leishmania in 165 samples collected from patients with CL in 16 provinces of Ecuador were examined at the species level based on the cytochrome b gene sequence analysis. Of these, 125 samples were successfully identified as Leishmania (Viannia) guyanensis, L. (V.) braziliensis, L. (V.) naiffi, L. (V.) lainsoni, and L. (Leishmania) mexicana. Two dominant species, L. (V.) guyanensis and L. (V.) braziliensis, were widely distributed in Pacific coast subtropical and Amazonian tropical areas, respectively. Recently reported L. (V.) naiffi and L. (V.) lainsoni were identified in Amazonian areas, and L. (L.) mexicana was identified in an Andean highland area. Importantly, the present study demonstrated that cases of L. (V.) braziliensis infection are increasing in Pacific coast areas.

  10. Geographic Distribution of Leishmania Species in Ecuador Based on the Cytochrome B Gene Sequence Analysis.

    Directory of Open Access Journals (Sweden)

    Hirotomo Kato

    2016-07-01

    Full Text Available A countrywide epidemiological study was performed to elucidate the current geographic distribution of causative species of cutaneous leishmaniasis (CL in Ecuador by using FTA card-spotted samples and smear slides as DNA sources. Putative Leishmania in 165 samples collected from patients with CL in 16 provinces of Ecuador were examined at the species level based on the cytochrome b gene sequence analysis. Of these, 125 samples were successfully identified as Leishmania (Viannia guyanensis, L. (V. braziliensis, L. (V. naiffi, L. (V. lainsoni, and L. (Leishmania mexicana. Two dominant species, L. (V. guyanensis and L. (V. braziliensis, were widely distributed in Pacific coast subtropical and Amazonian tropical areas, respectively. Recently reported L. (V. naiffi and L. (V. lainsoni were identified in Amazonian areas, and L. (L. mexicana was identified in an Andean highland area. Importantly, the present study demonstrated that cases of L. (V. braziliensis infection are increasing in Pacific coast areas.

  11. Conditional Probabilities of Large Earthquake Sequences in California from the Physics-based Rupture Simulator RSQSim

    Science.gov (United States)

    Gilchrist, J. J.; Jordan, T. H.; Shaw, B. E.; Milner, K. R.; Richards-Dinger, K. B.; Dieterich, J. H.

    2017-12-01

    Within the SCEC Collaboratory for Interseismic Simulation and Modeling (CISM), we are developing physics-based forecasting models for earthquake ruptures in California. We employ the 3D boundary element code RSQSim (Rate-State Earthquake Simulator of Dieterich & Richards-Dinger, 2010) to generate synthetic catalogs with tens of millions of events that span up to a million years each. This code models rupture nucleation by rate- and state-dependent friction and Coulomb stress transfer in complex, fully interacting fault systems. The Uniform California Earthquake Rupture Forecast Version 3 (UCERF3) fault and deformation models are used to specify the fault geometry and long-term slip rates. We have employed the Blue Waters supercomputer to generate long catalogs of simulated California seismicity from which we calculate the forecasting statistics for large events. We have performed probabilistic seismic hazard analysis with RSQSim catalogs that were calibrated with system-wide parameters and found a remarkably good agreement with UCERF3 (Milner et al., this meeting). We build on this analysis, comparing the conditional probabilities of sequences of large events from RSQSim and UCERF3. In making these comparisons, we consider the epistemic uncertainties associated with the RSQSim parameters (e.g., rate- and state-frictional parameters), as well as the effects of model-tuning (e.g., adjusting the RSQSim parameters to match UCERF3 recurrence rates). The comparisons illustrate how physics-based rupture simulators might assist forecasters in understanding the short-term hazards of large aftershocks and multi-event sequences associated with complex, multi-fault ruptures.

  12. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

    Directory of Open Access Journals (Sweden)

    Lees Jonathan G

    2008-01-01

    Full Text Available Abstract Background A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism, and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have examined the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. Results In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequences techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Conclusion Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained.

  13. Long-PCR based next generation sequencing of the whole mitochondrial genome of the peacock skate Pavoraja nitida (Elasmobranchii: Arhynchobatidae).

    Science.gov (United States)

    Yang, Lei; Naylor, Gavin J P

    2016-01-01

    We determined the complete mitochondrial genome sequence (16,760 bp) of the peacock skate Pavoraja nitida using a long-PCR based next generation sequencing method. It has 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region in the typical vertebrate arrangement. Primers, protocols, and procedures used to obtain this mitogenome are provided. We anticipate that this approach will facilitate rapid collection of mitogenome sequences for studies on phylogenetic relationships, population genetics, and conservation of cartilaginous fishes.

  14. Negative Sequence Droop Method based Hierarchical Control for Low Voltage Ride-Through in Grid-Interactive Microgrids

    DEFF Research Database (Denmark)

    Zhao, Xin; Firoozabadi, Mehdi Savaghebi; Quintero, Juan Carlos Vasquez

    2015-01-01

    . In this paper, a voltage support strategy based on negative sequence droop control, which regulate the positive/negative sequence active and reactive power flow by means of sending proper voltage reference to the inner control loop, is proposed for the grid connected MGs to ride through voltage sags under...... complex line impedance conditions. In this case, the MGs should inject a certain amount of positive and negative sequence power to the grid so that the voltage quality at load side can be maintained at a satisfied level. A two layer hierarchical control strategy is proposed in this paper. The primary...... control loop consists of voltage and current inner loops, conventional droop control and virtual impedance loop while the secondary control loop is based on positive/negative sequence droop control which can achieve power injection under voltage sags. Experimental results with asymmetrical voltage sags...

  15. Delimiting species of Protaphorura (Collembola: Onychiuridae): integrative evidence based on morphology, DNA sequences and geography.

    Science.gov (United States)

    Sun, Xin; Zhang, Feng; Ding, Yinhuan; Davies, Thomas W; Li, Yu; Wu, Donghui

    2017-08-15

    Species delimitation remains a significant challenge when the diagnostic morphological characters are limited. Integrative taxonomy was applied to the genus Protaphorura (Collembola: Onychiuridae), which is one of most difficult soil animals to distinguish taxonomically. Three delimitation approaches (morphology, molecular markers and geography) were applied providing rigorous species validation criteria with an acceptably low error rate. Multiple molecular approaches, including distance- and evolutionary model-based methods, were used to determine species boundaries based on 144 standard barcode sequences. Twenty-two molecular putative species were consistently recovered across molecular and geographical analyses. Geographic criteria were was proved to be an efficient delimitation method for onychiurids. Further morphological examination, based on the combination of the number of pseudocelli, parapseudocelli and ventral mesothoracic chaetae, confirmed 18 taxa of 22 molecular units, with six of them described as new species. These characters were found to be of high taxonomical value. This study highlights the potential benefits of integrative taxonomy, particularly simultaneous use of molecular/geographical tools, as a powerful way of ascertaining the true diversity of the Onychiuridae. Our study also highlights that discovering new morphological characters remains central to achieving a full understanding of collembolan taxonomy.

  16. Complete genome sequence of Klebsiella pneumoniae J1, a protein-based microbial flocculant-producing bacterium.

    Science.gov (United States)

    Pang, Changlong; Li, Ang; Cui, Di; Yang, Jixian; Ma, Fang; Guo, Haijuan

    2016-02-20

    Klebsiella pneumoniae J1 is a Gram-negative strain, which belongs to a protein-based microbial flocculant-producing bacterium. However, little genetic information is known about this species. Here we carried out a whole-genome sequence analysis of this strain and report the complete genome sequence of this organism and its genetic basis for carbohydrate metabolism, capsule biosynthesis and transport system. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Evaluation of an Extremum Seeking Control Based Optimization and Sequencing Strategy for a Chilled-water Plant

    OpenAIRE

    Zhao, Zhongfan; Li, Yaoyu; Mu, Baojie; Salsbury, Timothy I.; House, John M.

    2016-01-01

    Chilled-water plants with multiple chillers account for a significant fraction of energy use in large commercial buildings. Real-time optimization and sequencing of such plants is thus critical for building energy efficiency. Due to the cost and complexity associated with calibrating a chiller plant model to field operation, model-free control has become an attractive solution. Recently, Mu et al. (2015) proposed a model-free real-time optimization and sequencing strategy based on extremum se...

  18. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    Science.gov (United States)

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    To analyze and detect the whole genome sequence of human mitochondrial DNA (mtDNA) by Ion Torrent PGM™ platform and to study the differences of mtDNA sequence in different tissues. Samples were collected from 6 unrelated individuals by forensic postmortem examination, including chest blood, hair, costicartilage, nail, skeletal muscle and oral epithelium. Amplification of whole genome sequence of mtDNA was performed by 4 pairs of primer. Libraries were constructed with Ion Shear™ Plus Reagents kit and Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed using Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions on HVⅠ region. The whole genome sequence of mtDNA from all samples were amplified successfully. Six unrelated individuals belonged to 6 different haplotypes. Different tissues in one individual had heteroplasmy difference. The heteroplasmy positions and the mutation positions on HVⅠ region were verified by Sanger sequencing. After a consistency check by the Kappa method, it was found that the results of mtDNA sequence had a high consistency in different tissues. The testing method used in present study for sequencing the whole genome sequence of human mtDNA can detect the heteroplasmy difference in different tissues, which have good consistency. The results provide guidance for the further applications of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine

  19. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    Energy Technology Data Exchange (ETDEWEB)

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, base upon statistical inferences, the location of matrix attachment regions (MARS) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.

  20. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999; FINAL

    International Nuclear Information System (INIS)

    Harger, Carol A.

    1999-01-01

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, base upon statistical inferences, the location of matrix attachment regions (MARS) within a nucleotide sequence.[The annual report for June 1996 to August 1997 is included as an attachment to this final report.

  1. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.

  2. GntR family of regulators in Mycobacterium smegmatis: a sequence and structure based characterization

    Directory of Open Access Journals (Sweden)

    Ranjan Akash

    2007-08-01

    Full Text Available Abstract Background Mycobacterium smegmatis is fast growing non-pathogenic mycobacteria. This organism has been widely used as a model organism to study the biology of other virulent and extremely slow growing species like Mycobacterium tuberculosis. Based on the homology of the N-terminal DNA binding domain, the recently sequenced genome of M. smegmatis has been shown to possess several putative GntR regulators. A striking characteristic feature of this family of regulators is that they possess a conserved N-terminal DNA binding domain and a diverse C-terminal domain involved in the effector binding and/or oligomerization. Since the physiological role of these regulators is critically dependent upon effector binding and operator sites, we have analysed and classified these regulators into their specific subfamilies and identified their potential binding sites. Results The sequence analysis of M. smegmatis putative GntRs has revealed that FadR, HutC, MocR and the YtrA-like regulators are encoded by 45, 8, 8 and 1 genes respectively. Further out of 45 FadR-like regulators, 19 were classified into the FadR group and 26 into the VanR group. All these proteins showed similar secondary structural elements specific to their respective subfamilies except MSMEG_3959, which showed additional secondary structural elements. Using the reciprocal BLAST searches, we further identified the orthologs of these regulators in Bacillus subtilis and other mycobacteria. Since the expression of many regulators is auto-regulatory, we have identified potential operator sites for a number of these GntR regulators by analyzing the upstream sequences. Conclusion This study helps in extending the annotation of M. smegmatis GntR proteins. It identifies the GntR regulators of M. smegmatis that could serve as a model for studying orthologous regulators from virulent as well as other saprophytic mycobacteria. This study also sheds some light on the nucleotide preferences in the

  3. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    Science.gov (United States)

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucloetide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  4. A dated molecular phylogeny of manta and devil rays (Mobulidae) based on mitogenome and nuclear sequences.

    Science.gov (United States)

    Poortvliet, Marloes; Olsen, Jeanine L; Croll, Donald A; Bernardi, Giacomo; Newton, Kelly; Kollias, Spyros; O'Sullivan, John; Fernando, Daniel; Stevens, Guy; Galván Magaña, Felipe; Seret, Bernard; Wintner, Sabine; Hoarau, Galice

    2015-02-01

    Manta and devil rays are an iconic group of globally distributed pelagic filter feeders, yet their evolutionary history remains enigmatic. We employed next generation sequencing of mitogenomes for nine of the 11 recognized species and two outgroups; as well as additional Sanger sequencing of two mitochondrial and two nuclear genes in an extended taxon sampling set. Analysis of the mitogenome coding regions in a Maximum Likelihood and Bayesian framework provided a well-resolved phylogeny. The deepest divergences distinguished three clades with high support, one containing Manta birostris, Manta alfredi, Mobula tarapacana, Mobula japanica and Mobula mobular; one containing Mobula kuhlii, Mobula eregoodootenkee and Mobula thurstoni; and one containing Mobula munkiana, Mobula hypostoma and Mobula rochebrunei. Mobula remains paraphyletic with the inclusion of Manta, a result that is in agreement with previous studies based on molecular and morphological data. A fossil-calibrated Bayesian random local clock analysis suggests that mobulids diverged from Rhinoptera around 30 Mya. Subsequent divergences are characterized by long internodes followed by short bursts of speciation extending from an initial episode of divergence in the Early and Middle Miocene (19-17 Mya) to a second episode during the Pliocene and Pleistocene (3.6 Mya - recent). Estimates of divergence dates overlap significantly with periods of global warming, during which upwelling intensity - and related high primary productivity in upwelling regions - decreased markedly. These periods are hypothesized to have led to fragmentation and isolation of feeding regions leading to possible regional extinctions, as well as the promotion of allopatric speciation. The closely shared evolutionary history of mobulids in combination with ongoing threats from fisheries and climate change effects on upwelling and food supply, reinforces the case for greater protection of this charismatic family of pelagic filter feeders

  5. Knowledge-based decision support for Space Station assembly sequence planning

    Science.gov (United States)

    1991-04-01

    A complete Personal Analysis Assistant (PAA) for Space Station Freedom (SSF) assembly sequence planning consists of three software components: the system infrastructure, intra-flight value added, and inter-flight value added. The system infrastructure is the substrate on which software elements providing inter-flight and intra-flight value-added functionality are built. It provides the capability for building representations of assembly sequence plans and specification of constraints and analysis options. Intra-flight value-added provides functionality that will, given the manifest for each flight, define cargo elements, place them in the National Space Transportation System (NSTS) cargo bay, compute performance measure values, and identify violated constraints. Inter-flight value-added provides functionality that will, given major milestone dates and capability requirements, determine the number and dates of required flights and develop a manifest for each flight. The current project is Phase 1 of a projected two phase program and delivers the system infrastructure. Intra- and inter-flight value-added were to be developed in Phase 2, which has not been funded. Based on experience derived from hundreds of projects conducted over the past seven years, ISX developed an Intelligent Systems Engineering (ISE) methodology that combines the methods of systems engineering and knowledge engineering to meet the special systems development requirements posed by intelligent systems, systems that blend artificial intelligence and other advanced technologies with more conventional computing technologies. The ISE methodology defines a phased program process that begins with an application assessment designed to provide a preliminary determination of the relative technical risks and payoffs associated with a potential application, and then moves through requirements analysis, system design, and development.

  6. Isolation of xylose isomerases by sequence- and function-based screening from a soil metagenomic library

    Directory of Open Access Journals (Sweden)

    Parachin Nádia

    2011-05-01

    Full Text Available Abstract Background Xylose isomerase (XI catalyses the isomerisation of xylose to xylulose in bacteria and some fungi. Currently, only a limited number of XI genes have been functionally expressed in Saccharomyces cerevisiae, the microorganism of choice for lignocellulosic ethanol production. The objective of the present study was to search for novel XI genes in the vastly diverse microbial habitat present in soil. As the exploitation of microbial diversity is impaired by the ability to cultivate soil microorganisms under standard laboratory conditions, a metagenomic approach, consisting of total DNA extraction from a given environment followed by cloning of DNA into suitable vectors, was undertaken. Results A soil metagenomic library was constructed and two screening methods based on protein sequence similarity and enzyme activity were investigated to isolate novel XI encoding genes. These two screening approaches identified the xym1 and xym2 genes, respectively. Sequence and phylogenetic analyses revealed that the genes shared 67% similarity and belonged to different bacterial groups. When xym1 and xym2 were overexpressed in a xylA-deficient Escherichia coli strain, similar growth rates to those in which the Piromyces XI gene was expressed were obtained. However, expression in S. cerevisiae resulted in only one-fourth the growth rate of that obtained for the strain expressing the Piromyces XI gene. Conclusions For the first time, the screening of a soil metagenomic library in E. coli resulted in the successful isolation of two active XIs. However, the discrepancy between XI enzyme performance in E. coli and S. cerevisiae suggests that future screening for XI activity from soil should be pursued directly using yeast as a host.

  7. Field-based species identification in eukaryotes using real-time nanopore sequencing.

    OpenAIRE

    Papadopulos, Alexander; Devey, Dion; Helmstetter, Andrew; Parker, Joe

    2017-01-01

    Advances in DNA sequencing and informatics have revolutionised biology over the past four decades, but technological limitations have left many applications unexplored. Recently, portable, real-time, nanopore sequencing (RTnS) has become available. This offers opportunities to rapidly collect and analyse genomic data anywhere. However, the generation of datasets from large, complex genomes has been constrained to laboratories. The portability and long DNA sequences of RTnS offer great potenti...

  8. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    Directory of Open Access Journals (Sweden)

    Jonathan Z Li

    Full Text Available The impact of raltegravir-resistant HIV-1 minority variants (MVs on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs.A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser.Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001. Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454.In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  9. A stochastic context free grammar based framework for analysis of protein sequences

    Directory of Open Access Journals (Sweden)

    Nebel Jean-Christophe

    2009-10-01

    Full Text Available Abstract Background In the last decade, there have been many applications of formal language theory in bioinformatics such as RNA structure prediction and detection of patterns in DNA. However, in the field of proteomics, the size of the protein alphabet and the complexity of relationship between amino acids have mainly limited the application of formal language theory to the production of grammars whose expressive power is not higher than stochastic regular grammars. However, these grammars, like other state of the art methods, cannot cover any higher-order dependencies such as nested and crossing relationships that are common in proteins. In order to overcome some of these limitations, we propose a Stochastic Context Free Grammar based framework for the analysis of protein sequences where grammars are induced using a genetic algorithm. Results This framework was implemented in a system aiming at the production of binding site descriptors. These descriptors not only allow detection of protein regions that are involved in these sites, but also provide insight in their structure. Grammars were induced using quantitative properties of amino acids to deal with the size of the protein alphabet. Moreover, we imposed some structural constraints on grammars to reduce the extent of the rule search space. Finally, grammars based on different properties were combined to convey as much information as possible. Evaluation was performed on sites of various sizes and complexity described either by PROSITE patterns, domain profiles or a set of patterns. Results show the produced binding site descriptors are human-readable and, hence, highlight biologically meaningful features. Moreover, they achieve good accuracy in both annotation and detection. In addition, findings suggest that, unlike current state-of-the-art methods, our system may be particularly suited to deal with patterns shared by non-homologous proteins. Conclusion A new Stochastic Context Free

  10. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

    Science.gov (United States)

    Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

    2014-01-13

    Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be able to distinguish miRNAs from other sequences sufficiently discriminative, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.

  11. Weighting sequence variants based on their annotation increases power of whole-genome association studies

    DEFF Research Database (Denmark)

    Sveinbjornsson, Gardar; Albrechtsen, Anders; Zink, Florian

    2016-01-01

    The consensus approach to genome-wide association studies (GWAS) has been to assign equal prior probability of association to all sequence variants tested. However, some sequence variants, such as loss-of-function and missense variants, are more likely than others to affect protein function...... for the family-wise error rate (FWER), using as weights the enrichment of sequence annotations among association signals. We show that this weighted adjustment increases the power to detect association over the standard Bonferroni correction. We use the enrichment of associations by sequence annotation we have...

  12. Sequence-based heuristics for faster annotation of non-coding RNA families.

    Science.gov (United States)

    Weinberg, Zasha; Ruzzo, Walter L

    2006-01-01

    Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool to find new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be. In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, where our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that--unlike family-specific solutions--can scale to hundreds of ncRNA families. The source code is available under GNU Public License at the supplementary web site.

  13. A New Targeted CFTR Mutation Panel Based on Next-Generation Sequencing Technology.

    Science.gov (United States)

    Lucarelli, Marco; Porcaro, Luigi; Biffignandi, Alice; Costantino, Lucy; Giannone, Valentina; Alberti, Luisella; Bruno, Sabina Maria; Corbetta, Carlo; Torresani, Erminio; Colombo, Carla; Seia, Manuela

    2017-09-01

    Searching for mutations in the cystic fibrosis transmembrane conductance regulator gene (CFTR) is a key step in the diagnosis of and neonatal and carrier screening for cystic fibrosis (CF), and it has implications for prognosis and personalized therapy. The large number of mutations and genetic and phenotypic variability make this search a complex task. Herein, we developed, validated, and tested a laboratory assay for an extended search for mutations in CFTR using a next-generation sequencing-based method, with a panel of 188 CFTR mutations customized for the Italian population. Overall, 1426 dried blood spots from neonatal screening, 402 genomic DNA samples from various origins, and 1138 genomic DNA samples from patients with CF were analyzed. The assay showed excellent analytical and diagnostic operative characteristics. We identified and experimentally validated 159 (of 188) CFTR mutations. The assay achieved detection rates of 95.0% and 95.6% in two large-scale case series of CF patients from central and northern Italy, respectively. These detection rates are among the highest reported so far with a genetic test for CF based on a mutation panel. This assay appears to be well suited for diagnostics, neonatal and carrier screening, and assisted reproduction, and it represents a considerable advantage in CF genetic counseling. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  14. Identification of metal ion binding sites based on amino acid sequences.

    Science.gov (United States)

    Cao, Xiaoyong; Hu, Xiuzhen; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua

    2017-01-01

    The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html.

  15. DNA interaction with platinum-based cytostatics revealed by DNA sequencing.

    Science.gov (United States)

    Smerkova, Kristyna; Vaculovic, Tomas; Vaculovicova, Marketa; Kynicky, Jindrich; Brtnicky, Martin; Eckschlager, Tomas; Stiborova, Marie; Hubalek, Jaromir; Adam, Vojtech

    2017-12-15

    The main mechanism of action of platinum-based cytostatic drugs - cisplatin, oxaliplatin and carboplatin - is the formation of DNA cross-links, which restricts the transcription due to the disability of DNA to enter the active site of the polymerase. The polymerase chain reaction (PCR) was employed as a simplified model of the amplification process in the cell nucleus. PCR with fluorescently labelled dideoxynucleotides commonly employed for DNA sequencing was used to monitor the effect of platinum-based cytostatics on DNA in terms of decrease in labeling efficiency dependent on a presence of the DNA-drug cross-link. It was found that significantly different amounts of the drugs - cisplatin (0.21 μg/mL), oxaliplatin (5.23 μg/mL), and carboplatin (71.11 μg/mL) - were required to cause the same quenching effect (50%) on the fluorescent labelling of 50 μg/mL of DNA. Moreover, it was found that even though the amounts of the drugs was applied to the reaction mixture differing by several orders of magnitude, the amount of incorporated platinum, quantified by inductively coupled plasma mass spectrometry, was in all cases at the level of tenths of μg per 5 μg of DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. A FAST SEGMENTATION ALGORITHM FOR C-V MODEL BASED ON EXPONENTIAL IMAGE SEQUENCE GENERATION

    Directory of Open Access Journals (Sweden)

    J. Hu

    2017-09-01

    Full Text Available For the island coastline segmentation, a fast segmentation algorithm for C-V model method based on exponential image sequence generation is proposed in this paper. The exponential multi-scale C-V model with level set inheritance and boundary inheritance is developed. The main research contributions are as follows: 1 the problems of the "holes" and "gaps" are solved when extraction coastline through the small scale shrinkage, low-pass filtering and area sorting of region. 2 the initial value of SDF (Signal Distance Function and the level set are given by Otsu segmentation based on the difference of reflection SAR on land and sea, which are finely close to the coastline. 3 the computational complexity of continuous transition are successfully reduced between the different scales by the SDF and of level set inheritance. Experiment results show that the method accelerates the acquisition of initial level set formation, shortens the time of the extraction of coastline, at the same time, removes the non-coastline body part and improves the identification precision of the main body coastline, which automates the process of coastline segmentation.

  17. a Fast Segmentation Algorithm for C-V Model Based on Exponential Image Sequence Generation

    Science.gov (United States)

    Hu, J.; Lu, L.; Xu, J.; Zhang, J.

    2017-09-01

    For the island coastline segmentation, a fast segmentation algorithm for C-V model method based on exponential image sequence generation is proposed in this paper. The exponential multi-scale C-V model with level set inheritance and boundary inheritance is developed. The main research contributions are as follows: 1) the problems of the "holes" and "gaps" are solved when extraction coastline through the small scale shrinkage, low-pass filtering and area sorting of region. 2) the initial value of SDF (Signal Distance Function) and the level set are given by Otsu segmentation based on the difference of reflection SAR on land and sea, which are finely close to the coastline. 3) the computational complexity of continuous transition are successfully reduced between the different scales by the SDF and of level set inheritance. Experiment results show that the method accelerates the acquisition of initial level set formation, shortens the time of the extraction of coastline, at the same time, removes the non-coastline body part and improves the identification precision of the main body coastline, which automates the process of coastline segmentation.

  18. Design and implementation of microcontroller-based automatic sequence counting and switching system

    Directory of Open Access Journals (Sweden)

    Joshua ABOLARINWA

    2015-05-01

    Full Text Available Technological advancement and its influence on human being have been on the increase in recent time. Major areas of such influence, include monitoring and control activities. In order to keep track of human movement in and out of a particular building, there is the need for an automatic counting system. Therefore, in this paper, we present the design and implementation of a microcontroller-based automatic sequence counting and switching system. This system was designed and developed to save cost, time, energy, and to achieve seamless control in the event of switching on or off of electrical appliances within a building. Top-down modular design approach was used in conjunction with the versatility of microcontroller. The system is able to monitor, sequentially count the number of entry and exit of people through an entrance, afterwards, automatically control any electrical device connected to it. From various tests and measurements obtained, there are comparative benefits derived from the deployment of this system in terms of simplicity and accuracy over similar system that is not microcontroller-based. Therefore, this system can be deployed at commercial quantity with wide range of applications in homes, offices and other public places.

  19. Identification of maca (Lepidium meyenii Walp.) and its adulterants by a DNA-barcoding approach based on the ITS sequence.

    Science.gov (United States)

    Chen, Jin-Jin; Zhao, Qing-Sheng; Liu, Yi-Lan; Zha, Sheng-Hua; Zhao, Bing

    2015-09-01

    Maca (Lepidium meyenii) is an herbaceous plant that grows in high plateaus and has been used as both food and folk medicine for centuries because of its benefits to human health. In the present study, ITS (internal transcribed spacer) sequences of forty-three maca samples, collected from different regions or vendors, were amplified and analyzed. The ITS sequences of nineteen potential adulterants of maca were also collected and analyzed. The results indicated that the ITS sequence of maca was consistent in all samples and unique when compared with its adulterants. Therefore, this DNA-barcoding approach based on the ITS sequence can be used for the molecular identification of maca and its adulterants. Copyright © 2015 China Pharmaceutical University. Published by Elsevier B.V. All rights reserved.

  20. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    Science.gov (United States)

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  1. Comparison of base composition analysis and Sanger sequencing of mitochondrial DNA for four U.S. population groups.

    Science.gov (United States)

    Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M

    2014-01-01

    A set of 711 samples from four U.S. population groups was analyzed using a novel mass spectrometry based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing derived data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays. Published by Elsevier Ireland Ltd.

  2. A dated molecular phylogeny of manta and devil rays (Mobulidae) based on mitogenome and nuclear sequences

    NARCIS (Netherlands)

    Poortvliet, Marloes; Olsen, Jeanine; Croll, Donald A.; Bernardi, Giacomo; Newton, Kelly; Kollias, Spyros; O'Sullivan, John; Fernando, Daniel; Stevens, Guy; Galván Magaña, Felipe; Seret, Bernard; Wintner, Sabine; Hoarau, Galice

    Manta and devil rays are an iconic group of globally distributed pelagic filter feeders, yet their evolutionary history remains enigmatic. We employed next generation sequencing of mitogenomes for nine of the 11 recognized species and two outgroups; as well as additional Sanger sequencing of two

  3. Genomic Variance Estimation Based on Genotyping-by-Sequencing with Different Coverage in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Fé, Dario; Jensen, Just

    2014-01-01

    at each SNP in family pools or polyploids. There are, however, several statistical challenges associated with this method, including low sequencing depth and missing values. Low sequencing depth results in inaccuracies in estimates of allele frequencies for each SNP. In this work we have focused...

  4. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

    Science.gov (United States)

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus is commonly known as hops, a member of the family moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to non-availability of reliable molecular markers. The large data of EST sequences are available in hops. The simple sequence repeat markers extracted from EST data are used as molecular markers for genetic characterization, in the present study. 25,495 EST sequences were examined and assembled to get full-length sequences. Maximum frequency distribution was shown by mononucleotide SSR motifs i.e. 60.44% in contig and 62.16% in singleton where as minimum frequency are observed for hexanucleotide SSR in contig (0.09%) and pentanucleotide SSR in singletons (0.12%). Maximum trinucleotide motifs code for Glutamic acid (GAA) while AT/TA were the most frequent repeat of dinucleotide SSRs. Flanking primer pairs were designed in-silico for the SSR containing sequences. Functional categorization of SSRs containing sequences was done through gene ontology terms like biological process, cellular component and molecular function. PMID:22368382

  5. Context-dependent motor skill: perceptual processing in memory-based sequence production

    NARCIS (Netherlands)

    Ruitenberg, M.F.L.; Abrahamse, E.L.; de Kleine, Elian; Verwey, Willem B.

    2012-01-01

    Previous studies have shown that motor sequencing skill can benefit from the reinstatement of the learning context—even with respect to features that are formally not required for appropriate task performance. The present study explored whether such context-dependence develops when sequence

  6. Prevalence of Plasmodium spp. in malaria asymptomatic African migrants assessed by nucleic acid sequence based amplification

    Directory of Open Access Journals (Sweden)

    Schallig Henk DFH

    2009-01-01

    Full Text Available Abstract Background Malaria is one of the most important infectious diseases in the world. Although most cases are found distributed in the tropical regions of Africa, Asia, Central and South Americas, there is in Europe a significant increase in the number of imported cases in non-endemic countries, in particular due to the higher mobility in today's society. Methods The prevalence of a possible asymptomatic infection with Plasmodium species was assessed using Nucleic Acid Sequence Based Amplification (NASBA assays on clinical samples collected from 195 study cases with no clinical signs related to malaria and coming from sub-Saharan African regions to Southern Italy. In addition, base-line demographic, clinical and socio-economic information was collected from study participants who also underwent a full clinical examination. Results Sixty-two study subjects (31.8% were found positive for Plasmodium using a pan Plasmodium specific NASBA which can detect all four Plasmodium species causing human disease, based on the small subunit 18S rRNA gene (18S NASBA. Twenty-four samples (38% of the 62 18S NASBA positive study cases were found positive with a Pfs25 mRNA NASBA, which is specific for the detection of gametocytes of Plasmodium falciparum. A statistically significant association was observed between 18S NASBA positivity and splenomegaly, hepatomegaly and leukopaenia and country of origin. Conclusion This study showed that a substantial proportion of people originating from malaria endemic countries harbor malaria parasites in their blood. If transmission conditions are available, they could potentially be a reservoir. Thefore, health authorities should pay special attention to the health of this potential risk group and aim to improve their health conditions.

  7. Discrimination of multilocus sequence typing-based Campylobacter jejuni subgroups by MALDI-TOF mass spectrometry.

    Science.gov (United States)

    Zautner, Andreas Erich; Masanta, Wycliffe Omurwa; Tareen, Abdul Malik; Weig, Michael; Lugert, Raimond; Groß, Uwe; Bader, Oliver

    2013-11-07

    Campylobacter jejuni, the most common bacterial pathogen causing gastroenteritis, shows a wide genetic diversity. Previously, we demonstrated by the combination of multi locus sequence typing (MLST)-based UPGMA-clustering and analysis of 16 genetic markers that twelve different C. jejuni subgroups can be distinguished. Among these are two prominent subgroups. The first subgroup contains the majority of hyperinvasive strains and is characterized by a dimeric form of the chemotaxis-receptor Tlp7(m+c). The second has an extended amino acid metabolism and is characterized by the presence of a periplasmic asparaginase (ansB) and gamma-glutamyl-transpeptidase (ggt). Phyloproteomic principal component analysis (PCA) hierarchical clustering of MALDI-TOF based intact cell mass spectrometry (ICMS) spectra was able to group particular C. jejuni subgroups of phylogenetic related isolates in distinct clusters. Especially the aforementioned Tlp7(m+c)(+) and ansB+/ ggt+ subgroups could be discriminated by PCA. Overlay of ICMS spectra of all isolates led to the identification of characteristic biomarker ions for these specific C. jejuni subgroups. Thus, mass peak shifts can be used to identify the C. jejuni subgroup with an extended amino acid metabolism. Although the PCA hierarchical clustering of ICMS-spectra groups the tested isolates into a different order as compared to MLST-based UPGMA-clustering, the isolates of the indicator-groups form predominantly coherent clusters. These clusters reflect phenotypic aspects better than phylogenetic clustering, indicating that the genes corresponding to the biomarker ions are phylogenetically coupled to the tested marker genes. Thus, PCA clustering could be an additional tool for analyzing the relatedness of bacterial isolates.

  8. DNA base sequence changes induced by ultraviolet light mutagenesis of a gene on a chromosome in Chinese hamster ovary cells

    Energy Technology Data Exchange (ETDEWEB)

    Romac, S; Leong, P; Sockett, H; Hutchinson, F [Yale Univ., New Haven, CT (USA). Dept. of Molecular Biophysics and Biochemistry

    1989-09-20

    The DNA base sequence changes induced by mutagenesis with ultraviolet light have been determined in a gene on a chromosome of cultured Chinese hamster ovary (CHO) cells. The gene was the Excherichia coli gpt gene, of which a single copy was stably incorporated and expressed in the CHO cell genome. The cells were irradiated with ultraviolet light and gpt{sup -} colonies were selected by resistance to 6-thioguanine. The gpt gene was amplified from chromosomal DNA by use of the polymerase chain reaction (PCR) and the amplified DNA sequenced directly by the dideoxy method. Of the 58 sequenced mutants of independent origin 53 were base change mutations. Forty-one base substitutions were single base changes, ten had two adjacent (or tandem) base changes, and one had two base changes separated by a single base-pair. Only one mutant had a multiple base change mutation with two or more well separated base changes. In contrast much higher levels of such mutations were reported in ultraviolet mutagenesis of genes on a shuttle vector in primate cells. Two deletions of a single base-pair were observed and three deletions ranging from 6 to 37 base-pairs. The mutation spectrum in the gpt gene had similarities to the ultraviolet mutation spectra for several genes in prokaryotes, which suggests similarities in mutational mechanisms in prokaryotes and eukaryotes. (author).

  9. Population clustering based on copy number variations detected from next generation sequencing data.

    Science.gov (United States)

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2014-08-01

    Copy number variations (CNVs) can be used as significant bio-markers and next generation sequencing (NGS) provides a high resolution detection of these CNVs. But how to extract features from CNVs and further apply them to genomic studies such as population clustering have become a big challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into the source matrix and weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data set from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results on the first real data analysis show that the proposed method can cluster two family trio with different ancestries into two ethnic groups and the results on the second real data analysis show that the proposed method can be applied to the whole-genome with large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.

  10. Hi-Plex for Simple, Accurate, and Cost-Effective Amplicon-based Targeted DNA Sequencing.

    Science.gov (United States)

    Pope, Bernard J; Hammet, Fleur; Nguyen-Dumont, Tu; Park, Daniel J

    2018-01-01

    Hi-Plex is a suite of methods to enable simple, accurate, and cost-effective highly multiplex PCR-based targeted sequencing (Nguyen-Dumont et al., Biotechniques 58:33-36, 2015). At its core is the principle of using gene-specific primers (GSPs) to "seed" (or target) the reaction and universal primers to "drive" the majority of the reaction. In this manner, effects on amplification efficiencies across the target amplicons can, to a large extent, be restricted to early seeding cycles. Product sizes are defined within a relatively narrow range to enable high-specificity size selection, replication uniformity across target sites (including in the context of fragmented input DNA such as that derived from fixed tumor specimens (Nguyen-Dumont et al., Biotechniques 55:69-74, 2013; Nguyen-Dumont et al., Anal Biochem 470:48-51, 2015), and application of high-specificity genetic variant calling algorithms (Pope et al., Source Code Biol Med 9:3, 2014; Park et al., BMC Bioinformatics 17:165, 2016). Hi-Plex offers a streamlined workflow that is suitable for testing large numbers of specimens without the need for automation.

  11. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Balachandran Manavalan

    2018-03-01

    Full Text Available Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.

  12. Fuzzy time-series based on Fibonacci sequence for stock price forecasting

    Science.gov (United States)

    Chen, Tai-Liang; Cheng, Ching-Hsue; Jong Teoh, Hia

    2007-07-01

    Time-series models have been utilized to make reasonably accurate predictions in the areas of stock price movements, academic enrollments, weather, etc. For promoting the forecasting performance of fuzzy time-series models, this paper proposes a new model, which incorporates the concept of the Fibonacci sequence, the framework of Song and Chissom's model and the weighted method of Yu's model. This paper employs a 5-year period TSMC (Taiwan Semiconductor Manufacturing Company) stock price data and a 13-year period of TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) stock index data as experimental datasets. By comparing our forecasting performances with Chen's (Forecasting enrollments based on fuzzy time-series. Fuzzy Sets Syst. 81 (1996) 311-319), Yu's (Weighted fuzzy time-series models for TAIEX forecasting. Physica A 349 (2004) 609-624) and Huarng's (The application of neural networks to forecast fuzzy time series. Physica A 336 (2006) 481-491) models, we conclude that the proposed model surpasses in accuracy these conventional fuzzy time-series models.

  13. Attention-Based Recurrent Temporal Restricted Boltzmann Machine for Radar High Resolution Range Profile Sequence Recognition

    Directory of Open Access Journals (Sweden)

    Yifan Zhang

    2018-05-01

    Full Text Available The High Resolution Range Profile (HRRP recognition has attracted great concern in the field of Radar Automatic Target Recognition (RATR. However, traditional HRRP recognition methods failed to model high dimensional sequential data efficiently and have a poor anti-noise ability. To deal with these problems, a novel stochastic neural network model named Attention-based Recurrent Temporal Restricted Boltzmann Machine (ARTRBM is proposed in this paper. RTRBM is utilized to extract discriminative features and the attention mechanism is adopted to select major features. RTRBM is efficient to model high dimensional HRRP sequences because it can extract the information of temporal and spatial correlation between adjacent HRRPs. The attention mechanism is used in sequential data recognition tasks including machine translation and relation classification, which makes the model pay more attention to the major features of recognition. Therefore, the combination of RTRBM and the attention mechanism makes our model effective for extracting more internal related features and choose the important parts of the extracted features. Additionally, the model performs well with the noise corrupted HRRP data. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR dataset show that our proposed model outperforms other traditional methods, which indicates that ARTRBM extracts, selects, and utilizes the correlation information between adjacent HRRPs effectively and is suitable for high dimensional data or noise corrupted data.

  14. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine.

    Science.gov (United States)

    Manavalan, Balachandran; Shin, Tae H; Lee, Gwang

    2018-01-01

    Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.

  15. Sequence-based analysis of the bacterial and fungal compositions of multiple kombucha (tea fungus) samples.

    Science.gov (United States)

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2014-04-01

    Kombucha is a sweetened tea beverage that, as a consequence of fermentation, contains ethanol, carbon dioxide, a high concentration of acid (gluconic, acetic and lactic) as well as a number of other metabolites and is thought to contain a number of health-promoting components. The sucrose-tea solution is fermented by a symbiosis of bacteria and yeast embedded within a cellulosic pellicle, which forms a floating mat in the tea, and generates a new layer with each successful fermentation. The specific identity of the microbial populations present has been the focus of attention but, to date, the majority of studies have relied on culture-based analyses. To gain a more comprehensive insight into the kombucha microbiota we have carried out the first culture-independent, high-throughput sequencing analysis of the bacterial and fungal populations of 5 distinct pellicles as well as the resultant fermented kombucha at two time points. Following the analysis it was established that the major bacterial genus present was Gluconacetobacter, present at >85% in most samples, with only trace populations of Acetobacter detected (kombucha, also being revealed. The yeast populations were found to be dominated by Zygosaccharomyces at >95% in the fermented beverage, with a greater fungal diversity present in the cellulosic pellicle, including numerous species not identified in kombucha previously. Ultimately, this study represents the most accurate description of the microbiology of kombucha to date. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. Constrained paths based on the Farey sequence in learning to juggle.

    Science.gov (United States)

    Yamamoto, Kota; Tsutsui, Seijiro; Yamamoto, Yuji

    2015-12-01

    In this article we report the results of a study conducted to investigate the learning dynamics of three-ball juggling from the perspective of frequency locking. Based on the Farey sequence, we predicted that four stable coordination patterns, corresponding to dwell ratios of 0.83, 0.75, 0.67, and 0.50, would appear in the learning process. We examined the learning process in terms of task performance, taking into account individual differences in the amount of learning. We observed that the participants acquired individual-specific coordination patterns in a relatively early stage of learning, and that those coordination patterns were preserved in subsequent learning, even though performance in terms of number of successful consecutive throws increased substantially. This increase appeared to be related to a reduction in spatial variability of the juggling movements. Finally, the observed coordination patterns were in agreement with the predicted patterns, with the proviso that the pattern corresponding to a dwell ratio of 0.50 was not realized and only a hint of evidence was found for the dwell ratio of 0.67. This implies that the dwell ratios of 0.83 and 0.75 in particular exhibited a stable coordination structure due to strong frequency locking between the temporal variables of juggling. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. System risk evolution analysis and risk critical event identification based on event sequence diagram

    International Nuclear Information System (INIS)

    Luo, Pengcheng; Hu, Yang

    2013-01-01

    During system operation, the environmental, operational and usage conditions are time-varying, which causes the fluctuations of the system state variables (SSVs). These fluctuations change the accidents’ probabilities and then result in the system risk evolution (SRE). This inherent relation makes it feasible to realize risk control by monitoring the SSVs in real time, herein, the quantitative analysis of SRE is essential. Besides, some events in the process of SRE are critical to system risk, because they act like the “demarcative points” of safety and accident, and this characteristic makes each of them a key point of risk control. Therefore, analysis of SRE and identification of risk critical events (RCEs) are remarkably meaningful to ensure the system to operate safely. In this context, an event sequence diagram (ESD) based method of SRE analysis and the related Monte Carlo solution are presented; RCE and risk sensitive variable (RSV) are defined, and the corresponding identification methods are also proposed. Finally, the proposed approaches are exemplified with an accident scenario of an aircraft getting into the icing region

  18. Dynamic Gesture Recognition with a Terahertz Radar Based on Range Profile Sequences and Doppler Signatures.

    Science.gov (United States)

    Zhou, Zhi; Cao, Zongjie; Pi, Yiming

    2017-12-21

    The frequency of terahertz radar ranges from 0.1 THz to 10 THz, which is higher than that of microwaves. Multi-modal signals, including high-resolution range profile (HRRP) and Doppler signatures, can be acquired by the terahertz radar system. These two kinds of information are commonly used in automatic target recognition; however, dynamic gesture recognition is rarely discussed in the terahertz regime. In this paper, a dynamic gesture recognition system using a terahertz radar is proposed, based on multi-modal signals. The HRRP sequences and Doppler signatures were first achieved from the radar echoes. Considering the electromagnetic scattering characteristics, a feature extraction model is designed using location parameter estimation of scattering centers. Dynamic Time Warping (DTW) extended to multi-modal signals is used to accomplish the classifications. Ten types of gesture signals, collected from a terahertz radar, are applied to validate the analysis and the recognition system. The results of the experiment indicate that the recognition rate reaches more than 91%. This research verifies the potential applications of dynamic gesture recognition using a terahertz radar.

  19. Position-specific prediction of methylation sites from sequence conservation based on information theory.

    Science.gov (United States)

    Shi, Yinan; Guo, Yanzhi; Hu, Yayun; Li, Menglong

    2015-07-23

    Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. To fully understand the mechanisms underlying methylation for use in drug design and work in methylation-related diseases, an initial but crucial step is to identify methylation sites. The use of high-throughput bioinformatics methods has become imperative to predict methylation sites. In this study, we developed a novel method that is based only on sequence conservation to predict protein methylation sites. Conservation difference profiles between methylated and non-methylated peptides were constructed by the information entropy (IE) in a wider neighbor interval around the methylation sites that fully incorporated all of the environmental information. Then, the distinctive neighbor residues were identified by the importance scores of information gain (IG). The most representative model was constructed by support vector machine (SVM) for Arginine and Lysine methylation, respectively. This model yielded a promising result on both the benchmark dataset and independent test set. The model was used to screen the entire human proteome, and many unknown substrates were identified. These results indicate that our method can serve as a useful supplement to elucidate the mechanism of protein methylation and facilitate hypothesis-driven experimental design and validation.

  20. Molecular phylogeny of the lionfish genera Dendrochirus and Pterois (Scorpaenidae, Pteroinae) based on mitochondrial DNA sequences.

    Science.gov (United States)

    Kochzius, Marc; Söller, Rainer; Khalaf, Maroof A; Blohm, Dietmar

    2003-09-01

    This study investigates the molecular phylogeny of seven lionfishes of the genera Dendrochirus and Pterois. MP, ML, and NJ phylogenetic analysis based on 964 bp of partial mitochondrial DNA sequences (cytochrome b and 16S rDNA) revealed two main clades: (1) "Pterois" clade (Pterois miles and Pterois volitans), and (2) "Pteropterus-Dendrochirus" clade (remainder of the sampled species). The position of Dendrochirus brachypterus either basal to the main clades or in the "Pteropterus-Dendrochirus" clade cannot be resolved. However, the molecular phylogeny did not support the current separation of the genera Pterois and Dendrochirus. The siblings P. miles and P. volitans are clearly separated and our results support the proposed allopatric or parapatric distribution in the Indian and Pacific Ocean. However, the present analysis cannot reveal if P. miles and P. volitans are separate species or two populations of a single species, because the observed separation in different clades can be either explained by speciation or lineage sorting. Molecular clock estimates for the siblings P. miles and P. volitans suggest a divergence time of 2.4-8.3 mya, which coincide with geological events that created vicariance between populations of the Indian and Pacific Ocean.

  1. HLA genes in Madeira Island (Portugal) inferred from sequence-based typing: footprints from different origins.

    Science.gov (United States)

    Spínola, Hélder; Bruges-Armas, Jácome; Mora, Marian Gantes; Middleton, Derek; Brehm, António

    2006-04-01

    Human leukocyte antigen (HLA)-A, HLA-B, and HLA-DRB1 polymorphisms were examined in Madeira Island populations. The data was obtained at high-resolution level, using sequence-based typing (SBT). The most frequent alleles at each loci were: A*020101 (24.6%), B*5101 (9.7%), B*440201 (9.2%), and DRB1*070101 (15.7%). The predominant three-loci haplotypes in Madeira were A*020101-B*510101-DRB1*130101 (2.7%) and A*010101-B*0801-DRB1*030101 (2.4%), previously found in north and central Portugal. The present study corroborates historical sources and other genetic studies that say Madeira were populated not only by Europeans, mostly Portuguese, but also sub-Saharan Africans due to slave trade. Comparison with other populations shows that Madeira experienced a stronger African influence due to slave trade than Portugal mainland and even the Azores archipelago. Despite this African genetic input, haplotype and allele frequencies were predominantly from European origin, mostly common to mainland Portugal.

  2. Treatment of Rural Wastewater Using a Spiral Fiber Based Salinity-Persistent Sequencing Batch Biofilm Reactor

    Directory of Open Access Journals (Sweden)

    Ying-Xin Zhao

    2017-12-01

    Full Text Available Differing from municipal wastewater, rural wastewater in salinization areas is characterized with arbitrary discharge and high concentration of salt, COD, nitrogen and phosphorus, which would cause severe deterioration of rivers and lakes. To overcome the limits of traditional biological processes, a spiral fiber based salinity-persistent Sequencing Biofilm Batch Reactor (SBBR was developed and investigated with synthetic rural wastewater (COD = 500 mg/L, NH4+-N = 50 mg/L, TP = 6 mg/L under different salinity (0.0–10.0 g/L of NaCl. Results indicated that a quick start-up could be achieved in 15 days, along with sufficient biomass up to 7275 mg/L. During operating period, the removal of COD, NH4+-N, TN was almost not disturbed by salt varying from 0.0 to 10.0 g/L with stable efficiency reaching 92%, 82% and 80%, respectively. Although TP could be removed at high efficiency of 90% in low salinity conditions (from 0.0 to 5.0 g/L of NaCl, it was seriously inhibited due to nitrite accumulation and reduction of Phosphorus Accumulating Organisms (PAOs after addition of 10.0 g/L of salt. The behavior proposed in this study will provide theoretical foundation and guidance for application of SBBR in saline rural wastewater treatment.

  3. A Bayesian-probability-based method for assigning protein backbone dihedral angles based on chemical shifts and local sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wang Jun; Liu Haiyan [University of Science and Technology of China, Hefei National Laboratory for Physical Sciences at the Microscale, and Key Laboratory of Structural Biology, School of Life Sciences (China)], E-mail: hyliu@ustc.edu.cn

    2007-01-15

    Chemical shifts contain substantial information about protein local conformations. We present a method to assign individual protein backbone dihedral angles into specific regions on the Ramachandran map based on the amino acid sequences and the chemical shifts of backbone atoms of tripeptide segments. The method uses a scoring function derived from the Bayesian probability for the central residue of a query tripeptide segment to have a particular conformation. The Ramachandran map is partitioned into representative regions at two levels of resolution. The lower resolution partitioning is equivalent to the conventional definitions of different secondary structure regions on the map. At the higher resolution level, the {alpha} and {beta} regions are further divided into subregions. Predictions are attempted at both levels of resolution. We compared our method with TALOS using the original TALOS database, and obtained comparable results. Although TALOS may produce the best results with currently available databases which are much enlarged, the Bayesian-probability-based approach can provide a quantitative measure for the reliability of predictions.

  4. Interaction Between Hippocampus and Cerebellum Crus I in Sequence-Based but not Place-Based Navigation

    Science.gov (United States)

    Iglói, Kinga; Doeller, Christian F.; Paradis, Anne-Lise; Benchenane, Karim; Berthoz, Alain; Burgess, Neil; Rondi-Reig, Laure

    2015-01-01

    To examine the cerebellar contribution to human spatial navigation we used functional magnetic resonance imaging and virtual reality. Our findings show that the sensory-motor requirements of navigation induce activity in cerebellar lobules and cortical areas known to be involved in the motor loop and vestibular processing. By contrast, cognitive aspects of navigation mainly induce activity in a different cerebellar lobule (VIIA Crus I). Our results demonstrate a functional link between cerebellum and hippocampus in humans and identify specific functional circuits linking lobule VIIA Crus I of the cerebellum to medial parietal, medial prefrontal, and hippocampal cortices in nonmotor aspects of navigation. They further suggest that Crus I belongs to 2 nonmotor loops, involved in different strategies: place-based navigation is supported by coherent activity between left cerebellar lobule VIIA Crus I and medial parietal cortex along with right hippocampus activity, while sequence-based navigation is supported by coherent activity between right lobule VIIA Crus I, medial prefrontal cortex, and left hippocampus. These results highlight the prominent role of the human cerebellum in both motor and cognitive aspects of navigation, and specify the cortico-cerebellar circuits by which it acts depending on the requirements of the task. PMID:24947462

  5. A rank-based sequence aligner with applications in phylogenetic analysis.

    Directory of Open Access Journals (Sweden)

    Liviu P Dinu

    Full Text Available Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD. The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing [Formula: see text]-mer positions in a hash table for each read. Another improvement, that produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state of the art alignment tools in several experiments. A set of experiments are conducted to determine the precision and the recall of the proposed aligner, in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered as a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, being supported on UNIX and MS Windows.

  6. Phylogenetic relationships in three species of canine Demodex mite based on partial sequences of mitochondrial 16S rDNA.

    Science.gov (United States)

    Sastre, Natalia; Ravera, Ivan; Villanueva, Sergio; Altet, Laura; Bardagí, Mar; Sánchez, Armand; Francino, Olga; Ferrer, Lluís

    2012-12-01

    The historical classification of Demodex mites has been based on their hosts and morphological features. Genome sequencing has proved to be a very effective taxonomic tool in phylogenetic studies and has been applied in the classification of Demodex. Mitochondrial 16S rDNA has been demonstrated to be an especially useful marker to establish phylogenetic relationships. To amplify and sequence a segment of the mitochondrial 16S rDNA from Demodex canis and Demodex injai, as well as from the short-bodied mite called, unofficially, D. cornei and to determine their genetic proximity. Demodex mites were examined microscopically and classified as Demodex folliculorum (one sample), D. canis (four samples), D. injai (two samples) or the short-bodied species D. cornei (three samples). DNA was extracted, and a 338 bp fragment of the 16S rDNA was amplified and sequenced. The sequences of the four D. canis mites were identical and shared 99.6 and 97.3% identity with two D. canis sequences available at GenBank. The sequences of the D. cornei isolates were identical and showed 97.8, 98.2 and 99.6% identity with the D. canis isolates. The sequences of the two D. injai isolates were also identical and showed 76.6% identity with the D. canis sequence. Demodex canis and D. injai are two different species, with a genetic distance of 23.3%. It would seem that the short-bodied Demodex mite D. cornei is a morphological variant of D. canis. © 2012 The Authors. Veterinary Dermatology © 2012 ESVD and ACVD.

  7. A proposal for accident management optimization based on the study of accident sequence analysis for a BWR

    International Nuclear Information System (INIS)

    Sobajima, M.

    1998-01-01

    The paper describes a proposal for accident management optimization based on the study of accident sequence and source term analyses for a BWR. In Japan, accident management measures are to be implemented in all LWRs by the year 2000 in accordance with the recommendation of the regulatory organization and based on the PSAs carried out by the utilities. Source terms were evaluated by the Japan Atomic Energy Research Institute (JAERI) with the THALES code for all BWR sequences in which loss of decay heat removal resulted in the largest release. Identification of the priority and importance of accident management measures was carried out for the sequences with larger risk contributions. Considerations for optimizing emergency operation guides are believed to be essential for risk reduction. (author)

  8. PRIMAL: Page Rank-Based Indoor Mapping and Localization Using Gene-Sequenced Unlabeled WLAN Received Signal Strength

    Directory of Open Access Journals (Sweden)

    Mu Zhou

    2015-09-01

    Full Text Available Due to the wide deployment of wireless local area networks (WLAN, received signal strength (RSS-based indoor WLAN localization has attracted considerable attention in both academia and industry. In this paper, we propose a novel page rank-based indoor mapping and localization (PRIMAL by using the gene-sequenced unlabeled WLAN RSS for simultaneous localization and mapping (SLAM. Specifically, first of all, based on the observation of the motion patterns of the people in the target environment, we use the Allen logic to construct the mobility graph to characterize the connectivity among different areas of interest. Second, the concept of gene sequencing is utilized to assemble the sporadically-collected RSS sequences into a signal graph based on the transition relations among different RSS sequences. Third, we apply the graph drawing approach to exhibit both the mobility graph and signal graph in a more readable manner. Finally, the page rank (PR algorithm is proposed to construct the mapping from the signal graph into the mobility graph. The experimental results show that the proposed approach achieves satisfactory localization accuracy and meanwhile avoids the intensive time and labor cost involved in the conventional location fingerprinting-based indoor WLAN localization.

  9. PRIMAL: Page Rank-Based Indoor Mapping and Localization Using Gene-Sequenced Unlabeled WLAN Received Signal Strength.

    Science.gov (United States)

    Zhou, Mu; Zhang, Qiao; Xu, Kunjie; Tian, Zengshan; Wang, Yanmeng; He, Wei

    2015-09-25

    Due to the wide deployment of wireless local area networks (WLAN), received signal strength (RSS)-based indoor WLAN localization has attracted considerable attention in both academia and industry. In this paper, we propose a novel page rank-based indoor mapping and localization (PRIMAL) by using the gene-sequenced unlabeled WLAN RSS for simultaneous localization and mapping (SLAM). Specifically, first of all, based on the observation of the motion patterns of the people in the target environment, we use the Allen logic to construct the mobility graph to characterize the connectivity among different areas of interest. Second, the concept of gene sequencing is utilized to assemble the sporadically-collected RSS sequences into a signal graph based on the transition relations among different RSS sequences. Third, we apply the graph drawing approach to exhibit both the mobility graph and signal graph in a more readable manner. Finally, the page rank (PR) algorithm is proposed to construct the mapping from the signal graph into the mobility graph. The experimental results show that the proposed approach achieves satisfactory localization accuracy and meanwhile avoids the intensive time and labor cost involved in the conventional location fingerprinting-based indoor WLAN localization.

  10. Use of the LUS in sequence allele designations to facilitate probabilistic genotyping of NGS-based STR typing results.

    Science.gov (United States)

    Just, Rebecca S; Irwin, Jodi A

    2018-05-01

    Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate - in the nearterm - probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools

  11. Introduction of the hybcell-based compact sequencing technology and comparison to state-of-the-art methodologies for KRAS mutation detection.

    Science.gov (United States)

    Zopf, Agnes; Raim, Roman; Danzer, Martin; Niklas, Norbert; Spilka, Rita; Pröll, Johannes; Gabriel, Christian; Nechansky, Andreas; Roucka, Markus

    2015-03-01

    The detection of KRAS mutations in codons 12 and 13 is critical for anti-EGFR therapy strategies; however, only those methodologies with high sensitivity, specificity, and accuracy as well as the best cost and turnaround balance are suitable for routine daily testing. Here we compared the performance of compact sequencing using the novel hybcell technology with 454 next-generation sequencing (454-NGS), Sanger sequencing, and pyrosequencing, using an evaluation panel of 35 specimens. A total of 32 mutations and 10 wild-type cases were reported using 454-NGS as the reference method. Specificity ranged from 100% for Sanger sequencing to 80% for pyrosequencing. Sanger sequencing and hybcell-based compact sequencing achieved a sensitivity of 96%, whereas pyrosequencing had a sensitivity of 88%. Accuracy was 97% for Sanger sequencing, 85% for pyrosequencing, and 94% for hybcell-based compact sequencing. Quantitative results were obtained for 454-NGS and hybcell-based compact sequencing data, resulting in a significant correlation (r = 0.914). Whereas pyrosequencing and Sanger sequencing were not able to detect multiple mutated cell clones within one tumor specimen, 454-NGS and the hybcell-based compact sequencing detected multiple mutations in two specimens. Our comparison shows that the hybcell-based compact sequencing is a valuable alternative to state-of-the-art methodologies used for detection of clinically relevant point mutations.

  12. Use of amplicon sequencing to improve sensitivity in PCR-based detection of microbial pathogen in environmental samples.

    Science.gov (United States)

    Saingam, Prakit; Li, Bo; Yan, Tao

    2018-06-01

    DNA-based molecular detection of microbial pathogens in complex environments is still plagued by sensitivity, specificity and robustness issues. We propose to address these issues by viewing them as inadvertent consequences of requiring specific and adequate amplification (SAA) of target DNA molecules by current PCR methods. Using the invA gene of Salmonella as the model system, we investigated if next generation sequencing (NGS) can be used to directly detect target sequences in false-negative PCR reaction (PCR-NGS) in order to remove the SAA requirement from PCR. False-negative PCR and qPCR reactions were first created using serial dilutions of laboratory-prepared Salmonella genomic DNA and then analyzed directly by NGS. Target invA sequences were detected in all false-negative PCR and qPCR reactions, which lowered the method detection limits near the theoretical minimum of single gene copy detection. The capability of the PCR-NGS approach in correcting false negativity was further tested and confirmed under more environmentally relevant conditions using Salmonella-spiked stream water and sediment samples. Finally, the PCR-NGS approach was applied to ten urban stream water samples and detected invA sequences in eight samples that would be otherwise deemed Salmonella negative. Analysis of the non-target sequences in the false-negative reactions helped to identify primer dime-like short sequences as the main cause of the false negativity. Together, the results demonstrated that the PCR-NGS approach can significantly improve method sensitivity, correct false-negative detections, and enable sequence-based analysis for failure diagnostics in complex environmental samples. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Multiplexed detection of DNA sequences using a competitive displacement assay in a microfluidic SERRS-based device.

    Science.gov (United States)

    Yazdi, Soroush H; Giles, Kristen L; White, Ian M

    2013-11-05

    We demonstrate sensitive and multiplexed detection of DNA sequences through a surface enhanced resonance Raman spectroscopy (SERRS)-based competitive displacement assay in an integrated microsystem. The use of the competitive displacement scheme, in which the target DNA sequence displaces a Raman-labeled reporter sequence that has lower affinity for the immobilized probe, enables detection of unlabeled target DNA sequences with a simple single-step procedure. In our implementation, the displacement reaction occurs in a microporous packed column of silica beads prefunctionalized with probe-reporter pairs. The use of a functionalized packed-bead column in a microfluidic channel provides two major advantages: (i) immobilization surface chemistry can be performed as a batch process instead of on a chip-by-chip basis, and (ii) the microporous network eliminates the diffusion limitations of a typical biological assay, which increases the sensitivity. Packed silica beads are also leveraged to improve the SERRS detection of the Raman-labeled reporter. Following displacement, the reporter adsorbs onto aggregated silver nanoparticles in a microfluidic mixer; the nanoparticle-reporter conjugates are then trapped and concentrated in the silica bead matrix, which leads to a significant increase in plasmonic nanoparticles and adsorbed Raman reporters within the detection volume as compared to an open microfluidic channel. The experimental results reported here demonstrate detection down to 100 pM of the target DNA sequence, and the experiments are shown to be specific, repeatable, and quantitative. Furthermore, we illustrate the advantage of using SERRS by demonstrating multiplexed detection. The sensitivity of the assay, combined with the advantages of multiplexed detection and single-step operation with unlabeled target sequences makes this method attractive for practical applications. Importantly, while we illustrate DNA sequence detection, the SERRS-based competitive

  14. Simultaneous digital quantification and fluorescence-based size characterization of massively parallel sequencing libraries.

    Science.gov (United States)

    Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H

    2013-08-01

    Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.

  15. Structural characterization of HDPE/LLDPE blend-based nano composites obtained by different blending sequence

    International Nuclear Information System (INIS)

    Passador, Fabio R.; Ruvolo Filho, Adhemar; Pessan, Luiz A.

    2011-01-01

    The blending sequence affects the morphology formation of the nanocomposites. In this work, the blending sequences were explored to determine its influence in the rheological behavior of HDPE/LLDPE/OMMT nanocomposites. The nanocomposites were obtained by melt-intercalation using a mixture of LLDPE-g-MA and HDPE-g-MA as compatibilizer system in a torque rheometer at 180 deg C and five blending sequences were studied. The materials structures were characterized by wide angle X-ray diffraction (WAXD) and by rheological properties. The nanoclay's addition increased the shear viscosity at low shear rates, changing the behavior of HDPE/LLDPE matrix to a Bingham model behavior with an apparent yield stress. Intense interactions were obtained for the blending sequence where LLDPE and/or LLDPE-g-MA were first reinforced with organoclay since the intercalation process occurs preferentially in the amorphous phase. (author)

  16. Matrix based method for synthesis of main intensified and integrated distillation sequences

    International Nuclear Information System (INIS)

    Khalili-Garakani, Amirhossein; Kasiri, Norollah; Ivakpour, Javad

    2016-01-01

    The objective of many studies in this area has involved access to a column-sequencing algorithm enabling designers and researchers alike to generate a wide range of sequences in a broad search space, and be as mathematically and as automated as possible for programing purposes and with good generality. In the present work an algorithm previously developed by the authors, called the matrix method, has been developed much further. The new version of the algorithm includes thermally coupled, thermodynamically equivalent, intensified, simultaneous heat and mass integrated and divided-wall column sequences which are of gross application and provide vast saving potential both on capital investment, operating costs and energy usage in industrial applications. To demonstrate the much wider searchable space now accessible, a three component separation has been thoroughly examined as a case study, always resulting in an integrated sequence being proposed as the optimum.

  17. Phylogenetic analysis of Fusobacterium prausnitzii based upon the 16S rRNA gene sequence and PCR confirmation.

    Science.gov (United States)

    Wang, R F; Cao, W W; Cerniglia, C E

    1996-01-01

    In order to develop a PCR method to detect Fusobacterium prausnitzii in human feces and to clarify the phylogenetic position of this species, its 16S rRNA gene sequence was determined. The sequence described in this paper is different from the 16S rRNA gene sequence is specific for F. prausnitzii, and the results of this assay confirmed that F. prausnitzii is the most common species in human feces. However, a PCR assay based on the original GenBank sequence was negative when it was performed with two strains of F. prausnitzii obtained from the American Type Culture Collection. A phylogenetic tree based on the new 16S rRNA gene sequence was constructed. On this tree F. prausnitzii was not a member of the Fusobacterium group but was closer to some Eubacterium spp. and located between Clostridium "clusters III and IV" (M.D. Collins, P.A. Lawson, A. Willems, J.J. Cordoba, J. Fernandez-Garayzabal, P. Garcia, J. Cai, H. Hippe, and J.A.E. Farrow, Int. J. Syst. Bacteriol. 44:812-826, 1994).

  18. MytiBase: a knowledgebase of mussel (M. galloprovincialis transcribed sequences

    Directory of Open Access Journals (Sweden)

    Roch Philippe

    2009-02-01

    Full Text Available Abstract Background Although Bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01 was constructed as determined by the high rate of gene discovery (65.6%. Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database http://mussel.cribi.unipd.it. Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels.

  19. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology.

    Science.gov (United States)

    Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi

    2012-07-02

    Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  20. Transcriptome analysis of carnation (Dianthus caryophyllus L. based on next-generation sequencing technology

    Directory of Open Access Journals (Sweden)

    Tanase Koji

    2012-07-01

    Full Text Available Abstract Background Carnation (Dianthus caryophyllus L., in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380 of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  1. Chronostatigraphy of limistone sequency of north Brazilian coast based on data from stronium isotope

    International Nuclear Information System (INIS)

    Takaki, T.; Rodrigues, R.

    1987-01-01

    The strontium isotope composition of marine limestones can be a valuable tool for stratigraphic correlation, in addition to other techniques usually employed for this purpose. The technique can render particularly important in sequences where the fossil assemblage do not present good stratigraphic resolution. As examples, data from Mesozoic and Cenozoic limestone sequences of north Brazilian coast are here presented. Data on strontium isotope composition are compared to those obtained by De Paolo and Ingram in another geographic locations. (author) [pt

  2. Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data

    OpenAIRE

    Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H

    2013-01-01

    Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multip...

  3. VisRseq: R-based visual framework for analysis of sequencing data

    OpenAIRE

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven JM

    2015-01-01

    Background Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for ...

  4. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences

    KAUST Repository

    Zhang, Zhang; Yu, Jun

    2010-01-01

    Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution and a large number of publically-available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge.Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. 2010 Zhang and Yu; licensee BioMed Central Ltd.

  5. Research based teaching sequence for enhancing electrical capacitance understanding at first fist year of university

    Directory of Open Access Journals (Sweden)

    Jenaro Guisasola

    2011-04-01

    Full Text Available In the electricity curriculum for introductory university physics courses and final secondary school courses, no provision is normally made for a teaching sequence which analyses the transition of specific charges to charged bodies, thus preventing the construction of a model able to explain the aspects connected with the process of charging a body, accumulating the charge and its relation to the potential acquired. This constituted a relevant historical problem and demanded the introduction of a new concept, that of electrical capacitance, to solve it. The aim of the work presented here is to design and assess a teaching sequence which endeavours to overcome the difficulties in learning found in the bibliography. The structure of the sequence was established in activities following a “problematised structure” design. The problems defining the sequence appeared when a step-by-step analysis of the transfer of charges from one body to another was made, by establishing connections between the movement of charges (microscopic level. The results of implementing the sequence indicate that a considerable number of students have achieved a more satisfactory understanding of the electrical capacitance of bodies and charging processes. This seems to confirm that the aspects highlighted in the sequence are relevant to the objectives specified.

  6. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences

    KAUST Repository

    Zhang, Zhang

    2010-11-08

    Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution and a large number of publically-available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge.Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. 2010 Zhang and Yu; licensee BioMed Central Ltd.

  7. Population diversity of Diaphorina citri (Hemiptera: Liviidae) in China based on whole mitochondrial genome sequences.

    Science.gov (United States)

    Wu, Fengnian; Jiang, Hongyan; Beattie, G Andrew C; Holford, Paul; Chen, Jianchi; Wallis, Christopher M; Zheng, Zheng; Deng, Xiaoling; Cen, Yijing

    2018-04-24

    Diaphorina citri (Asian citrus psyllid; ACP) transmits 'Candidatus Liberibacter asiaticus' associated with citrus Huanglongbing (HLB). ACP has been reported in 11 provinces/regions in China, yet its population diversity remains unclear. In this study, we evaluated ACP population diversity in China using representative whole mitochondrial genome (mitogenome) sequences. Additional mitogenome sequences outside China were also acquired and evaluated. The sizes of the 27 ACP mitogenome sequences ranged from 14 986 to 15 030 bp. Along with three previously published mitogenome sequences, the 30 sequences formed three major mitochondrial groups (MGs): MG1, present in southwestern China and occurring at elevations above 1000 m; MG2, present in southeastern China and Southeast Asia (Cambodia, Indonesia, Malaysia, and Vietnam) and occurring at elevations below 180 m; and MG3, present in the USA and Pakistan. Single nucleotide polymorphisms in five genes (cox2, atp8, nad3, nad1 and rrnL) contributed mostly in the ACP diversity. Among these genes, rrnL had the most variation. Mitogenome sequences analyses revealed two major phylogenetic groups of ACP present in China as well as a possible unique group present currently in Pakistan and the USA. The information could have significant implications for current ACP control and HLB management. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.

  8. Phylogeny and evolutionary histories of Pyrus L. revealed by phylogenetic trees and networks based on data from multiple DNA sequences

    Science.gov (United States)

    Reconstructing the phylogeny of Pyrus has been difficult due to the wide distribution of the genus and lack of informative data. In this study, we collected 110 accessions representing 25 Pyrus species and constructed both phylogenetic trees and phylogenetic networks based on multiple DNA sequence d...

  9. Radiocarbon and optically stimulated luminescence dating based chronology of a polycyclic driftsand sequence at Weerterbergen (SE Netherlands)

    NARCIS (Netherlands)

    van Mourik, J.M.; Nierop, K.G.J.; Vandenberghe, D.A.G.

    2010-01-01

    The chronology of polycyclic driftsand sequences in cultural landscapes has mainly been based on the combination of radiocarbon (14C) dating of intercalated organic horizons and pollen analysis. This approach, however, yields indirect age information for the sediment units. Also, as soils are

  10. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs

    DEFF Research Database (Denmark)

    van Beest, M; Dooijes, D; van De Wetering, M

    2000-01-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment...

  11. A comparative study of pseudorandom sequences used in a c-VEP based BCI for online wheelchair control

    DEFF Research Database (Denmark)

    Isaksen, Jonas L.; Mohebbi, Ali; Puthusserypady, Sadasivan

    2016-01-01

    In this study, a c-VEP based BCI system was developed to run on three distinctive pseudorandom sequences, namely the m-code, the Gold-code, and the Barker-code. The Visual Evoked Potentials (VEPs) were provoked using these codes. In the online session, subjects controlled a LEGO® Mindstorms® robot...

  12. The Effects of CBI Lesson Sequence Type and Field Dependence on Learning from Computer-Based Cooperative Instruction in Web

    Science.gov (United States)

    Ipek, Ismail

    2010-01-01

    The purpose of this study was to investigate the effects of CBI lesson sequence type and cognitive style of field dependence on learning from Computer-Based Cooperative Instruction (CBCI) in WEB on the dependent measures, achievement, reading comprehension and reading rate. Eighty-seven college undergraduate students were randomly assigned to…

  13. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    Science.gov (United States)

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  14. Deep sequencing-based identification of small regulatory RNAs in Synechocystis sp. PCC 6803.

    Directory of Open Access Journals (Sweden)

    Wen Xu

    Full Text Available Synechocystis sp. PCC 6803 is a genetically tractable model organism for photosynthesis research. The genome of Synechocystis sp. PCC 6803 consists of a circular chromosome and seven plasmids. The importance of small regulatory RNAs (sRNAs as mediators of a number of cellular processes in bacteria has begun to be recognized. However, little is known regarding sRNAs in Synechocystis sp. PCC 6803. To provide a comprehensive overview of sRNAs in this model organism, the sRNAs of Synechocystis sp. PCC 6803 were analyzed using deep sequencing, and 7,951,189 reads were obtained. High quality mapping reads (6,127,890 were mapped onto the genome and assembled into 16,192 transcribed regions (clusters based on read overlap. A total number of 5211 putative sRNAs were revealed from the genome and the 4 megaplasmids, and 27 of these molecules, including four from plasmids, were confirmed by RT-PCR. In addition, possible target genes regulated by all of the putative sRNAs identified in this study were predicted by IntaRNA and analyzed for functional categorization and biological pathways, which provided evidence that sRNAs are indeed involved in many different metabolic pathways, including basic metabolic pathways, such as glycolysis/gluconeogenesis, the citrate cycle, fatty acid metabolism and adaptations to environmentally stress-induced changes. The information from this study provides a valuable reservoir for understanding the sRNA-mediated regulation of the complex physiology and metabolic processes of cyanobacteria.

  15. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Jyh-Da Wei

    2017-08-01

    Full Text Available High-end graphics processing units (GPUs, such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, are widely applied to high-performance computing fields in a decade. These desktop GPU cards should be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA releases an embedded board, called Jetson Tegra K1 (TK1, which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belong to Kepler GPUs. Jetson Tegra K1 has several advantages, such as the low cost, low power consumption, and high applicability, and it has been applied into several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform was constructed, and this previous work is also used to prove that the Web and mobile services can be implemented in the STK platform with a good cost-performance ratio by comparing a STK platform with the desktop CPU and GPU. In this work, an embedded-based GPU cluster platform will be constructed with multiple TK1s (MTK platform. Complex system installation and setup are necessary procedures at first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk will be ported to the MTK platform. The experimental results showed that the speedup ratios achieved 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, by comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.

  16. Polymorphic Microsatellite Markers for the Tetrapolar Anther-Smut Fungus Microbotryum saponariae Based on Genome Sequencing

    Science.gov (United States)

    Fortuna, Taiadjana M.; Snirc, Alodie; Badouin, Hélène; Gouzy, Jérome; Siguenza, Sophie; Esquerre, Diane; Le Prieur, Stéphanie; Shykoff, Jacqui A.; Giraud, Tatiana

    2016-01-01

    Background Anther-smut fungi belonging to the genus Microbotryum sterilize their host plants by aborting ovaries and replacing pollen by fungal spores. Sibling Microbotryum species are highly specialized on their host plants and they have been widely used as models for studies of ecology and evolution of plant pathogenic fungi. However, most studies have focused, so far, on M. lychnidis-dioicae that parasitizes the white campion Silene latifolia. Microbotryum saponariae, parasitizing mainly Saponaria officinalis, is an interesting anther-smut fungus, since it belongs to a tetrapolar lineage (i.e., with two independently segregating mating-type loci), while most of the anther-smut Microbotryum fungi are bipolar (i.e., with a single mating-type locus). Saponaria officinalis is a widespread long-lived perennial plant species with multiple flowering stems, which makes its anther-smut pathogen a good model for studying phylogeography and within-host multiple infections. Principal Findings Here, based on a generated genome sequence of M. saponariae we developed 6 multiplexes with a total of 22 polymorphic microsatellite markers using an inexpensive and efficient method. We scored these markers in fungal individuals collected from 97 populations across Europe, and found that the number of their alleles ranged from 2 to 11, and their expected heterozygosity from 0.01 to 0.58. Cross-species amplification was examined using nine other Microbotryum species parasitizing hosts belonging to Silene, Dianthus and Knautia genera. All loci were successfully amplified in at least two other Microbotryum species. Significance These newly developed markers will provide insights into the population genetic structure and the occurrence of within-host multiple infections of M. saponariae. In addition, the draft genome of M. saponariae, as well as one of the described markers will be useful resources for studying the evolution of the breeding systems in the genus Microbotryum and the

  17. Sequence and 3D structure based analysis of TNT degrading proteins in Arabidopsis thaliana.

    Science.gov (United States)

    Bhattacherjee, Amrita; Mandal, Rahul Shubhra; Das, Santasabuj; Kundu, Sudip

    2014-03-01

    TNT, accidentally released at several manufacturing sites, contaminates ground water and soil. It has a toxic effect to algae and invertebrate, and chronic exposure to TNT also causes harmful effects to human. On the other hand, many plants including Arabidopsis thaliana have the ability to metabolize TNT either completely or at least to a reduced less toxic form. In A. thaliana, the enzyme UDP glucosyltransferase (UDPGT) can further conjugate the reduced forms 2-HADNT and 4-HADNT (2-hydroxylamino-4, 6- dinitrotoluene and 4-hydroxylamino-2, 6- dinitrotoluene) of TNT. Based on the experimental analysis, existing literature and phylogenetic analysis, it is evident that among 107 UDPGT proteins only six are involved in the TNT degrading process. A total of 13 UDPGT proteins including five of these TNT degrading proteins fall within the same group of phylogeny. Thus, these 13 UDPGT proteins have been classified into two groups, TNT-degrading and TNT-non-degrading proteins. To understand the differences in TNT-degrading capacities; using homology modeling we first predicted two structures, taking one representative sequence from both the groups. Next, we performed molecular docking of the modeled structure and TNT reduced form 2-hydroxylamino-4, 6- dinitrotoluene (2-HADNT). We observed that while the Trp residue located within the active site region of the TNT- degrading protein showed π-Cation interaction; such type of interaction was absent in TNT-non-degrading protein, as the respective Trp residue lay outside of the pocket in this case. We observed the conservation of this π-Cation interaction during MD simulation of TNT-degrading protein. Thus, the position and the orientation of the active site residue Trp could explain the presence and absence of TNT-degrading capacity of the UDPGT proteins.

  18. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments.

    Science.gov (United States)

    Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

    2017-01-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, are widely applied to high-performance computing fields in a decade. These desktop GPU cards should be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA releases an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belong to Kepler GPUs). Jetson Tegra K1 has several advantages, such as the low cost, low power consumption, and high applicability, and it has been applied into several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and this previous work is also used to prove that the Web and mobile services can be implemented in the STK platform with a good cost-performance ratio by comparing a STK platform with the desktop CPU and GPU. In this work, an embedded-based GPU cluster platform will be constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary procedures at first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk will be ported to the MTK platform. The experimental results showed that the speedup ratios achieved 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, by comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.

  19. ECG based Atrial Fibrillation detection using Sequency Ordered Complex Hadamard Transform and Hybrid Firefly Algorithm

    Directory of Open Access Journals (Sweden)

    Padmavathi Kora

    2017-06-01

    Full Text Available Electrocardiogram (ECG, a non-invasive diagnostic technique, used for detecting cardiac arrhythmia. From last decade industry dealing with biomedical instrumentation and research, demanding an advancement in its ability to distinguish different cardiac arrhythmia. Atrial Fibrillation (AF is an irregular rhythm of the human heart. During AF, the atrial moments are quicker than the normal rate. As blood is not completely ejected out of atria, chances for the formation of blood clots in atrium. These abnormalities in the heart can be identified by the changes in the morphology of the ECG. The first step in the detection of AF is preprocessing of ECG, which removes noise using filters. Feature extraction is the next key process in this research. Recent feature extraction methods, such as Auto Regressive (AR modeling, Magnitude Squared Coherence (MSC and Wavelet Coherence (WTC using standard database (MIT-BIH, yielded a lot of features. Many of these features might be insignificant containing some redundant and non-discriminatory features that introduce computational burden and loss of performance. This paper presents fast Conjugate Symmetric Sequency Ordered Complex Hadamard Transform (CS-SCHT for extracting relevant features from the ECG signal. The sparse matrix factorization method is used for developing fast and efficient CS-SCHT algorithm and its computational performance is examined and compared to that of the HT and NCHT. The applications of the CS-SCHT in the ECG-based AF detection is also discussed. These fast CS-SCHT features are optimized using Hybrid Firefly and Particle Swarm Optimization (FFPSO to increase the performance of the classifier.

  20. HLA polymorphisms in Cabo Verde and Guiné-Bissau inferred from sequence-based typing.

    Science.gov (United States)

    Spínola, Hélder; Bruges-Armas, Jácome; Middleton, Derek; Brehm, António

    2005-10-01

    Human leukocyte antigen (HLA)-A, -B, and -DRB1 polymorphisms were examined in the Cabo Verde and Guiné-Bissau populations. The data were obtained at high-resolution level, using sequence-based typing. The most frequent alleles in each locus was: A*020101 (16.7% in Guiné-Bissau and 13.5% in Cabo Verde), B*350101 (14.4% in Guiné-Bissau and 13.2% in Cabo Verde), DRB1*1304 (19.6% in Guiné-Bissau), and DRB1*1101 (10.1% in Cabo Verde). The predominant three loci haplotype in Guiné-Bissau was A*2301-B*1503-DRB1*1101 (4.6%) and in Cabo Verde was A*3002-B*350101-DRB1*1001 (2.8%), exclusive to northwestern islands (5.6%) and absent in Guiné-Bissau. The present study corroborates historic sources and other genetic studies that say Cabo Verde were populated not only by Africans but also by Europeans. Haplotypes and dendrogram analysis shows a Caucasian genetic influence in today's gene pool of Cabo Verdeans. Haplotypes and allele frequencies present a differential distribution between southeastern and northwestern Cabo Verde islands, which could be the result of different genetic influences, founder effect, or bottlenecks. Dendrograms and principal coordinates analysis show that Guineans are more similar to North Africans than other HLA-studied sub-Saharans, probably from ancient and recent genetic contacts with other peoples, namely East Africans.

  1. Compact field programmable gate array-based pulse-sequencer and radio-frequency generator for experiments with trapped atoms

    Energy Technology Data Exchange (ETDEWEB)

    Pruttivarasin, Thaned, E-mail: thaned.pruttivarasin@riken.jp [Quantum Metrology Laboratory, RIKEN, Wako-shi, Saitama 351-0198 (Japan); Katori, Hidetoshi [Quantum Metrology Laboratory, RIKEN, Wako-shi, Saitama 351-0198 (Japan); Innovative Space-Time Project, ERATO, JST, Bunkyo-ku, Tokyo 113-8656 (Japan); Department of Applied Physics, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656 (Japan)

    2015-11-15

    We present a compact field-programmable gate array (FPGA) based pulse sequencer and radio-frequency (RF) generator suitable for experiments with cold trapped ions and atoms. The unit is capable of outputting a pulse sequence with at least 32 transistor-transistor logic (TTL) channels with a timing resolution of 40 ns and contains a built-in 100 MHz frequency counter for counting electrical pulses from a photo-multiplier tube. There are 16 independent direct-digital-synthesizers RF sources with fast (rise-time of ∼60 ns) amplitude switching and sub-mHz frequency tuning from 0 to 800 MHz.

  2. High Throughput Sample Preparation and Analysis for DNA Sequencing, PCR and Combinatorial Screening of Catalysis Based on Capillary Array Technique

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yonghua [Iowa State Univ., Ames, IA (United States)

    2000-01-01

    Sample preparation has been one of the major bottlenecks for many high throughput analyses. The purpose of this research was to develop new sample preparation and integration approach for DNA sequencing, PCR based DNA analysis and combinatorial screening of homogeneous catalysis based on multiplexed capillary electrophoresis with laser induced fluorescence or imaging UV absorption detection. The author first introduced a method to integrate the front-end tasks to DNA capillary-array sequencers. protocols for directly sequencing the plasmids from a single bacterial colony in fused-silica capillaries were developed. After the colony was picked, lysis was accomplished in situ in the plastic sample tube using either a thermocycler or heating block. Upon heating, the plasmids were released while chromsomal DNA and membrane proteins were denatured and precipitated to the bottom of the tube. After adding enzyme and Sanger reagents, the resulting solution was aspirated into the reaction capillaries by a syringe pump, and cycle sequencing was initiated. No deleterious effect upon the reaction efficiency, the on-line purification system, or the capillary electrophoresis separation was observed, even though the crude lysate was used as the template. Multiplexed on-line DNA sequencing data from 8 parallel channels allowed base calling up to 620 bp with an accuracy of 98%. The entire system can be automatically regenerated for repeated operation. For PCR based DNA analysis, they demonstrated that capillary electrophoresis with UV detection can be used for DNA analysis starting from clinical sample without purification. After PCR reaction using cheek cell, blood or HIV-1 gag DNA, the reaction mixtures was injected into the capillary either on-line or off-line by base stacking. The protocol was also applied to capillary array electrophoresis. The use of cheaper detection, and the elimination of purification of DNA sample before or after PCR reaction, will make this approach an

  3. RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data.

    Science.gov (United States)

    Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie

    2016-06-15

    Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/ bab@mit.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by

  4. A model of human motor sequence learning explains facilitation and interference effects based on spike-timing dependent plasticity.

    Directory of Open Access Journals (Sweden)

    Quan Wang

    2017-08-01

    Full Text Available The ability to learn sequential behaviors is a fundamental property of our brains. Yet a long stream of studies including recent experiments investigating motor sequence learning in adult human subjects have produced a number of puzzling and seemingly contradictory results. In particular, when subjects have to learn multiple action sequences, learning is sometimes impaired by proactive and retroactive interference effects. In other situations, however, learning is accelerated as reflected in facilitation and transfer effects. At present it is unclear what the underlying neural mechanism are that give rise to these diverse findings. Here we show that a recently developed recurrent neural network model readily reproduces this diverse set of findings. The self-organizing recurrent neural network (SORN model is a network of recurrently connected threshold units that combines a simplified form of spike-timing dependent plasticity (STDP with homeostatic plasticity mechanisms ensuring network stability, namely intrinsic plasticity (IP and synaptic normalization (SN. When trained on sequence learning tasks modeled after recent experiments we find that it reproduces the full range of interference, facilitation, and transfer effects. We show how these effects are rooted in the network's changing internal representation of the different sequences across learning and how they depend on an interaction of training schedule and task similarity. Furthermore, since learning in the model is based on fundamental neuronal plasticity mechanisms, the model reveals how these plasticity mechanisms are ultimately responsible for the network's sequence learning abilities. In particular, we find that all three plasticity mechanisms are essential for the network to learn effective internal models of the different training sequences. This ability to form effective internal models is also the basis for the observed interference and facilitation effects. This suggests that

  5. Sequence-based comparative study of classical swine fever virus genogroup 2.2 isolate with pestivirus reference strains.

    Science.gov (United States)

    Kumar, Ravi; Rajak, Kaushal Kishor; Chandra, Tribhuwan; Muthuchelvan, Dhanavelu; Saxena, Arpit; Chaudhary, Dheeraj; Kumar, Ajay; Pandey, Awadh Bihari

    2015-09-01

    This study was undertaken with the aim to compare and establish the genetic relatedness between classical swine fever virus (CSFV) genogroup 2.2 isolate and pestivirus reference strains. The available complete genome sequences of CSFV/IND/UK/LAL-290 strain and other pestivirus reference strains were retrieved from GenBank. The complete genome sequence, complete open reading frame, 5' and 3' non-coding region (NCR) sequences were analyzed and compared with reference pestiviruses strains. Clustal W model in MegAlign program of Lasergene 6.0 software was used for analysis of genetic heterogeneity. Phylogenetic analysis was carried out using MEGA 6.06 software package. The complete genome sequence alignment of CSFV/IND/UK/LAL-290 isolate and reference pestivirus strains showed 58.9-72% identities at the nucleotide level and 50.3-76.9% at amino acid level. Sequence homology of 5' and 3' NCRs was found to be 64.1-82.3% and 22.9-71.4%, respectively. In phylogenetic analysis, overall tree topology was found similar irrespective of sequences used in this study; however, whole genome phylogeny of pestivirus formed two main clusters, which further distinguished into the monophyletic clade of each pestivirus species. CSFV/IND/UK/LAL-290 isolate placed with the CSFV Eystrup strain in the same clade with close proximity to border disease virus and Aydin strains. CSFV/IND/UK/LAL-290 exhibited the analogous genomic organization to those of all reference pestivirus strains. Based on sequence identity and phylogenetic analysis, the isolate showed close homology to Aydin/04-TR virus and distantly related to Bungowannah virus.

  6. Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data.

    Science.gov (United States)

    Ragan, Chikako; Mowry, Bryan J; Bauer, Denis C

    2012-09-01

    Recent advances in RNA sequencing technology (RNA-Seq) enables comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNAs detection from RNA-Seq data. NorahDesk utilizes the coverage-distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNA (piRNA). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.

  7. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  8. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    Science.gov (United States)

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20 year-period from eight countries. MLST resolved 17 sequence types (Simpsons index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found a low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. STING Millennium: a web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence

    Science.gov (United States)

    Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.

    2003-01-01

    STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped-up in a user friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR)—amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333

  10. Identification of QTLs for 14 Agronomically Important Traits in Setaria italica Based on SNPs Generated from High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Kai Zhang

    2017-05-01

    Full Text Available Foxtail millet (Setaria italica is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs and developed high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggested that there may be segregation distortion on chromosome 6 that is significantly distorted toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome, and for molecular marker-assisted breeding.

  11. Identification of QTLs for 14 Agronomically Important Traits in Setaria italica Based on SNPs Generated from High-Throughput Sequencing.

    Science.gov (United States)

    Zhang, Kai; Fan, Guangyu; Zhang, Xinxin; Zhao, Fang; Wei, Wei; Du, Guohua; Feng, Xiaolei; Wang, Xiaoming; Wang, Feng; Song, Guoliang; Zou, Hongfeng; Zhang, Xiaolei; Li, Shuangdong; Ni, Xuemei; Zhang, Gengyun; Zhao, Zhihai

    2017-05-05

    Foxtail millet ( Setaria italica ) is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs) and developed high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggested that there may be segregation distortion on chromosome 6 that is significantly distorted toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome, and for molecular marker-assisted breeding. Copyright © 2017 Zhang et al.

  12. Predicting Folding Sequences Based on the Maximum Rock Strength and Mechanical Equilibrium

    Science.gov (United States)

    Cubas, N.; Souloumiac, P.; Maillot, B.; Leroy, Y. M.

    2007-12-01

    The objective is to propose and validate simple procedures, compared to the finite-element method, to select and optimize the dominant mode of folding in fold-and-thrust belts and accretionary wedges, and to determine its stress distribution. Mechanical equilibrium as well as the constraints due to the limited rock strength of the bulk material and of major discontinuities, such as décollements, are accounted for. The first part of the proposed procedure, which is at the core of the external approach of classical limit analysis, consists in estimating the least upper bound on the tectonic force by minimisation of the internal dissipation and part of the external work. The new twist to the method is that the optimization is also done with respect to the geometry of the evolving fold. If several folding events are possible, the dominant mode is the one leading to the least upper bound. The second part of the procedure is based on the Equilibrium Element Method, which is an application of the internal approach of limit analysis. The optimum stress field, obtained by spatial discretisation of the fold, provides the best lower bound on the tectonic force. The difference between the two bounds defines an error estimate of the exact unknown tectonic force. To show the merits of the proposed procedure, its first part is applied to predict the life span of a thrust within an accretionary prism, from its onset, its development with a relief build up and its arrest because of the onset of a more favorable new thrust (Cubas et al., 2007). This life span is sensitive to the friction angles over the ramp and the décollement. It is shown how the normal sequence of thrusting in a supercritical wedge is ended with the first out-of sequence event. The second part of the procedure provides the stress state over each thrust showing that the active back thrust is a narrow fan which dip is sensitive to the friction angle over the ramp and the amount of relief build up (Souloumiac et

  13. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

    Directory of Open Access Journals (Sweden)

    Xin Yi Ng

    2015-01-01

    Full Text Available This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs which are important to the immune system. Recently, researchers are interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMPs designing process as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMPs prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and support vector machine- (SVM- LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in jackknife test and 87.59% in independent test, while the sensitivity obtained for jackknife test and independent test is 88.74% and 78.70%, respectively, when only the sequences that has less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity.

  14. Phylogenetic relationships in Demodex mites (Acari: Demodicidae) based on mitochondrial 16S rDNA partial sequences.

    Science.gov (United States)

    Zhao, Ya-E; Wu, Li-Ping

    2012-09-01

    To confirm phylogenetic relationships in Demodex mites based on mitochondrial 16S rDNA partial sequences, mtDNA 16S partial sequences of ten isolates of three Demodex species from China were amplified, recombined, and sequenced and then analyzed with two Demodex folliculorum isolates from Spain. Lastly, genetic distance was computed, and phylogenetic tree was reconstructed. MEGA 4.0 analysis showed high sequence identity among 16S rDNA partial sequences of three Demodex species, which were 95.85 % in D. folliculorum, 98.53 % in Demodex canis, and 99.71 % in Demodex brevis. The divergence, genetic distance, and transition/transversions of the three Demodex species reached interspecies level, whereas there was no significant difference of the divergence (1.1 %), genetic distance (0.011), and transition/transversions (3/1) of the two geographic D. folliculorum isolates (Spain and China). Phylogenetic trees reveal that the three Demodex species formed three separate branches of one clade, where D. folliculorum and D. canis gathered first, and then gathered with D. brevis. The two Spain and five China D. folliculorum isolates did not form sister clades. In conclusion, 16S mtDNA are suitable for phylogenetic relationship analysis in low taxa (genus or species), but not for intraspecies determination of Demodex. The differentiation among the three Demodex species has reached interspecies level.

  15. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps

    Science.gov (United States)

    Drory Retwitzer, Matan; Polishchuk, Maya; Churkin, Elena; Kifer, Ilona; Yakhini, Zohar; Barash, Danny

    2015-01-01

    Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular—no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various background and proficiency. It also extends RNA Structator and allows a more flexible variable gaps representation, in addition to analysis of results using energy minimization methods. RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site. PMID:25940619

  16. A Quantitative Tool to Distinguish Isobaric Leucine and Isoleucine Residues for Mass Spectrometry-Based De Novo Monoclonal Antibody Sequencing

    Science.gov (United States)

    Poston, Chloe N.; Higgs, Richard E.; You, Jinsam; Gelfanova, Valentina; Hale, John E.; Knierman, Michael D.; Siegel, Robert; Gutierrez, Jesus A.

    2014-07-01

    De novo sequencing by mass spectrometry (MS) allows for the determination of the complete amino acid (AA) sequence of a given protein based on the mass difference of detected ions from MS/MS fragmentation spectra. The technique relies on obtaining specific masses that can be attributed to characteristic theoretical masses of AAs. A major limitation of de novo sequencing by MS is the inability to distinguish between the isobaric residues leucine (Leu) and isoleucine (Ile). Incorrect identification of Ile as Leu or vice versa often results in loss of activity in recombinant antibodies. This functional ambiguity is commonly resolved with costly and time-consuming AA mutation and peptide sequencing experiments. Here, we describe a set of orthogonal biochemical protocols, which experimentally determine the identity of Ile or Leu residues in monoclonal antibodies (mAb) based on the selectivity that leucine aminopeptidase shows for n-terminal Leu residues and the cleavage preference for Leu by chymotrypsin. The resulting observations are combined with germline frequencies and incorporated into a logistic regression model, called Predictor for Xle Sites (PXleS) to provide a statistical likelihood for the identity of Leu at an ambiguous site. We demonstrate that PXleS can generate a probability for an Xle site in mAbs with 96% accuracy. The implementation of PXleS precludes the expression of several possible sequences and, therefore, reduces the overall time and resources required to go from spectra generation to a biologically active sequence for a mAb when an Ile or Leu residue is in question.

  17. In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment

    Directory of Open Access Journals (Sweden)

    Chitale Meghana

    2013-02-01

    Full Text Available Abstract Background Many Automatic Function Prediction (AFP methods were developed to cope with an increasing growth of the number of gene sequences that are available from high throughput sequencing experiments. To support the development of AFP methods, it is essential to have community wide experiments for evaluating performance of existing AFP methods. Critical Assessment of Function Annotation (CAFA is one such community experiment. The meeting of CAFA was held as a Special Interest Group (SIG meeting at the Intelligent Systems in Molecular Biology (ISMB conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods, PFP and ESG, which were developed in our lab, using the predictions submitted to CAFA. Results We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate performance of a different scoring function to rank order predictions by PFP as well as PFP/ESG predictions enriched with Priors that simply adds frequently occurring Gene Ontology terms as a part of predictions. Prediction accuracies of each method were also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST. Conclusion The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but will also shed light on the relationship of the sequence similarity space and functions that can be inferred from the sequences.

  18. dictyExpress: a web-based platform for sequence data management and analytics in Dictyostelium and beyond.

    Science.gov (United States)

    Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz

    2017-06-02

    Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high dimensional data sets to generate hypotheses and discovering novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase-a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphic user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next generation sequencing based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. To encourage more wide usage, the backend is open

  19. A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing.

    Science.gov (United States)

    van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y

    2018-04-17

    Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants that still do require orthogonal confirmation exist. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing. This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to

  20. Unexpected DNA affinity and sequence selectivity through core rigidity in guanidinium-based minor groove binders.

    Science.gov (United States)

    Nagle, Padraic S; McKeever, Caitriona; Rodriguez, Fernando; Nguyen, Binh; Wilson, W David; Rozas, Isabel

    2014-09-25

    In this paper we report the design and biophysical evaluation of novel rigid-core symmetric and asymmetric dicationic DNA binders containing 9H-fluorene and 9,10-dihydroanthracene cores as well as the synthesis of one of these fluorene derivatives. First, the affinity toward particular DNA sequences of these compounds and flexible core derivatives was evaluated by means of surface plasmon resonance and thermal denaturation experiments finding that the position of the cations significantly influence the binding strength. Then their affinity and mode of binding were further studied by performing circular dichroism and UV studies and the results obtained were rationalized by means of DFT calculations. We found that the fluorene derivatives prepared have the ability to bind to the minor groove of certain DNA sequences and intercalate to others, whereas the dihydroanthracene compounds bind via intercalation to all the DNA sequences studied here.

  1. Antibody-based screening for hereditary nonpolyposis colorectal carcinoma compared with microsatellite analysis and sequencing

    DEFF Research Database (Denmark)

    Christensen, Mariann; Katballe, Niels; Wikman, Friedrik

    2002-01-01

    BACKGROUND: Germline mutations in the DNA mismatch repair genes, MSH2, MLH1, and others are associated with hereditary nonpolyposis colorectal cancer (HNPCC). Due to the high costs of sequencing, cheaper screening methods are needed to identify HNPCC cases. Ideally, these methods should have a high...... carcinoma of whom 11 met the Amsterdam criteria and 31 were suspected to belong to HNPCC families. Thirty-five patients were examined by microsatellite analysis, 40 by immunohistochemical staining, and in 31 patients both the MLH1 and MSH2 genes were sequenced. RESULTS: Ninety-two percent of patients...... the three methods was found in 74 % of the tumors. CONCLUSIONS: The authors suggest that immunohistochemistry should be used in combination with microsatellite analysis to prescreen suspected HNPCC patients for the selection of cases where sequencing of the MLH1 and MSH2 mismatch repair genes is indicated....

  2. Towards the development of a DNA-sequence based approach to serotyping of Salmonella enterica

    Directory of Open Access Journals (Sweden)

    Logan Julie MJ

    2004-08-01

    Full Text Available Abstract Background The fliC and fljB genes in Salmonella code for the phase 1 (H1 and phase 2 (H2 flagellin respectively, the rfb cluster encodes the majority of enzymes for polysaccharide (O antigen biosynthesis, together they determine the antigenic profile by which Salmonella are identified. Sequencing and characterisation of fliC was performed in the development of a molecular serotyping technique. Results FliC sequencing of 106 strains revealed two groups; the g-complex included those exhibiting "g" or "m,t" antigenic factors, and the non-g strains which formed a second more diverse group. Variation in fliC was characterised and sero-specific motifs identified. Furthermore, it was possible to identify differences in certain H antigens that are not detected by traditional serotyping. A rapid short sequencing assay was developed to target serotype-specific sequence motifs in fliC. The assay was evaluated for identification of H1 antigens with a panel of 55 strains. Conclusion FliC sequences were obtained for more than 100 strains comprising 29 different H1 alleles. Unique pyrosequencing profiles corresponding to the H1 component of the serotype were generated reproducibly for the 23 alleles represented in the evaluation panel. Short read sequence assays can now be used to identify fliC alleles in approximately 97% of the 50 medically most important Salmonella in England and Wales. Capability for high throughput testing and automation give these assays considerable advantages over traditional methods.

  3. A likelihood ratio test for species membership based on DNA sequence data

    DEFF Research Database (Denmark)

    Matz, Mikhail V.; Nielsen, Rasmus

    2005-01-01

    DNA barcoding as an approach for species identification is rapidly increasing in popularity. However, it remains unclear which statistical procedures should accompany the technique to provide a measure of uncertainty. Here we describe a likelihood ratio test which can be used to test if a sampled...... sequence is a member of an a priori specified species. We investigate the performance of the test using coalescence simulations, as well as using the real data from butterflies and frogs representing two kinds of challenge for DNA barcoding: extremely low and extremely high levels of sequence variability....

  4. The Cenozoic geological evolution of the Central and Northern North Sea based on seismic sequence stratigraphy

    Energy Technology Data Exchange (ETDEWEB)

    Jordt, Henrik

    1996-03-01

    This thesis represents scientific results from seismic sequence stratigraphic investigations. These investigations and results are integrated into an ongoing mineralogical study of the Cenozoic deposits. the main results from this mineralogical study are presented and discussed. The seismic investigations have provided boundary conditions for a forward modelling study of the Cenozoic depositional history. Results from the forward modelling are presented as they emphasise the influence of tectonics on sequence development. The tectonic motions described were important for the formation of the large oil and gas fields in the North Sea.

  5. Outline of a genome navigation system based on the properties of GA-sequences and their flanks.

    Directory of Open Access Journals (Sweden)

    Guenter Albrecht-Buehler

    protein synthesis based on the shared segments of different GA-sequences.

  6. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.

  7. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  8. Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

    Science.gov (United States)

    Raghav, Sunil Kumar; Deplancke, Bart

    2012-01-01

    Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.

  9. Molecular identification based on ITS sequences for Kappaphycus and Eucheuma cultivated in China

    Science.gov (United States)

    Zhao, Sufen; He, Peimin

    2011-11-01

    The systematic classification of the Eucheumatoideae is difficult because of their variable morphology and interpretation of reproductive structures. Kappaphycus and Eucheuma specimens cultivated on the Hainan and Fujian coast of China were introduced from Vietnam, the Philippines and Indonesia. Combined with morphological characteristics, all Kappaphycus and Eucheuma cultivated strains were identified by internal transcribed spacer (ITS) sequences. The phylogenetic tree was constructed using neighbor-joining and maximum likelihood methods. The results indicate that different ITS sequence lengths occurred in the different genera and species. An obvious difference in morphology could be found in the protuberance shape between Kappaphycus and Eucheuma. The protuberance in Eucheuma was thorn-like and in Kappaphycus was wartlike or papillate. Their ITS sequence lengths differed significantly in nucleotide variation rates up to 58.55%-63.90%. All nucleotide variations occurred in the ITS1 and ITS2 regions except for five nucleotide transversions in the 5.8S rDNA region. In addition, the difference was at the branches among congeneric species. Kappaphycus sp. had branches with small buds, while K. alvarezii did not have such a feature. The nucleotide variation rates varied from 7.02% to 7.48% among species; within the same species of the clades it was K. alvarezii, Kappaphycus sp., and E. denticulatum. The results indicate that ITS sequence analysis was an effective way for identification of interspecies and intraspecies phylogenetic relationships and might provide a clue for molecular identification of algal Eucheumatoideae.

  10. Citrus plastid-related gene profiling based on expressed sequence tag analyses

    Directory of Open Access Journals (Sweden)

    Tercilio Calsa Jr.

    2007-01-01

    Full Text Available Plastid-related sequences, derived from putative nuclear or plastome genes, were searched in a large collection of expressed sequence tags (ESTs and genomic sequences from the Citrus Biotechnology initiative in Brazil. The identified putative Citrus chloroplast gene sequences were compared to those from Arabidopsis, Eucalyptus and Pinus. Differential expression profiling for plastid-directed nuclear-encoded proteins and photosynthesis-related gene expression variation between Citrus sinensis and Citrus reticulata, when inoculated or not with Xylella fastidiosa, were also analyzed. Presumed Citrus plastome regions were more similar to Eucalyptus. Some putative genes appeared to be preferentially expressed in vegetative tissues (leaves and bark or in reproductive organs (flowers and fruits. Genes preferentially expressed in fruit and flower may be associated with hypothetical physiological functions. Expression pattern clustering analysis suggested that photosynthesis- and carbon fixation-related genes appeared to be up- or down-regulated in a resistant or susceptible Citrus species after Xylella inoculation in comparison to non-infected controls, generating novel information which may be helpful to develop novel genetic manipulation strategies to control Citrus variegated chlorosis (CVC.

  11. A multifunction editor for programming control sequences for a robot based radiopharmaceutical synthesis system

    International Nuclear Information System (INIS)

    Appelquist, G.; Bohm, C.

    1990-01-01

    A Multifunction Editor is a development tool for building control sequences for a robotized production system for positron emitting radiopharmaceuticals. This system consists of SCARA robot and a PC-AT personal computer as a controller together with general and synthesis specific chemistry equipment. The general equipment, which is common for many synthesis, is fixed to the wall of the hotcell, while the specific equipment, dedicated to the given synthesis, is located on a removable tray. The program recognizes commands to move the robot, to control valves and to control the computer screen. From within the editor it is possible to run the control sequence forward or backward to test it and to use the single step feature to debug. The editor commands include insert, replace and delete of commands in the sequence. When programming or editing robot movements the robot may be controlled by the mouse, from the keyboard or from a remote control box. The robot control sequence consists of a succession of stored robot positions. The screen control is used to display dynamic flowchart diagrams. This is achieved by displaying a modified picture on the screen whenever the system state has been changed significantly

  12. Performance of Correspondence Algorithms in Vision-Based Driver Assistance Using an Online Image Sequence Database

    DEFF Research Database (Denmark)

    Klette, Reinhard; Krüger, Norbert; Vaudrey, Tobi

    2011-01-01

    the classification of recorded video data into situations defined by a cooccurrence of some events in recorded traffic scenes. About 100-400 stereo frames (or 4-16 s of recording) are considered a basic sequence, which will be identified with one particular situation. Future testing is expected to be on data...

  13. Insights into phylogeny, sex function and age of Fragaria based on whole chloroplast genome sequencing

    Science.gov (United States)

    Wambui Njunguna; Aaron Liston; Richard Cronn; Tia-Lynn Ashman; Nahla Bassil

    2013-01-01

    The cultivated strawberry is one of the youngest domesticated plants, developed in France in the 1700s from chance hybridization between two western hemisphere octoploid species. However, little is known about the evolution of the species that gave rise to this important fruit crop. Phylogenetic analysis of chloroplast genome sequences of 21 Fragaria...

  14. Unveiling distribution patterns of freshwater phytoplankton by a next generation sequencing based approach

    Czech Academy of Sciences Publication Activity Database

    Eiler, A.; Drakare, S.; Bertilsson, S.; Pernthaler, J.; Peura, S.; Rofner, C.; Šimek, Karel; Yang, Y.; Znachor, Petr; Lindström, E.S.

    2013-01-01

    Roč. 8, č. 1 (2013), e53516 E-ISSN 1932-6203 R&D Projects: GA ČR(CZ) GA206/08/0015 Institutional support: RVO:60077344 Keywords : phytoplankton * next generation sequencing * diversity Subject RIV: EE - Microbiology, Virology Impact factor: 3.534, year: 2013

  15. Multicenter validation of cancer gene panel-based next-generation sequencing for translational research and molecular diagnostics.

    Science.gov (United States)

    Hirsch, B; Endris, V; Lassmann, S; Weichert, W; Pfarr, N; Schirmacher, P; Kovaleva, V; Werner, M; Bonzheim, I; Fend, F; Sperveslage, J; Kaulich, K; Zacher, A; Reifenberger, G; Köhrer, K; Stepanow, S; Lerke, S; Mayr, T; Aust, D E; Baretton, G; Weidner, S; Jung, A; Kirchner, T; Hansmann, M L; Burbat, L; von der Wall, E; Dietel, M; Hummel, M

    2018-04-01

    The simultaneous detection of multiple somatic mutations in the context of molecular diagnostics of cancer is frequently performed by means of amplicon-based targeted next-generation sequencing (NGS). However, only few studies are available comparing multicenter testing of different NGS platforms and gene panels. Therefore, seven partner sites of the German Cancer Consortium (DKTK) performed a multicenter interlaboratory trial for targeted NGS using the same formalin-fixed, paraffin-embedded (FFPE) specimen of molecularly pre-characterized tumors (n = 15; each n = 5 cases of Breast, Lung, and Colon carcinoma) and a colorectal cancer cell line DNA dilution series. Detailed information regarding pre-characterized mutations was not disclosed to the partners. Commercially available and custom-designed cancer gene panels were used for library preparation and subsequent sequencing on several devices of two NGS different platforms. For every case, centrally extracted DNA and FFPE tissue sections for local processing were delivered to each partner site to be sequenced with the commercial gene panel and local bioinformatics. For cancer-specific panel-based sequencing, only centrally extracted DNA was analyzed at seven sequencing sites. Subsequently, local data were compiled and bioinformatics was performed centrally. We were able to demonstrate that all pre-characterized mutations were re-identified correctly, irrespective of NGS platform or gene panel used. However, locally processed FFPE tissue sections disclosed that the DNA extraction method can affect the detection of mutations with a trend in favor of magnetic bead-based DNA extraction methods. In conclusion, targeted NGS is a very robust method for simultaneous detection of various mutations in FFPE tissue specimens if certain pre-analytical conditions are carefully considered.

  16. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

    Science.gov (United States)

    Nishizawa, M; Nishizawa, K

    2000-10-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.

  17. Molecular characterization of Fasciola spp. from the endemic area of northern Iran based on nuclear ribosomal DNA sequences.

    Science.gov (United States)

    Amor, Nabil; Halajian, Ali; Farjallah, Sarra; Merella, Paolo; Said, Khaled; Ben Slimane, Badreddine

    2011-07-01

    Fasciolosis caused by Fasciola spp. (Platyhelminthes: Trematoda: Digenea) is considered as the most important helminth infection of ruminants in tropical countries, causing considerable socioeconomic problems. In the endemic regions of the North of Iran, Fasciola hepatica and Fasciola gigantica have been previously characterized on the basis of morphometric differences, but the use of molecular markers is necessary to distinguish exactly between species and intermediate forms. Samples from buffaloes and goats from different localities of northern Iran were identified morphologically and then genetically characterized by sequences of the first (ITS-1) and second (ITS-2) Internal Transcribed Spacers (ITS) of nuclear ribosomal DNA (rDNA). Comparison of the ITS of the northern Iranian samples with sequences of Fasciola spp. from GenBank showed that the examined specimens had sequences identical to those of the most frequent haplotypes of F. hepatica (n=25, 48.1%) and F. gigantica (n=20, 38.45%), which differed from each other in different variable nucleotide positions of ITS region sequences, and their intermediate forms (n=7, 13.45%), which had nucleotides overlapped between the two Fasciola species in all the positions. The ITS sequences from populations of Fasciola isolates in buffaloes and goats had experienced introgression/hybridization as previously reported in isolates from other ruminants and humans. Based on ITS-1 and ITS-2 sequences, flukes are scattered in pure F. hepatica, F. gigantica and intermediate Fasciola clades, revealing that multiple genotypes of Fasciola are able to infect goats and buffaloes in North of Iran. Furthermore, the phylogenetic trees based upon the ITS-1 and ITS-2 sequences showed a close relationship of the Iranian samples with isolates of F. hepatica and F. gigantica from different localities of Africa and Asia. In the present study, the intergenic transcribed spacers ITS-1 and ITS-2 showed to be reliable approaches for the genetic

  18. DNA methylation-based forensic age prediction using artificial neural networks and next generation sequencing.

    Science.gov (United States)

    Vidaki, Athina; Ballard, David; Aliferi, Anastasia; Miller, Thomas H; Barron, Leon P; Syndercombe Court, Denise

    2017-05-01

    generation sequencing (NGS)-based method able to quantify the methylation status of the selected 16 CpG sites was developed using the Illumina MiSeq ® platform. The method was validated using DNA standards of known methylation levels and the age prediction accuracy has been initially assessed in a set of 46 whole blood samples. Although the resulted prediction accuracy using the NGS data was lower compared to the original model (MAE=7.5years), it is expected that future optimization of our strategy to account for technical variation as well as increasing the sample size will improve both the prediction accuracy and reproducibility. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  19. Unraveling systematic inventory of Echinops (Asteraceae) with special reference to nrDNA ITS sequence-based molecular typing of Echinops abuzinadianus.

    Science.gov (United States)

    Ali, M A; Al-Hemaid, F M; Lee, J; Hatamleh, A A; Gyulai, G; Rahman, M O

    2015-10-02

    The present study explored the systematic inventory of Echinops L. (Asteraceae) of Saudi Arabia, with special reference to the molecular typing of Echinops abuzinadianus Chaudhary, an endemic species to Saudi Arabia, based on the internal transcribed spacer (ITS) sequences (ITS1-5.8S-ITS2) of nuclear ribosomal DNA. A sequence similarity search using BLAST and a phylogenetic analysis of the ITS sequence of E. abuzinadianus revealed a high level of sequence similarity with E. glaberrimus DC. (section Ritropsis). The novel primary sequence and the secondary structure of ITS2 of E. abuzinadianus could potentially be used for molecular genotyping.

  20. BAC-end sequence-based SNPs and Bin mapping for rapid integration of physical and genetic maps in apple.

    Science.gov (United States)

    Han, Yuepeng; Chagné, David; Gasic, Ksenija; Rikkerink, Erik H A; Beever, Jonathan E; Gardiner, Susan E; Korban, Schuyler S

    2009-03-01

    A genome-wide BAC physical map of the apple, Malus x domestica Borkh., has been recently developed. Here, we report on integrating the physical and genetic maps of the apple using a SNP-based approach in conjunction with bin mapping. Briefly, BAC clones located at ends of BAC contigs were selected, and sequenced at both ends. The BAC end sequences (BESs) were used to identify candidate SNPs. Subsequently, these candidate SNPs were genetically mapped using a bin mapping strategy for the purpose of mapping the physical onto the genetic map. Using this approach, 52 (23%) out of 228 BESs tested were successfully exploited to develop SNPs. These SNPs anchored 51 contigs, spanning approximately 37 Mb in cumulative physical length, onto 14 linkage groups. The reliability of the integration of the physical and genetic maps using this SNP-based strategy is described, and the results confirm the feasibility of this approach to construct an integrated physical and genetic maps for apple.

  1. Molecular systematics of Indian Alysicarpus (Fabaceae) based on analyses of nuclear ribosomal DNA sequences.

    Science.gov (United States)

    Gholami, Akram; Subramaniam, Shweta; Geeta, R; Pandey, Arun K

    2017-06-01

    Alysicarpus Necker ex Desvaux (Fabaceae, Desmodieae) consists of ~30 species that are distributed in tropical and subtropical regions of theworld. In India, the genus is represented by ca. 18 species, ofwhich seven are endemic. Sequences of the nuclear Internal transcribed spacer from38 accessions representing 16 Indian specieswere subjected to phylogenetic analyses. The ITS sequence data strongly support the monophyly of the genus Alysicarpus. Analyses revealed four major well-supported clades within Alysicarpus. Ancestral state reconstructions were done for two morphological characters, namely calyx length in relation to pod (macrocalyx and microcalyx) and pod surface ornamentation (transversely rugose and nonrugose). The present study is the first report on molecular systematics of Indian Alysicarpus.

  2. One Terminal Digital Algorithm for Adaptive Single Pole Auto-Reclosing Based on Zero Sequence Voltage

    Directory of Open Access Journals (Sweden)

    S. Jamali

    2008-10-01

    Full Text Available This paper presents an algorithm for adaptive determination of the dead timeduring transient arcing faults and blocking automatic reclosing during permanent faults onoverhead transmission lines. The discrimination between transient and permanent faults ismade by the zero sequence voltage measured at the relay point. If the fault is recognised asan arcing one, then the third harmonic of the zero sequence voltage is used to evaluate theextinction time of the secondary arc and to initiate reclosing signal. The significantadvantage of this algorithm is that it uses an adaptive threshold level and therefore itsperformance is independent of fault location, line parameters and the system operatingconditions. The proposed algorithm has been successfully tested under a variety of faultlocations and load angles on a 400KV overhead line using Electro-Magnetic TransientProgram (EMTP. The test results validate the algorithm ability in determining thesecondary arc extinction time during transient faults as well as blocking unsuccessfulautomatic reclosing during permanent faults.

  3. Photobiont diversity in lichens from metal-rich substrata based on ITS rDNA sequences.

    Science.gov (United States)

    Backor, Martin; Peksa, Ondrej; Skaloud, Pavel; Backorová, Miriam

    2010-05-01

    The photobiont is considered as the more sensitive partner of lichen symbiosis in metal pollution. For this reason the presence of a metal tolerant photobiont in lichens may be a key factor of ecological success of lichens growing on metal polluted substrata. The photobiont inventory was examined for terricolous lichen community growing in Cu mine-spoil heaps derived by historical mining. Sequences of internal transcribed spacer (ITS) were phylogenetically analyzed using maximum likelihood analyses. A total of 50 ITS algal sequences were obtained from 22 selected lichen taxa collected at three Cu mine-spoil heaps and two control localities. Algae associated with Cladonia and Stereocaulon were identified as members of several Asterochloris lineages, photobionts of cetrarioid lichens clustered with Trebouxia hypogymniae ined. We did not find close relationship between heavy metal content (in localities as well as lichen thalli) and photobiont diversity. Presence of multiple algal genotypes in single lichen thallus has been confirmed. Copyright 2009 Elsevier Inc. All rights reserved.

  4. Population structure of Lactobacillus helveticus isolates from naturally fermented dairy products based on multilocus sequence typing.

    Science.gov (United States)

    Sun, Zhihong; Liu, Wenjun; Song, Yuqin; Xu, Haiyan; Yu, Jie; Bilige, Menghe; Zhang, Heping; Chen, Yongfu

    2015-05-01

    Lactobacillus helveticus is an economically important lactic acid bacterium used in industrial dairy fermentation. In the present study, the population structure of 245 isolates of L. helveticus from different naturally fermented dairy products in China and Mongolia were investigated using an multilocus sequence typing scheme with 11 housekeeping genes. A total of 108 sequence types were detected, which formed 8 clonal complexes and 27 singletons. Results from Structure, SplitsTree, and ClonalFrame software analyses demonstrated the presence of 3 subpopulations in the L. helveticus isolates used in our study, namely koumiss, kurut-tarag, and panmictic lineages. Most L. helveticus isolates from particular ecological origins had specific population structures. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  5. [Study on method of tracking the active cells in image sequences based on EKF-PF].

    Science.gov (United States)

    Tang, Chunming; Liu, Ying

    2013-02-01

    In cell image sequences, due to the nonlinear and nonGaussian motion characteristics of active cells, the accurate prediction and tracking is still an unsolved problem. We applied extended Kalman particle filter (EKF-PF) here in our study, attempting to solve the problem. Firstly we confirmed the existence and positions of the active cells. Then we established a motion model and improved it via adding motion angle estimation. Next we predicted motion parameters, such as displacement, velocity, accelerated velocity and motion angle, in region centers of the cells being tracked. Finally we obtained the motion traces of active cells. There were fourteen active cells in three image sequences which have been tracked. The errors were less than 2.5 pixels when the prediction values were compared with actual values. It showed that the presented algorithm may basically reach the solution of accurate predition and tracking of the active cells.

  6. Deep sequencing-based transcriptome analysis of Plutella xylostella larvae parasitized by Diadegma semiclausum

    Science.gov (United States)

    2011-01-01

    Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic approaches provide a powerful approach to study insect host-parasitoid interactions at the molecular level. In order to investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity, that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length; with 35% of > 200 bp in length. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes were detected in parasitized larvae with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. xylostella larval genes following parasitization, DsIV genes expressed in the

  7. Complete genome sequence of cyanobacterium Nostoc sp. NIES-3756, a potentially useful strain for phytochrome-based bioengineering.

    Science.gov (United States)

    Hirose, Yuu; Fujisawa, Takatomo; Ohtsubo, Yoshiyuki; Katayama, Mitsunori; Misawa, Naomi; Wakazuki, Sachiko; Shimura, Yohei; Nakamura, Yasukazu; Kawachi, Masanobu; Yoshikawa, Hirofumi; Eki, Toshihiko; Kanesaki, Yu

    2016-01-20

    To explore the diverse photoreceptors of cyanobacteria, we isolated Nostoc sp. strain NIES-3756 from soil at Mimomi-Park, Chiba, Japan, and determined its complete genome sequence. The Genome consists of one chromosome and two plasmids (total 6,987,571 bp containing no gaps). The NIES-3756 strain carries 7 phytochrome and 12 cyanobacteriochrome genes, which will facilitate the studies of phytochrome-based bioengineering. Copyright © 2015. Published by Elsevier B.V.

  8. Application of PCR-based DNA sequencing technique for the detection of Leptospira in peripheral blood of septicemia patients

    OpenAIRE

    Ram, S.; Vimalin, J.M.; Jambulingam, M.; Tiru, V.; Gopalakrishnan, R.K.; Madhavan, H.N.

    2012-01-01

    Aim: Isolation, dark field detection and microscopic agglutination test (MAT) are considered ―gold standard‖ tests for diagnosis of Leptospirosis. Several PCR assays are reported but very few have been evaluated for detection of Leptospirosis. Therefore, this study was undertaken. This study aims to design and standardize polymerase chain reaction (PCR) - based DNA sequencing technique for the detection of pathogenic Leptospira from peripheral blood of patients clinically diagnosed with septi...

  9. Polyvinyl-alcohol-based magnetic beads for rapid and efficient separation of specific or unspecific nucleic acid sequences

    International Nuclear Information System (INIS)

    Oster, J.; Parker, Jeffrey; Brassard, Lothar

    2001-01-01

    The versatile application of polyvinyl-alcohol-based magnetic M-PVA beads is demonstrated in the separation of genomic DNA, sequence specific nucleic acid purification, and binding of bacteria for subsequent DNA extraction and detection. It is shown that nucleic acids can be obtained in high yield and purity using M-PVA beads, making sample preparation efficient, fast and highly adaptable for automation processes

  10. Repair of oxidative DNA base damage in the host genome influences the HIV integration site sequence preference.

    Directory of Open Access Journals (Sweden)

    Geoffrey R Bennett

    Full Text Available Host base excision repair (BER proteins that repair oxidative damage enhance HIV infection. These proteins include the oxidative DNA damage glycosylases 8-oxo-guanine DNA glycosylase (OGG1 and mutY homolog (MYH as well as DNA polymerase beta (Polβ. While deletion of oxidative BER genes leads to decreased HIV infection and integration efficiency, the mechanism remains unknown. One hypothesis is that BER proteins repair the DNA gapped integration intermediate. An alternative hypothesis considers that the most common oxidative DNA base damages occur on guanines. The subtle consensus sequence preference at HIV integration sites includes multiple G:C base pairs surrounding the points of joining. These observations suggest a role for oxidative BER during integration targeting at the nucleotide level. We examined the hypothesis that BER repairs a gapped integration intermediate by measuring HIV infection efficiency in Polβ null cell lines complemented with active site point mutants of Polβ. A DNA synthesis defective mutant, but not a 5'dRP lyase mutant, rescued HIV infection efficiency to wild type levels; this suggested Polβ DNA synthesis activity is not necessary while 5'dRP lyase activity is required for efficient HIV infection. An alternate hypothesis that BER events in the host genome influence HIV integration site selection was examined by sequencing integration sites in OGG1 and MYH null cells. In the absence of these 8-oxo-guanine specific glycosylases the chromatin elements of HIV integration site selection remain the same as in wild type cells. However, the HIV integration site sequence preference at G:C base pairs is altered at several positions in OGG1 and MYH null cells. Inefficient HIV infection in the absence of oxidative BER proteins does not appear related to repair of the gapped integration intermediate; instead oxidative damage repair may participate in HIV integration site preference at the sequence level.

  11. Molecular identification based on coat protein sequences of the Barley yellow dwarf virus from Brazil

    Directory of Open Access Journals (Sweden)

    Talita Bernardon Mar

    2013-12-01

    Full Text Available Yellow dwarf disease, one of the most important diseases of cereal crops worldwide, is caused by virus species belonging to the Luteoviridae family. Forty-two virus isolates obtained from oat (Avena sativa L., wheat (Triticum aestivum L., barley (Hordeum vulgare L., corn (Zea mays L., and ryegrass (Lolium multiflorum Lam. collected between 2007 and 2008 from winter cereal crop regions in southern Brazil were screened by polymerase chain reaction (PCR with primers designed on ORF 3 (coat protein - CP for the presence of Barley yellow dwarf virus and Cereal yellow dwarf virus (B/CYDV. PCR products of expected size (~357 bp for subgroup II and (~831 bp for subgroup I were obtained for three and 39 samples, respectively. These products were cloned and sequenced. The subgroup II 3' partial CP amino acid deduced sequences were identified as BYDV-RMV (92 - 93 % of identity with "Illinois" Z14123 isolate. The complete CP amino acid deduced sequences of subgroup I isolates were confirmed as BYDV-PAV (94 - 99 % of identity and established a very homogeneous group (identity higher than 99 %. These results support the prevalence of BYDV-PAV in southern Brazil as previously diagnosed by Enzyme-Linked Immunosorbent Assay (ELISA and suggest that this population is very homogeneous. To our knowledge, this is the first report of BYDV-RMV in Brazil and the first genetic diversity study on B/CYDV in South America.

  12. [EST-SSR identification, markers development of Ligusticum chuanxiong based on Ligusticum chuanxiong transcriptome sequences].

    Science.gov (United States)

    Yuan, Can; Peng, Fang; Yang, Ze-Mao; Zhong, Wen-Juan; Mou, Fang-Sheng; Gong, Yi-Yun; Ji, Pei-Cheng; Pu, De-Qiang; Huang, Hai-Yan; Yang, Xiao; Zhang, Chao

    2017-09-01

    Ligusticum chuanxiong is a well-known traditional Chinese medicine plant. The study on its molecular markers development and germplasm resources is very important. In this study, we obtained 24 422 unigenes by assembling transcriptome sequencing reads of L. chuanxiong root. EST-SSR was detected and 4 073 SSR loci were identified. EST-SSR distribution and characteristic analysis results showed that the mono-nucleotide repeats were the main repeat types, accounting for 41.0%. In addition, the sequences containing SSR were functionally annotated in Gene Ontology (GO) and KEGG pathway and were assigned to 49 GO categories, 242 KEGG pathways, among them 2 201 sequences were annotated against Nr database. By validating 235 EST-SSRs,74 primer pairs were ultimately proved to have high quality amplification. Subsequently, genetic diversity analysis, UPGMA cluster analysis, PCoA analysis and population structure analysis of 34 L. chuanxiong germplasm resources were carried out with 74 primer pairs. In both UPGMA tree and PCoA results, L. chuanxiong resources were clustered into two groups, which are believed to be partial related to their geographical distribution. In this study, EST-SSRs in L. chuanxiong was firstly identified, and newly developed molecular markers would contribute significantly to further genetic diversity study, the purity detection, gene mapping, and molecular breeding. Copyright© by the Chinese Pharmaceutical Association.

  13. Molecular Identification of Ancylostoma caninum Isolated from Cats in Southern China Based on Complete ITS Sequence

    Directory of Open Access Journals (Sweden)

    Yuanjia Liu

    2013-01-01

    Full Text Available Ancylostoma caninum is a blood-feeding parasitic intestinal nematode which infects dogs, cats, and other mammals throughout the world. A highly sensitive and species-specific PCR-RFLP technique was utilised to detect the prevalence of A. caninum in cats in Guangzhou, southern China. Of the 102 fecal samples examined, the prevalence of A. caninum in cats was 95.1% and 83.3% using PCR-RFLP and microscopy, respectively. Among them, the prevalence of single hookworm infection with A. caninum was 54.90%, while mixed infections with both A. caninum and A. ceylanicum were 40.20%. Comparative analysis of three complete ITS sequences obtained from cat-derived A. caninum showed the same length (738 bp as that of dog-derived A. caninum. However, the sequence variation range was 98.6%–100%, where only one cat isolate (M63 showed 100% sequence similarity in comparison with two dog-derived A. caninum isolates (AM850106, EU159416 in the same studied area. The phylogenetic tree revealed A. caninum derived from both cats and dogs in single cluster. Results suggest that cats could be the main host of A. caninum in China, which may cause cross-infection between dogs and cats in the same area.

  14. VisRseq: R-based visual framework for analysis of sequencing data.

    Science.gov (United States)

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven J M

    2015-01-01

    Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.

  15. Sequence based polymorphic (SBP marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome

    Directory of Open Access Journals (Sweden)

    Sahu Binod B

    2012-01-01

    Full Text Available Abstract Background Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs for any genomic regions. Here a sequence based polymorphic (SBP marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype, Niederzenz (Nd-0 was obtained by applying Illumina's sequencing by synthesis (Solexa technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0 genome sequence identified putative single nucleotide polymorphisms (SNPs throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNPs containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease enzyme site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers for the marker poor regions of the Arabidopsis map representing polymorphisms between Col-0 and Nd-0 ecotypes were generated. Conclusions The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species, whose genomic sequences have been assembled. This technology will particularly facilitate the development of high density molecular marker maps, essential for

  16. Sequence- and interactome-based prediction of viral protein hotspots targeting host proteins: a case study for HIV Nef.

    Directory of Open Access Journals (Sweden)

    Mahdi Sarmady

    Full Text Available Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance but is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear crowding the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk.

  17. Imaging of cranial nerves with three-dimensional high resolution diffusion-weighted MR sequence based on SSFP technique

    International Nuclear Information System (INIS)

    Zhang Zhongwei; Chen Yingming; Meng Quanfei

    2008-01-01

    Objective: To depict the normal anatomy of cranial nerves in detail and define the exact relationships between cranial nerves and adjacent structures with three-dimensional high resolution diffusion-weighted MR sequence based on SSFP technique (3D DW-SSFP). Methods: 3D DW- SSFP sequence was performed and axial images were obtained in 12 healthy volunteers Post-processing techniques were used to generate images of cranial nerves, and the images acquired were compared with anatomical sections and diagrams of textbook. Results: In all subjects, 3D DW-SSFP sequence could produce homogeneous images and high contrast between the cranial nerves and other solid structures. The intracranial portions of all cranial nerves except olfactory nerve were identified; the extracranial portions of nerve Ⅱ-Ⅻ were identified in all subjects bilaterally. Conclusion: The 3D DW-SSFP sequence can characterize the normal MR appearance of cranial nerves and its branches and the ability to define the nerves may provide greater sensitivity and specificity in detecting abnormalities of craniofacial structure. (authors)

  18. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

    Directory of Open Access Journals (Sweden)

    Nachon Raethong

    2016-01-01

    Full Text Available Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes, electrochemical potential-driven transporters (33 genes, and primary active transporters (15 genes. To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction.

  19. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study.

    Science.gov (United States)

    Raethong, Nachon; Wong-Ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H(+)-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction.

  20. Screening the sequence selectivity of DNA-binding molecules using a gold nanoparticle-based colorimetric approach.

    Science.gov (United States)

    Hurst, Sarah J; Han, Min Su; Lytton-Jean, Abigail K R; Mirkin, Chad A

    2007-09-15

    We have developed a novel competition assay that uses a gold nanoparticle (Au NP)-based, high-throughput colorimetric approach to screen the sequence selectivity of DNA-binding molecules. This assay hinges on the observation that the melting behavior of DNA-functionalized Au NP aggregates is sensitive to the concentration of the DNA-binding molecule in solution. When short, oligomeric hairpin DNA sequences were added to a reaction solution consisting of DNA-functionalized Au NP aggregates and DNA-binding molecules, these molecules may either bind to the Au NP aggregate interconnects or the hairpin stems based on their relative affinity for each. This relative affinity can be measured as a change in the melting temperature (Tm) of the DNA-modified Au NP aggregates in solution. As a proof of concept, we evaluated the selectivity of 4',6-diamidino-2-phenylindone (an AT-specific binder), ethidium bromide (a nonspecific binder), and chromomycin A (a GC-specific binder) for six sequences of hairpin DNA having different numbers of AT pairs in a five-base pair variable stem region. Our assay accurately and easily confirmed the known trends in selectivity for the DNA binders in question without the use of complicated instrumentation. This novel assay will be useful in assessing large libraries of potential drug candidates that work by binding DNA to form a drug/DNA complex.

  1. Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation | Center for Cancer Research

    Science.gov (United States)

    Dubbed "Tom's T" by Dhruba Chattoraj, the unusually conserved thymine at position +7 in bacteriophage P1 plasmid RepA DNA binding sites rises above repressor and acceptor sequence logos. The T appears to represent base flipping prior to helix opening in this DNA replication initation protein.

  2. Genetic analysis of Fasciola isolates from cattle in Korea based on second internal transcribed spacer (ITS-2) sequence of nuclear ribosomal DNA.

    Science.gov (United States)

    Choe, Se-Eun; Nguyen, Thuy Thi-Dieu; Kang, Tae-Gyu; Kweon, Chang-Hee; Kang, Seung-Won

    2011-09-01

    Nuclear ribosomal DNA sequence of the second internal transcribed spacer (ITS-2) has been used efficiently to identify the liver fluke species collected from different hosts and various geographic regions. ITS-2 sequences of 19 Fasciola samples collected from Korean native cattle were determined and compared. Sequence comparison including ITS-2 sequences of isolates from this study and reference sequences from Fasciola hepatica and Fasciola gigantica and intermediate Fasciola in Genbank revealed seven identical variable sites of investigated isolates. Among 19 samples, 12 individuals had ITS-2 sequences completely identical to that of pure F. hepatica, five possessed the sequences identical to F. gigantica type, whereas two shared the sequence of both F. hepatica and F. gigantica. No variations in length and nucleotide composition of ITS-2 sequence were observed within isolates that belonged to F. hepatica or F. gigantica. At the position of 218, five Fasciola containing a single-base substitution (C>T) formed a distinct branch inside the F. gigantica-type group which was similar to those of Asian-origin isolates. The phylogenetic tree of the Fasciola spp. based on complete ITS-2 sequences from this study and other representative isolates in different locations clearly showed that pure F. hepatica, F. gigantica type and intermediate Fasciola were observed. The result also provided additional genetic evidence for the existence of three forms of Fasciola isolated from native cattle in Korea by genetic approach using ITS-2 sequence.

  3. Fall Detection for Elderly from Partially Observed Depth-Map Video Sequences Based on View-Invariant Human Activity Representation

    Directory of Open Access Journals (Sweden)

    Rami Alazrai

    2017-03-01

    Full Text Available This paper presents a new approach for fall detection from partially-observed depth-map video sequences. The proposed approach utilizes the 3D skeletal joint positions obtained from the Microsoft Kinect sensor to build a view-invariant descriptor for human activity representation, called the motion-pose geometric descriptor (MPGD. Furthermore, we have developed a histogram-based representation (HBR based on the MPGD to construct a length-independent representation of the observed video subsequences. Using the constructed HBR, we formulate the fall detection problem as a posterior-maximization problem in which the posteriori probability for each observed video subsequence is estimated using a multi-class SVM (support vector machine classifier. Then, we combine the computed posteriori probabilities from all of the observed subsequences to obtain an overall class posteriori probability of the entire partially-observed depth-map video sequence. To evaluate the performance of the proposed approach, we have utilized the Kinect sensor to record a dataset of depth-map video sequences that simulates four fall-related activities of elderly people, including: walking, sitting, falling form standing and falling from sitting. Then, using the collected dataset, we have developed three evaluation scenarios based on the number of unobserved video subsequences in the testing videos, including: fully-observed video sequence scenario, single unobserved video subsequence of random lengths scenarios and two unobserved video subsequences of random lengths scenarios. Experimental results show that the proposed approach achieved an average recognition accuracy of 93 . 6 % , 77 . 6 % and 65 . 1 % , in recognizing the activities during the first, second and third evaluation scenario, respectively. These results demonstrate the feasibility of the proposed approach to detect falls from partially-observed videos.

  4. Genotype, phenotype and in silico pathogenicity analysis of HEXB mutations: Panel based sequencing for differential diagnosis of gangliosidosis.

    Science.gov (United States)

    Mahdieh, Nejat; Mikaeeli, Sahar; Tavasoli, Ali Reza; Rezaei, Zahra; Maleki, Majid; Rabbani, Bahareh

    2018-04-01

    Gangliosidosis is an inherited metabolic disorder causing neurodegeneration and motor regression. Preventive diagnosis is the first choice for the affected families due to lack of straightforward therapy. Genetic studies could confirm the diagnosis and help families for carrier screening and prenatal diagnosis. An update of HEXB gene variants concerning genotype, phenotype and in silico analysis are presented. Panel based next generation sequencing and direct sequencing of four cases were performed to confirm the clinical diagnosis and for reproductive planning. Bioinformatic analyses of the HEXB mutation database were also performed. Direct sequencing of HEXA and HEXB genes showed recurrent homozygous variants at c.509G>A (p.Arg170Gln) and c.850C>T (p.Arg284Ter), respectively. A novel variant at c.416T>A (p.Leu139Gln) was identified in the GLB1 gene. Panel based next generation sequencing was performed for an undiagnosed patient which showed a novel mutation at c.1602C>A (p.Cys534Ter) of HEXB gene. Bioinformatic analysis of the HEXB mutation database showed 97% consistency of in silico genotype analysis with the phenotype. Bioinformatic analysis of the novel variants predicted to be disease causing. In silico structural and functional analysis of the novel variants showed structural effect of HEXB and functional effect of GLB1 variants which would provide fast analysis of novel variants. Panel based studies could be performed for overlapping symptomatic patients. Consequently, genetic testing would help affected families for patients' management, carrier detection, and family planning's. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology

    Directory of Open Access Journals (Sweden)

    Chao Shiaoman

    2011-01-01

    Full Text Available Abstract Background Genetic markers are pivotal to modern genomics research; however, discovery and genotyping of molecular markers in oat has been hindered by the size and complexity of the genome, and by a scarcity of sequence data. The purpose of this study was to generate oat expressed sequence tag (EST information, develop a bioinformatics pipeline for SNP discovery, and establish a method for rapid, cost-effective, and straightforward genotyping of SNP markers in complex polyploid genomes such as oat. Results Based on cDNA libraries of four cultivated oat genotypes, approximately 127,000 contigs were assembled from approximately one million Roche 454 sequence reads. Contigs were filtered through a novel bioinformatics pipeline to eliminate ambiguous polymorphism caused by subgenome homology, and 96 in silico SNPs were selected from 9,448 candidate loci for validation using high-resolution melting (HRM analysis. Of these, 52 (54% were polymorphic between parents of the Ogle1040 × TAM O-301 (OT mapping population, with 48 segregating as single Mendelian loci, and 44 being placed on the existing OT linkage map. Ogle and TAM amplicons from 12 primers were sequenced for SNP validation, revealing complex polymorphism in seven amplicons but general sequence conservation within SNP loci. Whole-amplicon interrogation with HRM revealed insertions, deletions, and heterozygotes in secondary oat germplasm pools, generating multiple alleles at some primer targets. To validate marker utility, 36 SNP assays were used to evaluate the genetic diversity of 34 diverse oat genotypes. Dendrogram clusters corresponded generally to known genome composition and genetic ancestry. Conclusions The high-throughput SNP discovery pipeline presented here is a rapid and effective method for identification of polymorphic SNP alleles in the oat genome. The current-generation HRM system is a simple and highly-informative platform for SNP genotyping. These techniques provide

  6. Sequence analysis of annually normalized citation counts: an empirical analysis based on the characteristic scores and scales (CSS) method.

    Science.gov (United States)

    Bornmann, Lutz; Ye, Adam Y; Ye, Fred Y

    2017-01-01

    In bibliometrics, only a few publications have focused on the citation histories of publications, where the citations for each citing year are assessed. In this study, therefore, annual categories of field- and time-normalized citation scores (based on the characteristic scores and scales method: 0 = poorly cited, 1 = fairly cited, 2 = remarkably cited, and 3 = outstandingly cited) are used to study the citation histories of papers. As our dataset, we used all articles published in 2000 and their annual citation scores until 2015. We generated annual sequences of citation scores (e.g., [Formula: see text]) and compared the sequences of annual citation scores of six broader fields (natural sciences, engineering and technology, medical and health sciences, agricultural sciences, social sciences, and humanities). In agreement with previous studies, our results demonstrate that sequences with poorly cited (0) and fairly cited (1) elements dominate the publication set; sequences with remarkably cited (3) and outstandingly cited (4) periods are rare. The highest percentages of constantly poorly cited papers can be found in the social sciences; the lowest percentages are in the agricultural sciences and humanities. The largest group of papers with remarkably cited (3) and/or outstandingly cited (4) periods shows an increasing impact over the citing years with the following orders of sequences: [Formula: see text] (6.01%), which is followed by [Formula: see text] (1.62%). Only 0.11% of the papers ( n  = 909) are constantly on the outstandingly cited level.

  7. Sequence-based genotyping clarifies conflicting historical morphometric and biological data for 5 Eimeria species infecting turkeys.

    Science.gov (United States)

    El-Sherry, S; Ogedengbe, M E; Hafeez, M A; Sayf-Al-Din, M; Gad, N; Barta, J R

    2015-02-01

    Unlike with Eimeria species infecting chickens, specific identification and nomenclature of Eimeria species infecting turkeys is complicated, and in the absence of molecular data, imprecise. In an attempt to reconcile contradictory data reported on oocyst morphometrics and biological descriptions of various Eimeria species infecting turkey, we established single oocyst derived lines of 5 important Eimeria species infecting turkeys, Eimeria meleagrimitis (USMN08-01 strain), Eimeria adenoeides (Guelph strain), Eimeria gallopavonis (Weybridge strain), Eimeria meleagridis (USAR97-01 strain), and Eimeria dispersa (Briston strain). Short portions (514 bp) of mitochondrial cytochrome c oxidase subunit I gene (mt COI) from each were amplified and sequenced. Comparison of these sequences showed sufficient species-specific sequence variation to recommend these short mt COI sequences as species-specific markers. Uniformity of oocyst features (dimensions and oocyst structure) of each pure line was observed. Additional morphological features of the oocysts of these species are described as useful for the microscopic differentiation of these Eimeria species. Combined molecular and morphometric data on these single species lines compared with the original species descriptions and more recent data have helped to clarify some confusing, and sometimes conflicting, features associated with these Eimeria spp. For example, these new data suggest that the KCH and KR strains of E. adenoeides reported previously represent 2 distinct species, E. adenoeides and E. meleagridis, respectively. Likewise, analysis of the Weybridge strain of E. adenoeides, which has long been used as a reference strain in various studies conducted on the pathogenicity of E. adenoeides, indicates that this coccidium is actually a strain of E. gallopavonis. We highly recommend mt COI sequence-based genotyping be incorporated into all studies using Eimeria spp. of turkeys to confirm species identifications and so

  8. Next generation sequencing based transcriptome analysis of septic-injury responsive genes in the beetle Tribolium castaneum.

    Directory of Open Access Journals (Sweden)

    Boran Altincicek

    Full Text Available Beetles (Coleoptera are the most diverse animal group on earth and interact with numerous symbiotic or pathogenic microbes in their environments. The red flour beetle Tribolium castaneum is a genetically tractable model beetle species and its whole genome sequence has recently been determined. To advance our understanding of the molecular basis of beetle immunity here we analyzed the whole transcriptome of T. castaneum by high-throughput next generation sequencing technology. Here, we demonstrate that the Illumina/Solexa sequencing approach of cDNA samples from T. castaneum including over 9.7 million reads with 72 base pairs (bp length (approximately 700 million bp sequence information with about 30× transcriptome coverage confirms the expression of most predicted genes and enabled subsequent qualitative and quantitative transcriptome analysis. This approach recapitulates our recent quantitative real-time PCR studies of immune-challenged and naïve T. castaneum beetles, validating our approach. Furthermore, this sequencing analysis resulted in the identification of 73 differentially expressed genes upon immune-challenge with statistical significance by comparing expression data to calculated values derived by fitting to generalized linear models. We identified up regulation of diverse immune-related genes (e.g. Toll receptor, serine proteinases, DOPA decarboxylase and thaumatin and of numerous genes encoding proteins with yet unknown functions. Of note, septic-injury resulted also in the elevated expression of genes encoding heat-shock proteins or cytochrome P450s supporting the view that there is crosstalk between immune and stress responses in T. castaneum. The present study provides a first comprehensive overview of septic-injury responsive genes in T. castaneum beetles. Identified genes advance our understanding of T. castaneum specific gene expression alteration upon immune-challenge in particular and may help to understand beetle immunity

  9. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

    Directory of Open Access Journals (Sweden)

    Shi Weisong

    2011-06-01

    Full Text Available Abstract Background Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS. However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. Results To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80% mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http

  10. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.

    Science.gov (United States)

    Nguyen, Tung; Shi, Weisong; Ruden, Douglas

    2011-06-06

    Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http

  11. Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data.

    Science.gov (United States)

    Zhang, Yun; Baheti, Saurabh; Sun, Zhifu

    2018-05-01

    High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the base for the next step of differentially methylated region identification. The ratio data have a flexibility of fitting to many linear models, but the raw count data take consideration of coverage information. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistic methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.

  12. Comparison of sequencing based CNV discovery methods using monozygotic twin quartets.

    Directory of Open Access Journals (Sweden)

    Marc-André Legault

    Full Text Available The advent of high throughput sequencing methods breeds an important amount of technical challenges. Among those is the one raised by the discovery of copy-number variations (CNVs using whole-genome sequencing data. CNVs are genomic structural variations defined as a variation in the number of copies of a large genomic fragment, usually more than one kilobase. Here, we aim to compare different CNV calling methods in order to assess their ability to consistently identify CNVs by comparison of the calls in 9 quartets of identical twin pairs. The use of monozygotic twins provides a means of estimating the error rate of each algorithm by observing CNVs that are inconsistently called when considering the rules of Mendelian inheritance and the assumption of an identical genome between twins. The similarity between the calls from the different tools and the advantage of combining call sets were also considered.ERDS and CNVnator obtained the best performance when considering the inherited CNV rate with a mean of 0.74 and 0.70, respectively. Venn diagrams were generated to show the agreement between the different algorithms, before and after filtering out familial inconsistencies. This filtering revealed a high number of false positives for CNVer and Breakdancer. A low overall agreement between the methods suggested a high complementarity of the different tools when calling CNVs. The breakpoint sensitivity analysis indicated that CNVnator and ERDS achieved better resolution of CNV borders than the other tools. The highest inherited CNV rate was achieved through the intersection of these two tools (81%.This study showed that ERDS and CNVnator provide good performance on whole genome sequencing data with respect to CNV consistency across families, CNV breakpoint resolution and CNV call specificity. The intersection of the calls from the two tools would be valuable for CNV genotyping pipelines.

  13. Remote consulting based on ultrasonic digital immages and dynamic ultrasonic sequences

    Science.gov (United States)

    Margan, Anamarija; Rustemović, Nadan

    2006-03-01

    Telematic ultrasonic diagnostics is a relatively new tool in providing health care to patients in remote, islolated communities. Our project facility, "The Virtual Polyclinic - A Specialists' Consulting Network for the Islands", is located on the island of Cres in the Adriatic Sea in Croatia and has been extending telemedical services to the archipelago population since 2000. Telemedicine applications include consulting services by specialists at the University Clinical Hospital Center Rebro in Zagreb and at "Magdalena", a leading cardiology clinic in Croatia. After several years of experience with static high resolution ultrasonic digital immages for referral consulting diagnostics purposes, we now also use dynamic ultrasonic sequences in a project with the Department of Emmergency Gastroenterology at Rebro in Zagreb. The aim of the ongoing project is to compare the advantages and shortcomings in transmitting static ultrasonic digital immages and live sequences of ultrasonic examination in telematic diagnostics. Ultrasonic examination is a dynamic process in which the diagnostic accuracy is highly dependent on the dynamic moment of an ultrasound probe and signal. Our first results indicate that in diffuse parenchymal organ pathology the progression and the follow up of a disease is better presented to a remote consulting specialist by dynamic ultrasound sequences. However, the changes that involve only one part of a parenchymal organ can be suitably presented by static ultrasonic digital images alone. Furthermore, we need less time for digital imaging and such tele-consultations overall are more economical. Our previous telemedicine research and practice proved that we can greatly improve the level of medical care in remote healthcare facilities and cut healthcare costs considerably. The experience in the ongoing project points to a conclusion that we can further optimize remote diagnostics benefits by a right choice of telematic application thus reaching a

  14. Deep sequencing-based analysis of the Cymbidium ensifolium floral transcriptome.

    Directory of Open Access Journals (Sweden)

    Xiaobai Li

    Full Text Available Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvement of RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned into 25 eukaryotic orthologous groups (KOGs, 41,690 into 58 gene ontology (GO terms, and 9,830 into 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG pathways, and 9,539 isotigs into 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium.

  15. The use of sequence-based SSR mining for the development of a vast collection of microsatellites in Aquilegia Formosa

    Science.gov (United States)

    Brandon Schlautman; Vera Pfeiffer; Juan Zalapa; Johanne Brunet

    2014-01-01

    Numerous microsatellite markers were developed for Aquilegia formosafrom sequences deposited within the Expressed Sequence Tag (EST), Genomic Survey Sequence (GSS), and Nucleotide databases in NCBI. Microsatellites (SSRs) were identified and primers were designed for 9 SSR containing sequences in the Nucleotide database, 3803 sequences in the EST...

  16. An Exponential Combination Procedure for Set-Based Association Tests in Sequencing Studies

    OpenAIRE

    Chen, Lin S.; Hsu, Li; Gamazon, Eric R.; Cox, Nancy J.; Nicolae, Dan L.

    2012-01-01

    State-of-the-art next-generation-sequencing technologies can facilitate in-depth explorations of the human genome by investigating both common and rare variants. For the identification of genetic factors that are associated with disease risk or other complex phenotypes, methods have been proposed for jointly analyzing variants in a set (e.g., all coding SNPs in a gene). Variants in a properly defined set could be associated with risk or phenotype in a concerted fashion, and by accumulating in...

  17. Studying the evolutionary relationships and phylogenetic trees of 21 groups of tRNA sequences based on complex networks.

    Science.gov (United States)

    Wei, Fangping; Chen, Bowen

    2012-03-01

    To find out the evolutionary relationships among different tRNA sequences of 21 amino acids, 22 networks are constructed. One is constructed from whole tRNAs, and the other 21 networks are constructed from the tRNAs which carry the same amino acids. A new method is proposed such that the alignment scores of any two amino acids groups are determined by the average degree and the average clustering coefficient of their networks. The anticodon feature of isolated tRNA and the phylogenetic trees of 21 group networks are discussed. We find that some isolated tRNA sequences in 21 networks still connect with other tRNAs outside their group, which reflects the fact that those tRNAs might evolve by intercrossing among these 21 groups. We also find that most anticodons among the same cluster are only one base different in the same sites when S ≥ 70, and they stay in the same rank in the ladder of evolutionary relationships. Those observations seem to agree on that some tRNAs might mutate from the same ancestor sequences based on point mutation mechanisms.

  18. Methylene blue binding to DNA with alternating AT base sequence: minor groove binding is favored over intercalation.

    Science.gov (United States)

    Rohs, Remo; Sklenar, Heinz

    2004-04-01

    The results presented in this paper on methylene blue (MB) binding to DNA with AT alternating base sequence complement the data obtained in two former modeling studies of MB binding to GC alternating DNA. In the light of the large amount of experimental data for both systems, this theoretical study is focused on a detailed energetic analysis and comparison in order to understand their different behavior. Since experimental high-resolution structures of the complexes are not available, the analysis is based on energy minimized structural models of the complexes in different binding modes. For both sequences, four different intercalation structures and two models for MB binding in the minor and major groove have been proposed. Solvent electrostatic effects were included in the energetic analysis by using electrostatic continuum theory, and the dependence of MB binding on salt concentration was investigated by solving the non-linear Poisson-Boltzmann equation. We find that the relative stability of the different complexes is similar for the two sequences, in agreement with the interpretation of spectroscopic data. Subtle differences, however, are seen in energy decompositions and can be attributed to the change from symmetric 5'-YpR-3' intercalation to minor groove binding with increasing salt concentration, which is experimentally observed for the AT sequence at lower salt concentration than for the GC sequence. According to our results, this difference is due to the significantly lower non-electrostatic energy for the minor groove complex with AT alternating DNA, whereas the slightly lower binding energy to this sequence is caused by a higher deformation energy of DNA. The energetic data are in agreement with the conclusions derived from different spectroscopic studies and can also be structurally interpreted on the basis of the modeled complexes. The simple static modeling technique and the neglect of entropy terms and of non-electrostatic solute

  19. Investigating the role of sliding friction in rolling motion: a teaching sequence based on experiments and simulations

    Science.gov (United States)

    De Ambrosis, Anna; Malgieri, Massimiliano; Mascheretti, Paolo; Onorato, Pasquale

    2015-05-01

    We designed a teaching-learning sequence on rolling motion, rooted in previous research about student conceptions, and proposing an educational reconstruction strongly centred on the role of friction in different cases of rolling. A series of experiments based on video analysis is used to highlight selected key concepts and to motivate students in their exploration of the topic; and interactive simulations, which can be modified on the fly by students to model different physical situations, are used to stimulate autonomous investigation in enquiry activities. The activity sequence was designed for students on introductory physics courses and was tested with a group of student teachers. Comparisons between pre- and post-tests, and between our results and those reported in the literature, indicate that students’ understanding of rolling motion improved markedly and some typical difficulties were overcome.

  20. Carcinogen-DNA interaction study by base sequence footprinting. Progress report, July 1, 1985-January 21, 1986

    International Nuclear Information System (INIS)

    Bases, R.

    1986-01-01

    Acetyl-aminofluorene (AAF) modified plasmid pSV 2 CAT is being studied to learn how the adducts influence expression of chloramphenicol acetyltransferase (CAT) genes. phi X-174 RF DNA exhibits specific base sequence abnormalities induced by the formation of AAF adducts. The DNAase I sensitive state of AAF modified DNA sequences could presumably lead to enhanced expression of genes since it is a well-known characteristic of active or potentially active derepressed genes. DNAase I hypersensitive sites are necessary but not sufficient for transcription. We observed enhanced expression of CAT genes in CV-1 cells after transfection with modified plasmids, using electroporation to introduce the plasmids into the cells. 34 refs., 2 figs

  1. Optimization of dynamic economic dispatch with valve-point effect using chaotic sequence based differential evolution algorithms

    International Nuclear Information System (INIS)

    He Dakuo; Dong Gang; Wang Fuli; Mao Zhizhong

    2011-01-01

    A chaotic sequence based differential evolution (DE) approach for solving the dynamic economic dispatch problem (DEDP) with valve-point effect is presented in this paper. The proposed method combines the DE algorithm with the local search technique to improve the performance of the algorithm. DE is the main optimizer, while an approximated model for local search is applied to fine tune in the solution of the DE run. To accelerate convergence of DE, a series of constraints handling rules are adopted. An initial population obtained by using chaotic sequence exerts optimal performance of the proposed algorithm. The combined algorithm is validated for two test systems consisting of 10 and 13 thermal units whose incremental fuel cost function takes into account the valve-point loading effects. The proposed combined method outperforms other algorithms reported in literatures for DEDP considering valve-point effects.

  2. NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility

    DEFF Research Database (Denmark)

    Hansen, Jan Erik; Lund, Ole; Tolstrup, Niels

    1998-01-01

    -glycosylated serine and threonine residues in independent test sets, thus proving more accurate than matrix statistics and vector projection methods. Predicition of O-glycosylation sites in the envelope glycoprotein gp120 from the primate lentiviruses HIV-1, HIV-2 and SIV are presented. The most conserved O...... structure and surface accessibility. The sequence context of glycosylated threonines was found to differ from that of serine, and the sites were found to cluster. Non-clustered sites had a sequence context different from that of clustered sites. charged residues were disfavoured at postition -1 and +3......-glycosylation signals in these evolutionary-related glycoproteins were found in their first hypervariable loop, V1. However, the strain variation for HIV-1 gp120 was significant. A computer server, available through WWW or E-mail, has been developed for prediction of mucin type O-glycosylation sites in proteins based...

  3. Molecular Phylogenetics and Temporal Diversification in the Genus Aeromonas Based on the Sequences of Five Housekeeping Genes

    Science.gov (United States)

    Lorén, J. Gaspar; Farfán, Maribel; Fusté, M. Carmen

    2014-01-01

    Several approaches have been developed to estimate both the relative and absolute rates of speciation and extinction within clades based on molecular phylogenetic reconstructions of evolutionary relationships, according to an underlying model of diversification. However, the macroevolutionary models established for eukaryotes have scarcely been used with prokaryotes. We have investigated the rate and pattern of cladogenesis in the genus Aeromonas (γ-Proteobacteria, Proteobacteria, Bacteria) using the sequences of five housekeeping genes and an uncorrelated relaxed-clock approach. To our knowledge, until now this analysis has never been applied to all the species described in a bacterial genus and thus opens up the possibility of establishing models of speciation from sequence data commonly used in phylogenetic studies of prokaryotes. Our results suggest that the genus Aeromonas began to diverge between 248 and 266 million years ago, exhibiting a constant divergence rate through the Phanerozoic, which could be described as a pure birth process. PMID:24586399

  4. Quasi-Coherent Noise Jamming to LFM Radar Based on Pseudo-random Sequence Phase-modulation

    Directory of Open Access Journals (Sweden)

    N. Tai

    2015-12-01

    Full Text Available A novel quasi-coherent noise jamming method is proposed against linear frequency modulation (LFM signal and pulse compression radar. Based on the structure of digital radio frequency memory (DRFM, the jamming signal is acquired by the pseudo-random sequence phase-modulation of sampled radar signal. The characteristic of jamming signal in time domain and frequency domain is analyzed in detail. Results of ambiguity function indicate that the blanket jamming effect along the range direction will be formed when jamming signal passes through the matched filter. By flexible controlling the parameters of interrupted-sampling pulse and pseudo-random sequence, different covering distances and jamming effects will be achieved. When the jamming power is equivalent, this jamming obtains higher process gain compared with non-coherent jamming. The jamming signal enhances the detection threshold and the real target avoids being detected. Simulation results and circuit engineering implementation validate that the jamming signal covers real target effectively.

  5. High-speed all-optical DNA local sequence alignment based on a three-dimensional artificial neural network.

    Science.gov (United States)

    Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra

    2017-07-01

    This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.

  6. Temporal sequence learning in winner-take-all networks of spiking neurons demonstrated in a brain-based device.

    Science.gov (United States)

    McKinstry, Jeffrey L; Edelman, Gerald M

    2013-01-01

    Animal behavior often involves a temporally ordered sequence of actions learned from experience. Here we describe simulations of interconnected networks of spiking neurons that learn to generate patterns of activity in correct temporal order. The simulation consists of large-scale networks of thousands of excitatory and inhibitory neurons that exhibit short-term synaptic plasticity and spike-timing dependent synaptic plasticity. The neural architecture within each area is arranged to evoke winner-take-all (WTA) patterns of neural activity that persist for tens of milliseconds. In order to generate and switch between consecutive firing patterns in correct temporal order, a reentrant exchange of signals between these areas was necessary. To demonstrate the capacity of this arrangement, we used the simulation to train a brain-based device responding to visual input by autonomously generating temporal sequences of motor actions.

  7. The occurrence of Toxocara malaysiensis in cats in China, confirmed by sequence-based analyses of ribosomal DNA.

    Science.gov (United States)

    Li, Ming-Wei; Zhu, Xing-Quan; Gasser, Robin B; Lin, Rui-Qing; Sani, Rehana A; Lun, Zhao-Rong; Jacobs, Dennis E

    2006-10-01

    Non-isotopic polymerase chain reaction (PCR)-based single-strand conformation polymorphism and sequence analyses of the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA (rDNA) were utilized to genetically characterise ascaridoids from dogs and cats from China by comparison with those from other countries. The study showed that Toxocara canis, Toxocara cati, and Toxascaris leonina from China were genetically the same as those from other geographical origins. Specimens from cats from Guangzhou, China, which were morphologically consistent with Toxocara malaysiensis, were the same genetically as those from Malaysia, with the exception of a polymorphism in the ITS-2 but no unequivocal sequence difference. This is the first report of T. malaysiensis in cats outside of Malaysia (from where it was originally described), supporting the proposal that this species has a broader geographical distribution. The molecular approach employed provides a powerful tool for elucidating the biology, epidemiology, and zoonotic significance of T. malaysiensis.

  8. SA-SSR: a suffix array-based algorithm for exhaustive and efficient SSR discovery in large genetic sequences.

    Science.gov (United States)

    Pickett, B D; Karlinsey, S M; Penrod, C E; Cormier, M J; Ebbert, M T W; Shiozawa, D K; Whipple, C J; Ridge, P G

    2016-09-01

    Simple Sequence Repeats (SSRs) are used to address a variety of research questions in a variety of fields (e.g. population genetics, phylogenetics, forensics, etc.), due to their high mutability within and between species. Here, we present an innovative algorithm, SA-SSR, based on suffix and longest common prefix arrays for efficiently detecting SSRs in large sets of sequences. Existing SSR detection applications are hampered by one or more limitations (i.e. speed, accuracy, ease-of-use, etc.). Our algorithm addresses these challenges while being the most comprehensive and correct SSR detection software available. SA-SSR is 100% accurate and detected >1000 more SSRs than the second best algorithm, while offering greater control to the user than any existing software. SA-SSR is freely available at http://github.com/ridgelab/SA-SSR CONTACT: perry.ridge@byu.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  9. Evaluation of the cranial base in amnion rupture sequence involving the anterior neural tube: implications regarding recurrence risk.

    Science.gov (United States)

    Jones, Kenneth Lyons; Robinson, Luther K; Benirschke, Kurt

    2006-09-01

    Amniotic bands can cause disruption of the cranial end of the developing fetus, leading in some cases to a neural tube closure defect. Although recurrence for unaffected parents of an affected child with a defect in which the neural tube closed normally but was subsequently disrupted by amniotic bands is negligible; for a primary defect in closure of the neural tube to which amnion has subsequently adhered, recurrence risk is 1.7%. In that primary defects of neural tube closure are characterized by typical abnormalities of the base of the skull, evaluation of the cranial base in such fetuses provides an approach for making a distinction between these 2 mechanisms. This distinction has implications regarding recurrence risk. The skull base of 2 fetuses with amnion rupture sequence involving the cranial end of the neural tube were compared to that of 1 fetus with anencephaly as well as that of a structurally normal fetus. The skulls were cleaned, fixed in 10% formalin, recleaned, and then exposed to 10% KOH solution. After washing and recleaning, the skulls were exposed to hydrogen peroxide for bleaching and photography. Despite involvement of the anterior neural tube in both fetuses with amnion rupture sequence, in Case 3 the cranial base was normal while in Case 4 the cranial base was similar to that seen in anencephaly. This technique provides a method for determining the developmental pathogenesis of anterior neural tube defects in cases of amnion rupture sequence. As such, it provides information that can be used to counsel parents of affected children with respect to recurrence risk.

  10. Modeling and optimizing periodically inspected software rejuvenation policy based on geometric sequences

    International Nuclear Information System (INIS)

    Meng, Haining; Liu, Jianjun; Hei, Xinhong

    2015-01-01

    Software aging is characterized by an increasing failure rate, progressive performance degradation and even a sudden crash in a long-running software system. Software rejuvenation is an effective method to counteract software aging. A periodically inspected rejuvenation policy for software systems is studied. The consecutive inspection intervals are assumed to be a decreasing geometric sequence, and upon the inspection times of software system and its failure features, software rejuvenation or system recovery is performed. The system availability function and cost rate function are obtained, and the optimal inspection time and rejuvenation interval are both derived to maximize system availability and minimize cost rate. Then, boundary conditions of the optimal rejuvenation policy are deduced. Finally, the numeric experiment result shows the effectiveness of the proposed policy. Further compared with the existing software rejuvenation policy, the new policy has higher system availability. - Highlights: • A periodically inspected rejuvenation policy for software systems is studied. • A decreasing geometric sequence is used to denote the consecutive inspection intervals. • The optimal inspection times and rejuvenation interval are found. • The new policy is capable of reducing average cost and improving system availability

  11. Use of whole exome sequencing for the identification of Ito-based arrhythmia mechanism and therapy.

    Science.gov (United States)

    Sturm, Amy C; Kline, Crystal F; Glynn, Patric; Johnson, Benjamin L; Curran, Jerry; Kilic, Ahmet; Higgins, Robert S D; Binkley, Philip F; Janssen, Paul M L; Weiss, Raul; Raman, Subha V; Fowler, Steven J; Priori, Silvia G; Hund, Thomas J; Carnes, Cynthia A; Mohler, Peter J

    2015-05-26

    Identified genetic variants are insufficient to explain all cases of inherited arrhythmia. We tested whether the integration of whole exome sequencing with well-established clinical, translational, and basic science platforms could provide rapid and novel insight into human arrhythmia pathophysiology and disease treatment. We report a proband with recurrent ventricular fibrillation, resistant to standard therapeutic interventions. Using whole-exome sequencing, we identified a variant in a previously unidentified exon of the dipeptidyl aminopeptidase-like protein-6 (DPP6) gene. This variant is the first identified coding mutation in DPP6 and augments cardiac repolarizing current (Ito) causing pathological changes in Ito and action potential morphology. We designed a therapeutic regimen incorporating dalfampridine to target Ito. Dalfampridine, approved for multiple sclerosis, normalized the ECG and reduced arrhythmia burden in the proband by >90-fold. This was combined with cilostazol to accelerate the heart rate to minimize the reverse-rate dependence of augmented Ito. We describe a novel arrhythmia mechanism and therapeutic approach to ameliorate the disease. Specifically, we identify the first coding variant of DPP6 in human ventricular fibrillation. These findings illustrate the power of genetic approaches for the elucidation and treatment of disease when carefully integrated with clinical and basic/translational research teams. © 2015 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.

  12. Prediction of Toxin Genes from Chinese Yellow Catfish Based on Transcriptomic and Proteomic Sequencing

    Directory of Open Access Journals (Sweden)

    Bing Xie

    2016-04-01

    Full Text Available Fish venom remains a virtually untapped resource. There are so few fish toxin sequences for reference, which increases the difficulty to study toxins from venomous fish and to develop efficient and fast methods to dig out toxin genes or proteins. Here, we utilized Chinese yellow catfish (Pelteobagrus fulvidraco as our research object, since it is a representative species in Siluriformes with its venom glands embedded in the pectoral and dorsal fins. In this study, we set up an in-house toxin database and a novel toxin-discovering protocol to dig out precise toxin genes by combination of transcriptomic and proteomic sequencing. Finally, we obtained 15 putative toxin proteins distributed in five groups, namely Veficolin, Ink toxin, Adamalysin, Za2G and CRISP toxin. It seems that we have developed a novel bioinformatics method, through which we could identify toxin proteins with high confidence. Meanwhile, these toxins can also be useful for comparative studies in other fish and development of potential drugs.

  13. Predicting sumoylation sites using support vector machines based on various sequence features, conformational flexibility and disorder.

    Science.gov (United States)

    Yavuz, Ahmet Sinan; Sezerman, Osman Ugur

    2014-01-01

    Sumoylation, which is a reversible and dynamic post-translational modification, is one of the vital processes in a cell. Before a protein matures to perform its function, sumoylation may alter its localization, interactions, and possibly structural conformation. Abberations in protein sumoylation has been linked with a variety of disorders and developmental anomalies. Experimental approaches to identification of sumoylation sites may not be effective due to the dynamic nature of sumoylation, laborsome experiments and their cost. Therefore, computational approaches may guide experimental identification of sumoylation sites and provide insights for further understanding sumoylation mechanism. In this paper, the effectiveness of using various sequence properties in predicting sumoylation sites was investigated with statistical analyses and machine learning approach employing support vector machines. These sequence properties were derived from windows of size 7 including position-specific amino acid composition, hydrophobicity, estimated sub-window volumes, predicted disorder, and conformational flexibility. 5-fold cross-validation results on experimentally identified sumoylation sites revealed that our method successfully predicts sumoylation sites with a Matthew's correlation coefficient, sensitivity, specificity, and accuracy equal to 0.66, 73%, 98%, and 97%, respectively. Additionally, we have showed that our method compares favorably to the existing prediction methods and basic regular expressions scanner. By using support vector machines, a new, robust method for sumoylation site prediction was introduced. Besides, the possible effects of predicted conformational flexibility and disorder on sumoylation site recognition were explored computationally for the first time to our knowledge as an additional parameter that could aid in sumoylation site prediction.

  14. Genetic distances and phylogenetic trees of different Awassi sheep populations based on DNA sequencing.

    Science.gov (United States)

    Al-Atiyat, R M; Aljumaah, R S

    2014-08-27

    This study aimed to estimate evolutionary distances and to reconstruct phylogeny trees between different Awassi sheep populations. Thirty-two sheep individuals from three different geographical areas of Jordan and the Kingdom of Saudi Arabia (KSA) were randomly sampled. DNA was extracted from the tissue samples and sequenced using the T7 promoter universal primer. Different phylogenetic trees were reconstructed from 0.64-kb DNA sequences using the MEGA software with the best general time reverse distance model. Three methods of distance estimation were then used. The maximum composite likelihood test was considered for reconstructing maximum likelihood, neighbor-joining and UPGMA trees. The maximum likelihood tree indicated three major clusters separated by cytosine (C) and thymine (T). The greatest distance was shown between the South sheep and North sheep. On the other hand, the KSA sheep as an outgroup showed shorter evolutionary distance to the North sheep population than to the others. The neighbor-joining and UPGMA trees showed quite reliable clusters of evolutionary differentiation of Jordan sheep populations from the Saudi population. The overall results support geographical information and ecological types of the sheep populations studied. Summing up, the resulting phylogeny trees may contribute to the limited information about the genetic relatedness and phylogeny of Awassi sheep in nearby Arab countries.

  15. A new clade, based on partial LSU rDNA sequences, of unarmoured dinoflagellates.

    Science.gov (United States)

    Reñé, Albert; de Salas, Miguel; Camp, Jordi; Balagué, Vanessa; Garcés, Esther

    2013-09-01

    The order Gymnodiniales comprises unarmoured dinoflagellates. However, the lack of sequences hindered determining the phylogenetic positions and systematic relationships of several gymnodinioid taxa. In this study, a monophyletic clade was defined for the species Ceratoperidinium margalefii Loeblich III, Gyrodinium falcatum Kofoid & Swezy, three Cochlodinium species, and two Gymnodinium-like dinoflagellates. Despite their substantial morphotypic differentiation, Cochlodinium cf. helix, G. falcatum and 'Gymnodinium' sp. 1 share a common shape of the acrobase. The phylogenetic data led to the following conclusions: (1) C. margalefii is closely related to several unarmoured dinoflagellates. Its sulcus shape has been observed for the first time. (2) G. falcatum was erroneously assigned to the genus Gyrodinium and is transferred to Ceratoperidinium (C. falcatum (Kofoid & Swezy) Reñé & de Salas comb. nov.). (3) The genus Cochlodinium is polyphyletic and thus artificial; our data support its separation into three different genera. (4) The two Gymnodinium-like species could not be morphologically or phylogenetically related to any other gymnodinioid species sequenced to date. While not all studied species have been definitively transferred to the correct genus, our study is a step forward in the classification of inconspicuous unarmoured dinoflagellates. The family Ceratoperidiniaeceae and the genus Ceratoperidinium are emended. Copyright © 2013 Elsevier GmbH. All rights reserved.

  16. Phylogeny of minute carabid beetles and their relatives based upon DNA sequence data (Coleoptera, Carabidae, Trechitae

    Directory of Open Access Journals (Sweden)

    David Maddison

    2011-11-01

    Full Text Available The phylogeny of ground beetles of supertribe Trechitae is inferred using DNA sequences of genes that code for 28S ribosomal RNA, 18S ribosomal RNA, and wingless. Within the outgroups, austral psydrines are inferred to be monophyletic, and separate from the three genera of true Psydrina (Psydrus, Nomius, Laccocenus; the austral psydrines are formally removed from Psydrini and are treated herein as their own tribe, Moriomorphini Sloane. All three genes place Gehringia with Psydrina. Trechitae is inferred to be monophyletic, and sister to Patrobini.Within trechites, evidence is presented that Tasmanitachoides is not a tachyine, but is instead a member of Trechini. Perileptus is a member of subtribe Trechodina. Against Erwin’s hypothesis of anillines as a polyphyletic lineage derived from the tachyine genus Paratachys, the anillines sampled are monophyletic, and not related to Paratachys. Zolini, Pogonini, Tachyina, and Xystosomina are all monophyletic, with the latter two being sister groups. The relationships of the subtribe Bembidiina were studied in greater detail. Phrypeus is only distantly related to Bembidion, and there is no evidence from sequence data that it belongs within Bembidiina. Three groups that have been recently considered to be outside of the large genus Bembidion are shown to be derived members of Bembidion, related to subgroups: Cillenus is related to the Ocydromus complex of Bembidion, Zecillenus is related to the New Zealand subgenus Zeplataphus, and Hydrium is close to subgenus Metallina. The relationships among major lineages of Trechitae are not, however, resolved with these data.

  17. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Science.gov (United States)

    Shi, Jieming; Li, Xi; Dong, Min; Graham, Mitchell; Yadav, Nehul; Liang, Chun

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.

  18. JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures

    Science.gov (United States)

    Dong, Min; Graham, Mitchell; Yadav, Nehul

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416

  19. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Directory of Open Access Journals (Sweden)

    Jieming Shi

    Full Text Available Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.

  20. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    Science.gov (United States)

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

  1. Frequent genes in rare diseases: panel-based next generation sequencing to disclose causal mutations in hereditary neuropathies.

    Science.gov (United States)

    Dohrn, Maike F; Glöckle, Nicola; Mulahasanovic, Lejla; Heller, Corina; Mohr, Julia; Bauer, Christine; Riesch, Erik; Becker, Andrea; Battke, Florian; Hörtnagel, Konstanze; Hornemann, Thorsten; Suriyanarayanan, Saranya; Blankenburg, Markus; Schulz, Jörg B; Claeys, Kristl G; Gess, Burkhard; Katona, Istvan; Ferbert, Andreas; Vittore, Debora; Grimm, Alexander; Wolking, Stefan; Schöls, Ludger; Lerche, Holger; Korenke, G Christoph; Fischer, Dirk; Schrank, Bertold; Kotzaeridou, Urania; Kurlemann, Gerhard; Dräger, Bianca; Schirmacher, Anja; Young, Peter; Schlotter-Weigel, Beate; Biskup, Saskia

    2017-12-01

    Hereditary neuropathies comprise a wide variety of chronic diseases associated to more than 80 genes identified to date. We herein examined 612 index patients with either a Charcot-Marie-Tooth phenotype, hereditary sensory neuropathy, familial amyloid neuropathy, or small fiber neuropathy using a customized multigene panel based on the next generation sequencing technique. In 121 cases (19.8%), we identified at least one putative pathogenic mutation. Of these, 54.4% showed an autosomal dominant, 33.9% an autosomal recessive, and 11.6% an X-linked inheritance. The most frequently affected genes were PMP22 (16.4%), GJB1 (10.7%), MPZ, and SH3TC2 (both 9.9%), and MFN2 (8.3%). We further detected likely or known pathogenic variants in HINT1, HSPB1, NEFL, PRX, IGHMBP2, NDRG1, TTR, EGR2, FIG4, GDAP1, LMNA, LRSAM1, POLG, TRPV4, AARS, BIC2, DHTKD1, FGD4, HK1, INF2, KIF5A, PDK3, REEP1, SBF1, SBF2, SCN9A, and SPTLC2 with a declining frequency. Thirty-four novel variants were considered likely pathogenic not having previously been described in association with any disorder in the literature. In one patient, two homozygous mutations in HK1 were detected in the multigene panel, but not by whole exome sequencing. A novel missense mutation in KIF5A was considered pathogenic because of the highly compatible phenotype. In one patient, the plasma sphingolipid profile could functionally prove the pathogenicity of a mutation in SPTLC2. One pathogenic mutation in MPZ was identified after being previously missed by Sanger sequencing. We conclude that panel based next generation sequencing is a useful, time- and cost-effective approach to assist clinicians in identifying the correct diagnosis and enable causative treatment considerations. © 2017 International Society for Neurochemistry.

  2. Development of SSR Markers Based on Transcriptome Sequencing and Association Analysis with Drought Tolerance in Perennial Grass Miscanthus from China

    Directory of Open Access Journals (Sweden)

    Gang Nie

    2017-05-01

    Full Text Available Drought has become a critical environmental stress affecting on plant in temperate area. As one of the promising bio-energy crops to sustainable biomass production, the genus Miscanthus has been widely studied around the world. However, the most widely used hybrid cultivar among this genus, Miscanthus × giganteus is proved poor drought tolerance compared to some parental species. Here we mainly focused on Miscanthus sinensis, which is one of the progenitors of M. × giganteus providing a comparable yield and well abiotic stress tolerance in some places. The main objectives were to characterize the physiological and photosynthetic respond to drought stress and to develop simple sequence repeats (SSRs markers associated with drought tolerance by transcriptome sequencing within an originally collection of 44 Miscanthus genotypes from southwest China. Significant phenotypic differences were observed among genotypes, and the average of leaf relative water content (RWC were severely affected by drought stress decreasing from 88.27 to 43.21%, which could well contribute to separating the drought resistant and drought sensitive genotype of Miscanthus. Furthermore, a total of 16,566 gene-associated SSRs markers were identified based on Illumina RNA sequencing under drought conditions, and 93 of them were randomly selected to validate. In total, 70 (75.3% SSRs were successfully amplified and the generated loci from 30 polymorphic SSRs were used to estimate the genetic differentiation and population structure. Finally, two optimum subgroups of the population were determined by structure analysis and based on association analysis, seven significant associations were identified including two markers with leaf RWC and five markers with photosynthetic traits. With the rich sequencing resources annotation, such associations would serve an efficient tool for Miscanthus drought response mechanism study and facilitate genetic improvement of drought resistant for

  3. Sequence-based separation of single-stranded DNA using nucleotides in capillary electrophoresis: focus on phosphate.

    Science.gov (United States)

    Zhang, Xueru; McGown, Linda B

    2013-06-01

    DNA analysis has widespread applicability in biology, medicine, biotechnology, and forensics. DNA separation by length is readily achieved using sieving gels in electrophoresis. Separation by sequence is less simple, generally requiring adequate differences in native or induced conformation or differences in thermal or chemical stability of the strands that are hybridized prior to measurement. We previously demonstrated separation of four single-stranded DNA 76-mers that differ by only a few A-G substitutions based solely on sequence using guanosine-5'-monophosphate (GMP) in the running buffer. We attributed separation to the unique self-assembly of GMP to form higher order structures. Here, we examine an expanded set of 76-mers designed to probe the mechanism of the separation and effects of experimental conditions. We were surprised to find that other ribonucleotides achieved the similar separation to GMP, and that some separation was achieved using sodium phosphate instead of GMP. Potassium phosphate achieved almost as good separations as the ribonucleotides. This suggests that the separation medium provides a physicochemical environment for the DNA that effects strand migration in a sequence-selective manner. Further investigation is needed to determine whether the mechanism involves specific interactions between the phosphates and the DNA strands or is a result of other properties of the separation medium. Phosphate generally has been avoided in DNA separations by capillary gel electrophoresis because its high ionic strength exacerbates Joule heating. Our results suggest that phosphate compounds should be examined for separation of DNA based on sequence. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    Science.gov (United States)

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal