WorldWideScience

Sample records for sequence analysis based

  1. Movement Pattern Analysis Based on Sequence Signatures

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC, a type of calculus that represents qualitative data on moving point objects (MPOs, and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI is used, which enables to map QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  2. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Navya

    2017-02-22

    Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 byanalyzing differences in allele distribution.

  3. Noncoding sequence classification based on wavelet transform analysis: part I

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  4. Streaming support for data intensive cloud-based sequence analysis.

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  5. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Shadi A. Issa

    2013-01-01

    Full Text Available Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  6. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  7. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT – the bionic wavelet transform (BWT – for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.

  8. Illumina-based de novo transcriptome sequencing and analysis

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  9. Whole-genome sequence-based analysis of thyroid function

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  10. Recent advances in nanopore-based nucleic acid analysis and sequencing

    Shi, Jidong; Fang, Ying; Hou, Junfeng

    2016-01-01

    Nanopore-based sequencing platforms are transforming the field of genomic science. This review (containing 116 references) highlights some recent progress on nanopore-based nucleic acid analysis and sequencing. These studies are classified into three categories, biological, solid-state, and hybrid nanopores, according to their nanoporous materials. We begin with a brief description of the translocation-based detection mechanism of nanopores. Next, specific examples are given in nanopore-based nucleic acid analysis and sequencing, with an emphasis on identifying strategies that can improve the resolution of nanopores. This review concludes with a discussion of future research directions that will advance the practical applications of nanopore technology. (author)

  11. A base composition analysis of natural patterns for the preprocessing of metagenome sequences.

    Bonham-Carter, Oliver; Ali, Hesham; Bastola, Dhundy

    2013-01-01

    On the pretext that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often have much conserved code.

  12. Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikos; Vinther, Jeppe

    2015-01-01

    time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data...

  13. Image registration based on virtual frame sequence analysis

    Chen, H.; Ng, W.S. [Nanyang Technological University, Computer Integrated Medical Intervention Laboratory, School of Mechanical and Aerospace Engineering, Singapore (Singapore); Shi, D. (Nanyang Technological University, School of Computer Engineering, Singapore, Singpore); Wee, S.B. [Tan Tock Seng Hospital, Department of General Surgery, Singapore (Singapore)

    2007-08-15

    This paper is to propose a new framework for medical image registration with large nonrigid deformations, which still remains one of the biggest challenges for image fusion and further analysis in many medical applications. Registration problem is formulated as to recover a deformation process with the known initial state and final state. To deal with large nonlinear deformations, virtual frames are proposed to be inserted to model the deformation process. A time parameter is introduced and the deformation between consecutive frames is described with a linear affine transformation. Experiments are conducted with simple geometric deformation as well as complex deformations presented in MRI and ultrasound images. All the deformations are characterized with nonlinearity. The positive results demonstrated the effectiveness of this algorithm. The framework proposed in this paper is feasible to register medical images with large nonlinear deformations and is especially useful for sequential images. (orig.)

  14. Establishment of screening technique for mutant cell and analysis of base sequence in the mutation

    Sofuni, Toshio; Nomi, Takehiko; Yamada, Masami; Masumura, Kenichi

    2000-01-01

    This research project aimed to establish an easy and quick detection method for radiation-induced mutation using molecular-biological techniques and an effective analyzing method for the molecular changes in base sequence. In this year, Spi mutants derived from γ-radiation exposed mouse were analyzed by PCR method and DNA sequence method. Male transgenic mice were exposed to γ-ray at 5,10, 50 Gy and the transgene was taken out from the genome DNA from the spleen in vivo packaging method. Spi mutant plaques were obtained by infecting the recovered phage to E. coli. Sequence analysis for the mutants was made using ALFred DNA sequencer and SequiTherm TM Long-Red Cycle sequencing kit. Sequence analysis was carried out for 41 of 50 independent Spi mutants obtained. The deletions were classified into 4 groups; Group 1 included 15 mutants that were characterized with a large deletion (43 bp-10 kb) with a short homologous sequence. Group 2 included 11 mutants of a large deletion having no homologous sequence at the connecting region. Group 3 included 11 mutants having a short deletion of less than 20 bp, which occurred in the non-repetitive sequence of gam gene and possibly caused by oxidative breakage of DNA or recombination of DNA fragment produced by the breakage. Group 4 included 4 mutants having deletions as short as 20 bp or less in the repetitive sequence of gam gene, resulting in an alteration of the reading frame. Thus, the synthesis of Gam protein was terminated by the appearance of TGA between code 13 and 14 of redB gene, leading to inactivation of gam gene and redBA gene. These results indicated that most of Spi mutants had a deletion in red/gam region and the deletions in more than half mutants occurred in homologous sequences as short as 8 bp. (M.N.)

  15. An Analysis of Delay-based and Integrator-based Sequence Detectors for Grid-Connected Converters

    Khazraj, Hesam; Silva, Filipe Miguel Faria da; Bak, Claus Leth

    2017-01-01

    -signal cancellation operators are the main members of the delay-based sequence detectors. The aim of this paper is to provide a theoretical and experimental comparative study between integrator and delay based sequence detectors. The theoretical analysis is conducted based on the small-signal modelling......Detecting and separating positive and negative sequence components of the grid voltage or current is of vital importance in the control of grid-connected power converters, HVDC systems, etc. To this end, several techniques have been proposed in recent years. These techniques can be broadly...... classified into two main classes: The integrator-based techniques and Delay-based techniques. The complex-coefficient filter-based technique, dual second-order generalized integrator-based method, multiple reference frame approach are the main members of the integrator-based sequence detector and the delay...

  16. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Aditi Sengupta

    2016-03-01

    Full Text Available Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequences analysis.

  17. Context based computational analysis and characterization of ARS consensus sequences (ACS of Saccharomyces cerevisiae genome

    Vinod Kumar Singh

    2016-09-01

    Full Text Available Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS requires an essential consensus sequence (ACS for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC denoted as ORC-ACS and non-replicating ACS sequences (nrACS, that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  18. Estimation of physiological parameters using knowledge-based factor analysis of dynamic nuclear medicine image sequences

    Yap, J.T.; Chen, C.T.; Cooper, M.

    1995-01-01

    The authors have previously developed a knowledge-based method of factor analysis to analyze dynamic nuclear medicine image sequences. In this paper, the authors analyze dynamic PET cerebral glucose metabolism and neuroreceptor binding studies. These methods have shown the ability to reduce the dimensionality of the data, enhance the image quality of the sequence, and generate meaningful functional images and their corresponding physiological time functions. The new information produced by the factor analysis has now been used to improve the estimation of various physiological parameters. A principal component analysis (PCA) is first performed to identify statistically significant temporal variations and remove the uncorrelated variations (noise) due to Poisson counting statistics. The statistically significant principal components are then used to reconstruct a noise-reduced image sequence as well as provide an initial solution for the factor analysis. Prior knowledge such as the compartmental models or the requirement of positivity and simple structure can be used to constrain the analysis. These constraints are used to rotate the factors to the most physically and physiologically realistic solution. The final result is a small number of time functions (factors) representing the underlying physiological processes and their associated weighting images representing the spatial localization of these functions. Estimation of physiological parameters can then be performed using the noise-reduced image sequence generated from the statistically significant PCs and/or the final factor images and time functions. These results are compared to the parameter estimation using standard methods and the original raw image sequences. Graphical analysis was performed at the pixel level to generate comparable parametric images of the slope and intercept (influx constant and distribution volume)

  19. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment. High-throughput sequencing (HTS revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.

  20. Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.

    Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2014-05-23

    The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain limiting the widespread adoption of NGS testing into clinical practice. One such difficulty includes the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides additional flexibility, needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic and all identified pathogenic variants confirmed using Sanger sequencing further validating the software.

  1. Phylogenetic analysis of Demodex caprae based on mitochondrial 16S rDNA sequence.

    Zhao, Ya-E; Hu, Li; Ma, Jun-Xian

    2013-11-01

    Demodex caprae infests the hair follicles and sebaceous glands of goats worldwide, which not only seriously impairs goat farming, but also causes a big economic loss. However, there are few reports on the DNA level of D. caprae. To reveal the taxonomic position of D. caprae within the genus Demodex, the present study conducted phylogenetic analysis of D. caprae based on mt16S rDNA sequence data. D. caprae adults and eggs were obtained from a skin nodule of the goat suffering demodicidosis. The mt16S rDNA sequences of individual mite were amplified using specific primers, and then cloned, sequenced, and aligned. The sequence divergence, genetic distance, and transition/transversion rate were computed, and the phylogenetic trees in Demodex were reconstructed. Results revealed the 339-bp partial sequences of six D. caprae isolates were obtained, and the sequence identity was 100% among isolates. The pairwise divergences between D. caprae and Demodex canis or Demodex folliculorum or Demodex brevis were 22.2-24.0%, 24.0-24.9%, and 22.9-23.2%, respectively. The corresponding average genetic distances were 2.840, 2.926, and 2.665, and the average transition/transversion rates were 0.70, 0.55, and 0.54, respectively. The divergences, genetic distances, and transition/transversion rates of D. caprae versus the other three species all reached interspecies level. The five phylogenetic trees all presented that D. caprae clustered with D. brevis first, and then with D. canis, D. folliculorum, and Demodex injai in sequence. In conclusion, D. caprae is an independent species, and it is closer to D. brevis than to D. canis, D. folliculorum, or D. injai.

  2. MetaSeq: privacy preserving meta-analysis of sequencing-based association studies.

    Singh, Angad Pal; Zafer, Samreen; Pe'er, Itsik

    2013-01-01

    Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies groupwise-test for association of multiple potentially-causal alleles along each gene. They are subject to similar power constraints and therefore likely to resort to meta-analysis as well. The problem arises when considering privacy of the genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant thus requiring the exchange of identity of the sequenced variant. As such variants are often rare, potentially revealing the identity of their carriers and jeopardizing privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare, sequenced alleles, for metaanalysis of sequencing data without disclosing the allele identity and counts, thereby protecting sample identity. This apparent paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data does not convey the identity of a mutation and therefore does not expose carrier identity. The exchange relies on a 3rd party, trusted to follow the protocol although not trusted to learn about the raw data. We show applicability of this method to publicly available exome-sequencing data from multiple studies, simulating phenotypic information for powerful meta-analysis. The MetaSeq software is publicly available as open source.

  3. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

    2018-01-04

    Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    Fadista, João; Bendixen, Christian

    Segmental duplications are >1kb segments of duplicated DNA present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been...... extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...

  5. Sequence-based analysis of the microbial composition of water kefir from multiple sources.

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2013-11-01

    Water kefir is a water-sucrose-based beverage, fermented by a symbiosis of bacteria and yeast to produce a final product that is lightly carbonated, acidic and that has a low alcohol percentage. The microorganisms present in water kefir are introduced via water kefir grains, which consist of a polysaccharide matrix in which the microorganisms are embedded. We aimed to provide a comprehensive sequencing-based analysis of the bacterial population of water kefir beverages and grains, while providing an initial insight into the corresponding fungal population. To facilitate this objective, four water kefirs were sourced from the UK, Canada and the United States. Culture-independent, high-throughput, sequencing-based analyses revealed that the bacterial fraction of each water kefir and grain was dominated by Zymomonas, an ethanol-producing bacterium, which has not previously been detected at such a scale. The other genera detected were representatives of the lactic acid bacteria and acetic acid bacteria. Our analysis of the fungal component established that it was comprised of the genera Dekkera, Hanseniaspora, Saccharomyces, Zygosaccharomyces, Torulaspora and Lachancea. This information will assist in the ultimate identification of the microorganisms responsible for the potentially health-promoting attributes of these beverages. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  6. Biological sequence analysis

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  7. Retrospective Evaluations of Sequences: Testing the Predictions of a Memory-Based Analysis.

    Aldrovandi, Silvio; Poirier, Marie; Kusev, Petko; Ayton, Peter

    2015-01-01

    Retrospective evaluation (RE) of event sequences is known to be biased in various ways. The present paper presents a series of studies that examined the suggestion that the moments that are the most accessible in memory at the point of RE contribute to these biases. As predicted by this memory-based analysis, Experiment 1 showed that pleasantness ratings of word lists were biased by the presentation position of a negative item and by how easy the negative information was to retrieve. Experiment 2 ruled out the hypothesis that these findings were due to the dual nature of the task called upon. Experiment 3 further manipulated the memorability of the negative items--and corresponding changes in RE were as predicted. Finally, Experiment 4 extended the findings to more complex stimuli involving event narratives. Overall, the results suggest that assessments were adjusted based on the retrieval of the most readily available information.

  8. Identification of somatic mutations in cancer through Bayesian-based analysis of sequenced genome pairs.

    Christoforides, Alexis; Carpten, John D; Weiss, Glen J; Demeure, Michael J; Von Hoff, Daniel D; Craig, David W

    2013-05-04

    The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations--changes specific to a tumor and not within an individual's germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.

  9. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  10. Geographic Distribution of Leishmania Species in Ecuador Based on the Cytochrome B Gene Sequence Analysis

    Kato, Hirotomo; Gomez, Eduardo A.; Martini-Robles, Luiggi; Muzzio, Jenny; Velez, Lenin; Calvopiña, Manuel; Romero-Alvarez, Daniel; Mimori, Tatsuyuki; Uezato, Hiroshi; Hashiguchi, Yoshihisa

    2016-01-01

    A countrywide epidemiological study was performed to elucidate the current geographic distribution of causative species of cutaneous leishmaniasis (CL) in Ecuador by using FTA card-spotted samples and smear slides as DNA sources. Putative Leishmania in 165 samples collected from patients with CL in 16 provinces of Ecuador were examined at the species level based on the cytochrome b gene sequence analysis. Of these, 125 samples were successfully identified as Leishmania (Viannia) guyanensis, L. (V.) braziliensis, L. (V.) naiffi, L. (V.) lainsoni, and L. (Leishmania) mexicana. Two dominant species, L. (V.) guyanensis and L. (V.) braziliensis, were widely distributed in Pacific coast subtropical and Amazonian tropical areas, respectively. Recently reported L. (V.) naiffi and L. (V.) lainsoni were identified in Amazonian areas, and L. (L.) mexicana was identified in an Andean highland area. Importantly, the present study demonstrated that cases of L. (V.) braziliensis infection are increasing in Pacific coast areas. PMID:27410039

  11. Geographic Distribution of Leishmania Species in Ecuador Based on the Cytochrome B Gene Sequence Analysis.

    Kato, Hirotomo; Gomez, Eduardo A; Martini-Robles, Luiggi; Muzzio, Jenny; Velez, Lenin; Calvopiña, Manuel; Romero-Alvarez, Daniel; Mimori, Tatsuyuki; Uezato, Hiroshi; Hashiguchi, Yoshihisa

    2016-07-01

    A countrywide epidemiological study was performed to elucidate the current geographic distribution of causative species of cutaneous leishmaniasis (CL) in Ecuador by using FTA card-spotted samples and smear slides as DNA sources. Putative Leishmania in 165 samples collected from patients with CL in 16 provinces of Ecuador were examined at the species level based on the cytochrome b gene sequence analysis. Of these, 125 samples were successfully identified as Leishmania (Viannia) guyanensis, L. (V.) braziliensis, L. (V.) naiffi, L. (V.) lainsoni, and L. (Leishmania) mexicana. Two dominant species, L. (V.) guyanensis and L. (V.) braziliensis, were widely distributed in Pacific coast subtropical and Amazonian tropical areas, respectively. Recently reported L. (V.) naiffi and L. (V.) lainsoni were identified in Amazonian areas, and L. (L.) mexicana was identified in an Andean highland area. Importantly, the present study demonstrated that cases of L. (V.) braziliensis infection are increasing in Pacific coast areas.

  12. Geographic Distribution of Leishmania Species in Ecuador Based on the Cytochrome B Gene Sequence Analysis.

    Hirotomo Kato

    2016-07-01

    Full Text Available A countrywide epidemiological study was performed to elucidate the current geographic distribution of causative species of cutaneous leishmaniasis (CL in Ecuador by using FTA card-spotted samples and smear slides as DNA sources. Putative Leishmania in 165 samples collected from patients with CL in 16 provinces of Ecuador were examined at the species level based on the cytochrome b gene sequence analysis. Of these, 125 samples were successfully identified as Leishmania (Viannia guyanensis, L. (V. braziliensis, L. (V. naiffi, L. (V. lainsoni, and L. (Leishmania mexicana. Two dominant species, L. (V. guyanensis and L. (V. braziliensis, were widely distributed in Pacific coast subtropical and Amazonian tropical areas, respectively. Recently reported L. (V. naiffi and L. (V. lainsoni were identified in Amazonian areas, and L. (L. mexicana was identified in an Andean highland area. Importantly, the present study demonstrated that cases of L. (V. braziliensis infection are increasing in Pacific coast areas.

  13. VisRseq: R-based visual framework for analysis of sequencing data

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven JM

    2015-01-01

    Background Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for ...

  14. Typing of canine parvovirus isolates using mini-sequencing based single nucleotide polymorphism analysis.

    Naidu, Hariprasad; Subramanian, B Mohana; Chinchkar, Shankar Ramchandra; Sriraman, Rajan; Rana, Samir Kumar; Srinivasan, V A

    2012-05-01

    The antigenic types of canine parvovirus (CPV) are defined based on differences in the amino acids of the major capsid protein VP2. Type specificity is conferred by a limited number of amino acid changes and in particular by few nucleotide substitutions. PCR based methods are not particularly suitable for typing circulating variants which differ in a few specific nucleotide substitutions. Assays for determining SNPs can detect efficiently nucleotide substitutions and can thus be adapted to identify CPV types. In the present study, CPV typing was performed by single nucleotide extension using the mini-sequencing technique. A mini-sequencing signature was established for all the four CPV types (CPV2, 2a, 2b and 2c) and feline panleukopenia virus. The CPV typing using the mini-sequencing reaction was performed for 13 CPV field isolates and the two vaccine strains available in our repository. All the isolates had been typed earlier by full-length sequencing of the VP2 gene. The typing results obtained from mini-sequencing matched completely with that of sequencing. Typing could be achieved with less than 100 copies of standard plasmid DNA constructs or ≤10¹ FAID₅₀ of virus by mini-sequencing technique. The technique was also efficient for detecting multiple types in mixed infections. Copyright © 2012 Elsevier B.V. All rights reserved.

  15. System risk evolution analysis and risk critical event identification based on event sequence diagram

    Luo, Pengcheng; Hu, Yang

    2013-01-01

    During system operation, the environmental, operational and usage conditions are time-varying, which causes the fluctuations of the system state variables (SSVs). These fluctuations change the accidents’ probabilities and then result in the system risk evolution (SRE). This inherent relation makes it feasible to realize risk control by monitoring the SSVs in real time, herein, the quantitative analysis of SRE is essential. Besides, some events in the process of SRE are critical to system risk, because they act like the “demarcative points” of safety and accident, and this characteristic makes each of them a key point of risk control. Therefore, analysis of SRE and identification of risk critical events (RCEs) are remarkably meaningful to ensure the system to operate safely. In this context, an event sequence diagram (ESD) based method of SRE analysis and the related Monte Carlo solution are presented; RCE and risk sensitive variable (RSV) are defined, and the corresponding identification methods are also proposed. Finally, the proposed approaches are exemplified with an accident scenario of an aircraft getting into the icing region

  16. Direct, rapid RNA sequence analysis

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule followed by identification of the mononucleotides comprising the oligomer. Over the past decade the field of nucleic acid sequencing has changed dramatically, however, and RNA molecules now can be sequenced in a variety of more streamlined fashions. Most of the more recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of 32 P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for determining the nucleotide sequences of RNA molecules rapidly: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on fragmatic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T 1 , T 2 , and S 1 ) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can be done easily at the same time as sequencing reactions

  17. A stochastic context free grammar based framework for analysis of protein sequences

    Nebel Jean-Christophe

    2009-10-01

    Full Text Available Abstract Background In the last decade, there have been many applications of formal language theory in bioinformatics such as RNA structure prediction and detection of patterns in DNA. However, in the field of proteomics, the size of the protein alphabet and the complexity of relationship between amino acids have mainly limited the application of formal language theory to the production of grammars whose expressive power is not higher than stochastic regular grammars. However, these grammars, like other state of the art methods, cannot cover any higher-order dependencies such as nested and crossing relationships that are common in proteins. In order to overcome some of these limitations, we propose a Stochastic Context Free Grammar based framework for the analysis of protein sequences where grammars are induced using a genetic algorithm. Results This framework was implemented in a system aiming at the production of binding site descriptors. These descriptors not only allow detection of protein regions that are involved in these sites, but also provide insight in their structure. Grammars were induced using quantitative properties of amino acids to deal with the size of the protein alphabet. Moreover, we imposed some structural constraints on grammars to reduce the extent of the rule search space. Finally, grammars based on different properties were combined to convey as much information as possible. Evaluation was performed on sites of various sizes and complexity described either by PROSITE patterns, domain profiles or a set of patterns. Results show the produced binding site descriptors are human-readable and, hence, highlight biologically meaningful features. Moreover, they achieve good accuracy in both annotation and detection. In addition, findings suggest that, unlike current state-of-the-art methods, our system may be particularly suited to deal with patterns shared by non-homologous proteins. Conclusion A new Stochastic Context Free

  18. Antibody-based screening for hereditary nonpolyposis colorectal carcinoma compared with microsatellite analysis and sequencing

    Christensen, Mariann; Katballe, Niels; Wikman, Friedrik

    2002-01-01

    BACKGROUND: Germline mutations in the DNA mismatch repair genes, MSH2, MLH1, and others are associated with hereditary nonpolyposis colorectal cancer (HNPCC). Due to the high costs of sequencing, cheaper screening methods are needed to identify HNPCC cases. Ideally, these methods should have a high...... carcinoma of whom 11 met the Amsterdam criteria and 31 were suspected to belong to HNPCC families. Thirty-five patients were examined by microsatellite analysis, 40 by immunohistochemical staining, and in 31 patients both the MLH1 and MSH2 genes were sequenced. RESULTS: Ninety-two percent of patients...... the three methods was found in 74 % of the tumors. CONCLUSIONS: The authors suggest that immunohistochemistry should be used in combination with microsatellite analysis to prescreen suspected HNPCC patients for the selection of cases where sequencing of the MLH1 and MSH2 mismatch repair genes is indicated....

  19. Sequence-based analysis of the bacterial and fungal compositions of multiple kombucha (tea fungus) samples.

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2014-04-01

    Kombucha is a sweetened tea beverage that, as a consequence of fermentation, contains ethanol, carbon dioxide, a high concentration of acid (gluconic, acetic and lactic) as well as a number of other metabolites and is thought to contain a number of health-promoting components. The sucrose-tea solution is fermented by a symbiosis of bacteria and yeast embedded within a cellulosic pellicle, which forms a floating mat in the tea, and generates a new layer with each successful fermentation. The specific identity of the microbial populations present has been the focus of attention but, to date, the majority of studies have relied on culture-based analyses. To gain a more comprehensive insight into the kombucha microbiota we have carried out the first culture-independent, high-throughput sequencing analysis of the bacterial and fungal populations of 5 distinct pellicles as well as the resultant fermented kombucha at two time points. Following the analysis it was established that the major bacterial genus present was Gluconacetobacter, present at >85% in most samples, with only trace populations of Acetobacter detected (kombucha, also being revealed. The yeast populations were found to be dominated by Zygosaccharomyces at >95% in the fermented beverage, with a greater fungal diversity present in the cellulosic pellicle, including numerous species not identified in kombucha previously. Ultimately, this study represents the most accurate description of the microbiology of kombucha to date. Copyright © 2013 Elsevier Ltd. All rights reserved.

  20. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20 year-period from eight countries. MLST resolved 17 sequence types (Simpsons index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found a low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.

  1. Image sequence analysis

    1981-01-01

    The processing of image sequences has a broad spectrum of important applica­ tions including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Anal­ ysis and Machine Intelligence was published in November 1980. The IEEE Com­ puter magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  2. Security Analysis of a Block Encryption Algorithm Based on Dynamic Sequences of Multiple Chaotic Systems

    Du, Mao-Kang; He, Bo; Wang, Yong

    2011-01-01

    Recently, the cryptosystem based on chaos has attracted much attention. Wang and Yu (Commun. Nonlin. Sci. Numer. Simulat. 14 (2009) 574) proposed a block encryption algorithm based on dynamic sequences of multiple chaotic systems. We analyze the potential flaws in the algorithm. Then, a chosen-plaintext attack is presented. Some remedial measures are suggested to avoid the flaws effectively. Furthermore, an improved encryption algorithm is proposed to resist the attacks and to keep all the merits of the original cryptosystem.

  3. Sequence and 3D structure based analysis of TNT degrading proteins in Arabidopsis thaliana.

    Bhattacherjee, Amrita; Mandal, Rahul Shubhra; Das, Santasabuj; Kundu, Sudip

    2014-03-01

    TNT, accidentally released at several manufacturing sites, contaminates ground water and soil. It has a toxic effect to algae and invertebrate, and chronic exposure to TNT also causes harmful effects to human. On the other hand, many plants including Arabidopsis thaliana have the ability to metabolize TNT either completely or at least to a reduced less toxic form. In A. thaliana, the enzyme UDP glucosyltransferase (UDPGT) can further conjugate the reduced forms 2-HADNT and 4-HADNT (2-hydroxylamino-4, 6- dinitrotoluene and 4-hydroxylamino-2, 6- dinitrotoluene) of TNT. Based on the experimental analysis, existing literature and phylogenetic analysis, it is evident that among 107 UDPGT proteins only six are involved in the TNT degrading process. A total of 13 UDPGT proteins including five of these TNT degrading proteins fall within the same group of phylogeny. Thus, these 13 UDPGT proteins have been classified into two groups, TNT-degrading and TNT-non-degrading proteins. To understand the differences in TNT-degrading capacities; using homology modeling we first predicted two structures, taking one representative sequence from both the groups. Next, we performed molecular docking of the modeled structure and TNT reduced form 2-hydroxylamino-4, 6- dinitrotoluene (2-HADNT). We observed that while the Trp residue located within the active site region of the TNT- degrading protein showed π-Cation interaction; such type of interaction was absent in TNT-non-degrading protein, as the respective Trp residue lay outside of the pocket in this case. We observed the conservation of this π-Cation interaction during MD simulation of TNT-degrading protein. Thus, the position and the orientation of the active site residue Trp could explain the presence and absence of TNT-degrading capacity of the UDPGT proteins.

  4. VisRseq: R-based visual framework for analysis of sequencing data.

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven J M

    2015-01-01

    Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.

  5. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  6. High Throughput Sample Preparation and Analysis for DNA Sequencing, PCR and Combinatorial Screening of Catalysis Based on Capillary Array Technique

    Zhang, Yonghua [Iowa State Univ., Ames, IA (United States)

    2000-01-01

    Sample preparation has been one of the major bottlenecks for many high throughput analyses. The purpose of this research was to develop new sample preparation and integration approach for DNA sequencing, PCR based DNA analysis and combinatorial screening of homogeneous catalysis based on multiplexed capillary electrophoresis with laser induced fluorescence or imaging UV absorption detection. The author first introduced a method to integrate the front-end tasks to DNA capillary-array sequencers. protocols for directly sequencing the plasmids from a single bacterial colony in fused-silica capillaries were developed. After the colony was picked, lysis was accomplished in situ in the plastic sample tube using either a thermocycler or heating block. Upon heating, the plasmids were released while chromsomal DNA and membrane proteins were denatured and precipitated to the bottom of the tube. After adding enzyme and Sanger reagents, the resulting solution was aspirated into the reaction capillaries by a syringe pump, and cycle sequencing was initiated. No deleterious effect upon the reaction efficiency, the on-line purification system, or the capillary electrophoresis separation was observed, even though the crude lysate was used as the template. Multiplexed on-line DNA sequencing data from 8 parallel channels allowed base calling up to 620 bp with an accuracy of 98%. The entire system can be automatically regenerated for repeated operation. For PCR based DNA analysis, they demonstrated that capillary electrophoresis with UV detection can be used for DNA analysis starting from clinical sample without purification. After PCR reaction using cheek cell, blood or HIV-1 gag DNA, the reaction mixtures was injected into the capillary either on-line or off-line by base stacking. The protocol was also applied to capillary array electrophoresis. The use of cheaper detection, and the elimination of purification of DNA sample before or after PCR reaction, will make this approach an

  7. Next Generation Sequencing-Based Analysis of Repetitive DNA in the Model Dioceous Plant Silene latifolia

    Macas, Jiří; Kejnovský, Eduard; Neumann, Pavel; Novák, Petr; Koblížková, Andrea; Vyskot, Boris

    2011-01-01

    Roč. 6, č. 11 (2011), e27335 E-ISSN 1932-6203 R&D Projects: GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004; GA MŠk(CZ) LH11058; GA ČR(CZ) GAP501/10/0102; GA ČR(CZ) GAP305/10/0930 Institutional research plan: CEZ:AV0Z50510513; CEZ:AV0Z50040702 Keywords : Plant genome * Sequencing-Based Analyses * Repetitive DNA * Silene latifolia Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 4.092, year: 2011

  8. Automated Clustering Analysis of Immunoglobulin Sequences in Chronic Lymphocytic Leukemia Based on 3D Structural Descriptors

    Marcatili, Paolo; Mochament, Konstantinos; Agathangelidis, Andreas

    2016-01-01

    study, we used the structure prediction tools PIGS and I-TASSER for creating the 3D models and the TM-align algorithm to superpose them. The innovation of the current methodology resides in the usage of methods adapted from 3D content-based search methodologies to determine the local structural...... determine it are extremely laborious and demanding. Hence, the ability to gain insight into the structure of Igs at large relies on the availability of tools and algorithms for producing accurate Ig structural models based on their primary sequence alone. These models can then be used to determine...... to achieve an optimal solution to this task yet their results were hindered mainly due to the lack of efficient clustering methods based on the similarity of 3D structure descriptors. Here, we present a novel workflow for robust Ig 3D modeling and automated clustering. We validated our protocol in chronic...

  9. A phylogenetic analysis of Diurideae (Orchidaceae) based on plastid DNA sequence data.

    Kores, P J; Molvray, M; Weston, P H; Hopper, S D; Brown, A P; Cameron, K M; Chase, M W

    2001-10-01

    DNA sequence data from plastid matK and trnL-F regions were used in phylogenetic analyses of Diurideae, which indicate that Diurideae are not monophyletic as currently delimited. However, if Chloraeinae and Pterostylidinae are excluded from Diurideae, the remaining subtribes form a well-supported, monophyletic group that is sister to a "spiranthid" clade. Chloraea, Gavilea, and Megastylis pro parte (Chloraeinae) are all placed among the spiranthid orchids and form a grade with Pterostylis leading to a monophyletic Cranichideae. Codonorchis, previously included among Chloraeinae, is sister to Orchideae. Within the more narrowly delimited Diurideae two major lineages are apparent. One includes Diuridinae, Cryptostylidinae, Thelymitrinae, and an expanded Drakaeinae; the other includes Caladeniinae s.s., Prasophyllinae, and Acianthinae. The achlorophyllous subtribe Rhizanthellinae is a member of Diurideae, but its placement is otherwise uncertain. The sequence-based trees indicate that some morphological characters used in previous classifications, such as subterranean storage organs, anther position, growth habit, fungal symbionts, and pollination syndromes have more complex evolutionary histories than previously hypothesized. Treatments based upon these characters have produced conflicting classifications, and molecular data offer a tool for reevaluating these phylogenetic hypotheses.

  10. Genotype, phenotype and in silico pathogenicity analysis of HEXB mutations: Panel based sequencing for differential diagnosis of gangliosidosis.

    Mahdieh, Nejat; Mikaeeli, Sahar; Tavasoli, Ali Reza; Rezaei, Zahra; Maleki, Majid; Rabbani, Bahareh

    2018-04-01

    Gangliosidosis is an inherited metabolic disorder causing neurodegeneration and motor regression. Preventive diagnosis is the first choice for the affected families due to lack of straightforward therapy. Genetic studies could confirm the diagnosis and help families for carrier screening and prenatal diagnosis. An update of HEXB gene variants concerning genotype, phenotype and in silico analysis are presented. Panel based next generation sequencing and direct sequencing of four cases were performed to confirm the clinical diagnosis and for reproductive planning. Bioinformatic analyses of the HEXB mutation database were also performed. Direct sequencing of HEXA and HEXB genes showed recurrent homozygous variants at c.509G>A (p.Arg170Gln) and c.850C>T (p.Arg284Ter), respectively. A novel variant at c.416T>A (p.Leu139Gln) was identified in the GLB1 gene. Panel based next generation sequencing was performed for an undiagnosed patient which showed a novel mutation at c.1602C>A (p.Cys534Ter) of HEXB gene. Bioinformatic analysis of the HEXB mutation database showed 97% consistency of in silico genotype analysis with the phenotype. Bioinformatic analysis of the novel variants predicted to be disease causing. In silico structural and functional analysis of the novel variants showed structural effect of HEXB and functional effect of GLB1 variants which would provide fast analysis of novel variants. Panel based studies could be performed for overlapping symptomatic patients. Consequently, genetic testing would help affected families for patients' management, carrier detection, and family planning's. Copyright © 2018 Elsevier B.V. All rights reserved.

  11. Event-sequence time series analysis in ground-based gamma-ray astronomy

    Barres de Almeida, U.; Chadwick, P.; Daniel, M.; Nolan, S.; McComb, L.

    2008-01-01

    The recent, extreme episodes of variability detected from Blazars by the leading atmospheric Cerenkov experiments motivate the development and application of specialized statistical techniques that enable the study of this rich data set to its furthest extent. The identification of the shortest variability timescales supported by the data and the actual variability structure observed in the light curves of these sources are some of the fundamental aspects being studied, that answers can bring new developments on the understanding of the physics of these objects and on the mechanisms of production of VHE gamma-rays in the Universe. Some of our efforts in studying the time variability of VHE sources involve the application of dynamic programming algorithms to the problem of detecting change-points in a Poisson sequence. In this particular paper we concentrate on the more primary issue of the applicability of counting statistics to the analysis of time-series on VHE gamma-ray astronomy.

  12. Analysis of genetically modified organisms by pyrosequencing on a portable photodiode-based bioluminescence sequencer.

    Song, Qinxin; Wei, Guijiang; Zhou, Guohua

    2014-07-01

    A portable bioluminescence analyser for detecting the DNA sequence of genetically modified organisms (GMOs) was developed by using a photodiode (PD) array. Pyrosequencing on eight genes (zSSIIb, Bt11 and Bt176 gene of genetically modified maize; Lectin, 35S-CTP4, CP4EPSPS, CaMV35S promoter and NOS terminator of the genetically modified Roundup ready soya) was successfully detected with this instrument. The corresponding limit of detection (LOD) was 0.01% with 35 PCR cycles. The maize and soya available from three different provenances in China were detected. The results indicate that pyrosequencing using the small size of the detector is a simple, inexpensive, and reliable way in a farm/field test of GMO analysis. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. STING Millennium: a web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence

    Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.

    2003-01-01

    STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped-up in a user friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR)—amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333

  14. A rank-based sequence aligner with applications in phylogenetic analysis.

    Liviu P Dinu

    Full Text Available Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD. The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing [Formula: see text]-mer positions in a hash table for each read. Another improvement, that produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state of the art alignment tools in several experiments. A set of experiments are conducted to determine the precision and the recall of the proposed aligner, in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered as a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, being supported on UNIX and MS Windows.

  15. Sequence analysis of annually normalized citation counts: an empirical analysis based on the characteristic scores and scales (CSS) method.

    Bornmann, Lutz; Ye, Adam Y; Ye, Fred Y

    2017-01-01

    In bibliometrics, only a few publications have focused on the citation histories of publications, where the citations for each citing year are assessed. In this study, therefore, annual categories of field- and time-normalized citation scores (based on the characteristic scores and scales method: 0 = poorly cited, 1 = fairly cited, 2 = remarkably cited, and 3 = outstandingly cited) are used to study the citation histories of papers. As our dataset, we used all articles published in 2000 and their annual citation scores until 2015. We generated annual sequences of citation scores (e.g., [Formula: see text]) and compared the sequences of annual citation scores of six broader fields (natural sciences, engineering and technology, medical and health sciences, agricultural sciences, social sciences, and humanities). In agreement with previous studies, our results demonstrate that sequences with poorly cited (0) and fairly cited (1) elements dominate the publication set; sequences with remarkably cited (3) and outstandingly cited (4) periods are rare. The highest percentages of constantly poorly cited papers can be found in the social sciences; the lowest percentages are in the agricultural sciences and humanities. The largest group of papers with remarkably cited (3) and/or outstandingly cited (4) periods shows an increasing impact over the citing years with the following orders of sequences: [Formula: see text] (6.01%), which is followed by [Formula: see text] (1.62%). Only 0.11% of the papers ( n  = 909) are constantly on the outstandingly cited level.

  16. Next generation sequencing based transcriptome analysis of septic-injury responsive genes in the beetle Tribolium castaneum.

    Boran Altincicek

    Full Text Available Beetles (Coleoptera are the most diverse animal group on earth and interact with numerous symbiotic or pathogenic microbes in their environments. The red flour beetle Tribolium castaneum is a genetically tractable model beetle species and its whole genome sequence has recently been determined. To advance our understanding of the molecular basis of beetle immunity here we analyzed the whole transcriptome of T. castaneum by high-throughput next generation sequencing technology. Here, we demonstrate that the Illumina/Solexa sequencing approach of cDNA samples from T. castaneum including over 9.7 million reads with 72 base pairs (bp length (approximately 700 million bp sequence information with about 30× transcriptome coverage confirms the expression of most predicted genes and enabled subsequent qualitative and quantitative transcriptome analysis. This approach recapitulates our recent quantitative real-time PCR studies of immune-challenged and naïve T. castaneum beetles, validating our approach. Furthermore, this sequencing analysis resulted in the identification of 73 differentially expressed genes upon immune-challenge with statistical significance by comparing expression data to calculated values derived by fitting to generalized linear models. We identified up regulation of diverse immune-related genes (e.g. Toll receptor, serine proteinases, DOPA decarboxylase and thaumatin and of numerous genes encoding proteins with yet unknown functions. Of note, septic-injury resulted also in the elevated expression of genes encoding heat-shock proteins or cytochrome P450s supporting the view that there is crosstalk between immune and stress responses in T. castaneum. The present study provides a first comprehensive overview of septic-injury responsive genes in T. castaneum beetles. Identified genes advance our understanding of T. castaneum specific gene expression alteration upon immune-challenge in particular and may help to understand beetle immunity

  17. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology.

    Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi

    2012-07-02

    Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  18. Transcriptome analysis of carnation (Dianthus caryophyllus L. based on next-generation sequencing technology

    Tanase Koji

    2012-07-01

    Full Text Available Abstract Background Carnation (Dianthus caryophyllus L., in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380 of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  19. Development of SSR Markers Based on Transcriptome Sequencing and Association Analysis with Drought Tolerance in Perennial Grass Miscanthus from China

    Gang Nie

    2017-05-01

    Full Text Available Drought has become a critical environmental stress affecting on plant in temperate area. As one of the promising bio-energy crops to sustainable biomass production, the genus Miscanthus has been widely studied around the world. However, the most widely used hybrid cultivar among this genus, Miscanthus × giganteus is proved poor drought tolerance compared to some parental species. Here we mainly focused on Miscanthus sinensis, which is one of the progenitors of M. × giganteus providing a comparable yield and well abiotic stress tolerance in some places. The main objectives were to characterize the physiological and photosynthetic respond to drought stress and to develop simple sequence repeats (SSRs markers associated with drought tolerance by transcriptome sequencing within an originally collection of 44 Miscanthus genotypes from southwest China. Significant phenotypic differences were observed among genotypes, and the average of leaf relative water content (RWC were severely affected by drought stress decreasing from 88.27 to 43.21%, which could well contribute to separating the drought resistant and drought sensitive genotype of Miscanthus. Furthermore, a total of 16,566 gene-associated SSRs markers were identified based on Illumina RNA sequencing under drought conditions, and 93 of them were randomly selected to validate. In total, 70 (75.3% SSRs were successfully amplified and the generated loci from 30 polymorphic SSRs were used to estimate the genetic differentiation and population structure. Finally, two optimum subgroups of the population were determined by structure analysis and based on association analysis, seven significant associations were identified including two markers with leaf RWC and five markers with photosynthetic traits. With the rich sequencing resources annotation, such associations would serve an efficient tool for Miscanthus drought response mechanism study and facilitate genetic improvement of drought resistant for

  20. Multifractal Detrended Fluctuation Analysis of Regional Precipitation Sequences Based on the CEEMDAN-WPT

    Liu, Dong; Cheng, Chen; Fu, Qiang; Liu, Chunlei; Li, Mo; Faiz, Muhammad Abrar; Li, Tianxiao; Khan, Muhammad Imran; Cui, Song

    2018-03-01

    In this paper, the complete ensemble empirical mode decomposition with the adaptive noise (CEEMDAN) algorithm is introduced into the complexity research of precipitation systems to improve the traditional complexity measure method specific to the mode mixing of the Empirical Mode Decomposition (EMD) and incomplete decomposition of the ensemble empirical mode decomposition (EEMD). We combined the CEEMDAN with the wavelet packet transform (WPT) and multifractal detrended fluctuation analysis (MF-DFA) to create the CEEMDAN-WPT-MFDFA, and used it to measure the complexity of the monthly precipitation sequence of 12 sub-regions in Harbin, Heilongjiang Province, China. The results show that there are significant differences in the monthly precipitation complexity of each sub-region in Harbin. The complexity of the northwest area of Harbin is the lowest and its predictability is the best. The complexity and predictability of the middle and Midwest areas of Harbin are about average. The complexity of the southeast area of Harbin is higher than that of the northwest, middle, and Midwest areas of Harbin and its predictability is worse. The complexity of Shuangcheng is the highest and its predictability is the worst of all the studied sub-regions. We used terrain and human activity as factors to analyze the causes of the complexity of the local precipitation. The results showed that the correlations between the precipitation complexity and terrain are obvious, and the correlations between the precipitation complexity and human influence factors vary. The distribution of the precipitation complexity in this area may be generated by the superposition effect of human activities and natural factors such as terrain, general atmospheric circulation, land and sea location, and ocean currents. To evaluate the stability of the algorithm, the CEEMDAN-WPT-MFDFA was compared with the equal probability coarse graining LZC algorithm, fuzzy entropy, and wavelet entropy. The results show

  1. Deep sequencing-based transcriptome analysis of Plutella xylostella larvae parasitized by Diadegma semiclausum

    2011-01-01

    Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic approaches provide a powerful approach to study insect host-parasitoid interactions at the molecular level. In order to investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity, that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length; with 35% of > 200 bp in length. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes were detected in parasitized larvae with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. xylostella larval genes following parasitization, DsIV genes expressed in the

  2. Different Somatic Hypermutation Levels among Antibody Subclasses Disclosed by a New Next-Generation Sequencing-Based Antibody Repertoire Analysis

    Kazutaka Kitaura

    2017-05-01

    Full Text Available A diverse antibody repertoire is primarily generated by the rearrangement of V, D, and J genes and subsequent somatic hypermutation (SHM. Class-switch recombination (CSR produces various isotypes and subclasses with different functional properties. Although antibody isotypes and subclasses are considered to be produced by both direct and sequential CSR, it is still not fully understood how SHMs accumulate during the process in which antibody subclasses are generated. Here, we developed a new next-generation sequencing (NGS-based antibody repertoire analysis capable of identifying all antibody isotype and subclass genes and used it to examine the peripheral blood mononuclear cells of 12 healthy individuals. Using a total of 5,480,040 sequences, we compared percentage frequency of variable (V, junctional (J sequence, and a combination of V and J, diversity, length, and amino acid compositions of CDR3, SHM, and shared clones in the IgM, IgD, IgG3, IgG1, IgG2, IgG4, IgA1, IgE, and IgA2 genes. The usage and diversity were similar among the immunoglobulin (Ig subclasses. Clonally related sequences sharing identical V, D, J, and CDR3 amino acid sequences were frequently found within multiple Ig subclasses, especially between IgG1 and IgG2 or IgA1 and IgA2. SHM occurred most frequently in IgG4, while IgG3 genes were the least mutated among all IgG subclasses. The shared clones had almost the same SHM levels among Ig subclasses, while subclass-specific clones had different levels of SHM dependent on the genomic location. Given the sequential CSR, these results suggest that CSR occurs sequentially over multiple subclasses in the order corresponding to the genomic location of IGHCs, but CSR is likely to occur more quickly than SHMs accumulate within Ig genes under physiological conditions. NGS-based antibody repertoire analysis should provide critical information on how various antibodies are generated in the immune system.

  3. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

  4. Deep sequencing-based analysis of the Cymbidium ensifolium floral transcriptome.

    Xiaobai Li

    Full Text Available Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvement of RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned into 25 eukaryotic orthologous groups (KOGs, 41,690 into 58 gene ontology (GO terms, and 9,830 into 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG pathways, and 9,539 isotigs into 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium.

  5. Complete sequence analysis of 18S rDNA based on genomic DNA extraction from individual Demodex mites (Acari: Demodicidae).

    Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang

    2012-05-01

    The study for the first time attempted to accomplish 18S ribosomal DNA (rDNA) complete sequence amplification and analysis for three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) based on gDNA extraction from individual mites. The mites were treated by DNA Release Additive and Hot Start II DNA Polymerase so as to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield reached the highest at 1 mite, tending to descend with the increase of mite number. The individual mite gDNA was successfully used for 18S rDNA fragment (about 900 bp) amplification examination. The alignments of 18S rDNA complete sequences of individual mite samples and those of pooled mite samples ( ≥ 1000mites/sample) showed over 97% identities for each species, indicating that the gDNA extracted from a single individual mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic tree could not effectively identify the three Demodex species, largely due to the differentiation in the D. canis isolates. It can be concluded that the individual Demodex mite gDNA can satisfy the molecular study of Demodex. 18S rDNA complete sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the ascertainment of the types of Demodex mites parasitizing in dogs. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. Robustness analysis of chiller sequencing control

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. - Abstract: Multiple-chiller plant is commonly employed in the heating, ventilating and air-conditioning system to increase operational feasibility and energy-efficiency under part load condition. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency while not sacrifices the cooling sufficiency for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observation that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on the control performance and may cause the control fail to achieve the expected control and/or energy performance; and (ii) in current literature few studies have systematically addressed this issue, this paper therefore presents a study on robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies

  7. Sequence analysis of Leukemia DNA

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, one of which is leukemia disease or better known as blood cancer. The cancer cell can be detected by taking DNA in laboratory test. This study focused on local alignment of leukemia and non leukemia data resulting from NCBI in the form of DNA sequences by using Smith-Waterman algorithm. SmithWaterman algorithm was invented by TF Smith and MS Waterman in 1981. These algorithms try to find as much as possible similarity of a pair of sequences, by giving a negative value to the unequal base pair (mismatch), and positive values on the same base pair (match). So that will obtain the maximum positive value as the end of the alignment, and the minimum value as the initial alignment. This study will use sequences of leukemia and 3 sequences of non leukemia.

  8. Deep sequencing-based analysis of the anaerobic stimulon in Neisseria gonorrhoeae

    Clark Virginia L

    2011-01-01

    Full Text Available Abstract Background Maintenance of an anaerobic denitrification system in the obligate human pathogen, Neisseria gonorrhoeae, suggests that an anaerobic lifestyle may be important during the course of infection. Furthermore, mounting evidence suggests that reduction of host-produced nitric oxide has several immunomodulary effects on the host. However, at this point there have been no studies analyzing the complete gonococcal transcriptome response to anaerobiosis. Here we performed deep sequencing to compare the gonococcal transcriptomes of aerobically and anaerobically grown cells. Using the information derived from this sequencing, we discuss the implications of the robust transcriptional response to anaerobic growth. Results We determined that 198 chromosomal genes were differentially expressed (~10% of the genome in response to anaerobic conditions. We also observed a large induction of genes encoded within the cryptic plasmid, pJD1. Validation of RNA-seq data using translational-lacZ fusions or RT-PCR demonstrated the RNA-seq results to be very reproducible. Surprisingly, many genes of prophage origin were induced anaerobically, as well as several transcriptional regulators previously unknown to be involved in anaerobic growth. We also confirmed expression and regulation of a small RNA, likely a functional equivalent of fnrS in the Enterobacteriaceae family. We also determined that many genes found to be responsive to anaerobiosis have also been shown to be responsive to iron and/or oxidative stress. Conclusions Gonococci will be subject to many forms of environmental stress, including oxygen-limitation, during the course of infection. Here we determined that the anaerobic stimulon in gonococci was larger than previous studies would suggest. Many new targets for future research have been uncovered, and the results derived from this study may have helped to elucidate factors or mechanisms of virulence that may have otherwise been overlooked.

  9. Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project.

    Konkel, Miriam K; Walker, Jerilyn A; Hotard, Ashley B; Ranck, Megan C; Fontenot, Catherine C; Storer, Jessica; Stewart, Chip; Marth, Gabor T; Batzer, Mark A

    2015-08-29

    The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  10. An innovative experimental sequence on electromagnetic induction and eddy currents based on video analysis and cheap data acquisition

    Bonanno, A; Sapia, P; Bozzo, G

    2017-01-01

    In this work, we present a coherent sequence of experiments on electromagnetic (EM) induction and eddy currents, appropriate for university undergraduate students, based on a magnet falling through a drilled aluminum disk. The sequence, leveraging on the didactical interplay between the EM and mechanical aspects of the experiments, allows us to exploit the students’ awareness of mechanics to elicit their comprehension of EM phenomena. The proposed experiments feature two kinds of measurements: (i) kinematic measurements (performed by means of high-speed video analysis) give information on the system’s kinematics and, via appropriate numerical data processing, allow us to get dynamic information, in particular on energy dissipation; (ii) induced electromagnetic field (EMF) measurements (by using a homemade multi-coil sensor connected to a cheap data acquisition system) allow us to quantitatively determine the inductive effects of the moving magnet on its neighborhood. The comparison between experimental results and the predictions from an appropriate theoretical model (of the dissipative coupling between the moving magnet and the conducting disk) offers many educational hints on relevant topics related to EM induction, such as Maxwell’s displacement current, magnetic field flux variation, and the conceptual link between induced EMF and induced currents. Moreover, the didactical activity gives students the opportunity to be trained in video analysis, data acquisition and numerical data processing. (paper)

  11. Sequence detection analysis based on canonical correlation for steady-state visual evoked potential brain computer interfaces.

    Cao, Lei; Ju, Zhengyu; Li, Jie; Jian, Rongjun; Jiang, Changjun

    2015-09-30

    Steady-state visual evoked potential (SSVEP) has been widely applied to develop brain computer interface (BCI) systems. The essence of SSVEP recognition is to recognize the frequency component of target stimulus focused by a subject significantly present in EEG spectrum. In this paper, a novel statistical approach based on sequence detection (SD) is proposed for improving the performance of SSVEP recognition. This method uses canonical correlation analysis (CCA) coefficients to observe SSVEP signal sequence. And then, a threshold strategy is utilized for SSVEP recognition. The result showed the classification performance with the longer duration of time window achieved the higher accuracy for most subjects. And the average time costing per trial was lower than the predefined recognition time. It was implicated that our approach could improve the speed of BCI system in contrast to other methods. Comparison with existing method(s): In comparison with other resultful algorithms, experimental accuracy of SD approach was better than those using a widely used CCA-based method and two newly proposed algorithms, least absolute shrinkage and selection operator (LASSO) recognition model as well as multivariate synchronization index (MSI) method. Furthermore, the information transfer rate (ITR) obtained by SD approach was higher than those using other three methods for most participants. These conclusions demonstrated that our proposed method was promising for a high-speed online BCI. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. Phylogenetic Analysis of Phytophthora Species Based on Mitochondrial and Nuclear DNA Sequences

    Kroon, L.P.N.M.; Bakker, F.T.; Bosch, van den G.B.M.; Bonants, P.J.M.; Flier, W.G.

    2004-01-01

    A molecular phylogenetic analysis of the genus Phytophthora was performed, 113 isolates from 48 Phytophthora species were included in this analysis. Phylogenetic analyses were performed on regions of mitochondrial (cytochrome c oxidase subunit 1; NADH dehydrogenase subunit 1) and nuclear gene

  13. In-depth motivic analysis based on multiparametric closed pattern and cyclic sequence mining

    Lartillot, Olivier

    2014-01-01

    presents a much simpler description and justification of this general strategy, as well as significant simplifications of the model, in particular concerning the management of pattern cyclicity. A new method for automated bundling of patterns belonging to same motivic or thematic classes is also presented....... The good performance of the method is shown through the analysis of a piece from the JKUPDD database. Ground-truth motives are detected, while additional relevant information completes the ground-truth musicological analysis. The system, implemented in Matlab, is made publicly available as part of Mining......Suite, a new open-source framework for audio and music analysis....

  14. Integrated sequence analysis. Final report

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subprojet 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA,HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply the in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  15. Fibonacci Sequences Quasiperiodic A5B6C7 Ferroelectric Based Photonic Crystal: FDTD analysis

    S. Simsek

    2017-10-01

    Full Text Available In this study, we present an investigation of the optical properties and band structures for the conventional and Fibonacci photonic crystals (PCs based on some A5B6C7 ferroelectrics (SbSBr and BiTeCl. Here, we use one dimensional SbSBr and BiTeCl based layers in air background. We have theoretically calculated the photonic band structure and transmission spectra of SbSBr and BiTeCl based PC superlattices. The position of minima in the transmission spectrum correlates with the gaps obtained in the calculation. The intensity of the transmission depths is more intense in the case of higher refractive index contrast between the layers. In our simulation, we employed the finite-difference time domain  technique and the plane wave expansion method, which implies the solution of Maxwell equations with centered finite-difference expressions for the space and time derivatives.

  16. Cellulase linkers are optimized based on domain type and function: insights from sequence analysis, biophysical measurements, and molecular simulation.

    Deanne W Sammond

    Full Text Available Cellulase enzymes deconstruct cellulose to glucose, and are often comprised of glycosylated linkers connecting glycoside hydrolases (GHs to carbohydrate-binding modules (CBMs. Although linker modifications can alter cellulase activity, the functional role of linkers beyond domain connectivity remains unknown. Here we investigate cellulase linkers connecting GH Family 6 or 7 catalytic domains to Family 1 or 2 CBMs, from both bacterial and eukaryotic cellulases to identify conserved characteristics potentially related to function. Sequence analysis suggests that the linker lengths between structured domains are optimized based on the GH domain and CBM type, such that linker length may be important for activity. Longer linkers are observed in eukaryotic GH Family 6 cellulases compared to GH Family 7 cellulases. Bacterial GH Family 6 cellulases are found with structured domains in either N to C terminal order, and similar linker lengths suggest there is no effect of domain order on length. O-glycosylation is uniformly distributed across linkers, suggesting that glycans are required along entire linker lengths for proteolysis protection and, as suggested by simulation, for extension. Sequence comparisons show that proline content for bacterial linkers is more than double that observed in eukaryotic linkers, but with fewer putative O-glycan sites, suggesting alternative methods for extension. Conversely, near linker termini where linkers connect to structured domains, O-glycosylation sites are observed less frequently, whereas glycines are more prevalent, suggesting the need for flexibility to achieve proper domain orientations. Putative N-glycosylation sites are quite rare in cellulase linkers, while an N-P motif, which strongly disfavors the attachment of N-glycans, is commonly observed. These results suggest that linkers exhibit features that are likely tailored for optimal function, despite possessing low sequence identity. This study suggests

  17. Segmentation of Gait Sequences in Sensor-Based Movement Analysis: A Comparison of Methods in Parkinson’s Disease

    Nooshin Haji Ghassemi

    2018-01-01

    Full Text Available Robust gait segmentation is the basis for mobile gait analysis. A range of methods have been applied and evaluated for gait segmentation of healthy and pathological gait bouts. However, a unified evaluation of gait segmentation methods in Parkinson’s disease (PD is missing. In this paper, we compare four prevalent gait segmentation methods in order to reveal their strengths and drawbacks in gait processing. We considered peak detection from event-based methods, two variations of dynamic time warping from template matching methods, and hierarchical hidden Markov models (hHMMs from machine learning methods. To evaluate the methods, we included two supervised and instrumented gait tests that are widely used in the examination of Parkinsonian gait. In the first experiment, a sequence of strides from instructed straight walks was measured from 10 PD patients. In the second experiment, a more heterogeneous assessment paradigm was used from an additional 34 PD patients, including straight walks and turning strides as well as non-stride movements. The goal of the latter experiment was to evaluate the methods in challenging situations including turning strides and non-stride movements. Results showed no significant difference between the methods for the first scenario, in which all methods achieved an almost 100% accuracy in terms of F-score. Hence, we concluded that in the case of a predefined and homogeneous sequence of strides, all methods can be applied equally. However, in the second experiment the difference between methods became evident, with the hHMM obtaining a 96% F-score and significantly outperforming the other methods. The hHMM also proved promising in distinguishing between strides and non-stride movements, which is critical for clinical gait analysis. Our results indicate that both the instrumented test procedure and the required stride segmentation algorithm have to be selected adequately in order to support and complement classical

  18. Structure-sequence based analysis for identification of conserved regions in proteins

    Zemla, Adam T; Zhou, Carol E; Lam, Marisa W; Smith, Jason R; Pardes, Elizabeth

    2013-05-28

    Disclosed are computational methods, and associated hardware and software products for scoring conservation in a protein structure based on a computationally identified family or cluster of protein structures. A method of computationally identifying a family or cluster of protein structures in also disclosed herein.

  19. Fractals in DNA sequence analysis

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance,and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how can we distinguish coding and noncoding sequences, and the problems of classification and evolution relationship of organisms are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view)to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  20. Phylogenomic analysis of the Chilean clade of Liolaemus lizards (Squamata: Liolaemidae based on sequence capture data

    Alejandra Panzera

    2017-10-01

    Full Text Available The genus Liolaemus is one of the most ecologically diverse and species-rich genera of lizards worldwide. It currently includes more than 250 recognized species, which have been subject to many ecological and evolutionary studies. Nevertheless, Liolaemus lizards have a complex taxonomic history, mainly due to the incongruence between morphological and genetic data, incomplete taxon sampling, incomplete lineage sorting and hybridization. In addition, as many species have restricted and remote distributions, this has hampered their examination and inclusion in molecular systematic studies. The aims of this study are to infer a robust phylogeny for a subsample of lizards representing the Chilean clade (subgenus Liolaemus sensu stricto, and to test the monophyly of several of the major species groups. We use a phylogenomic approach, targeting 541 ultra-conserved elements (UCEs and 44 protein-coding genes for 16 taxa. We conduct a comparison of phylogenetic analyses using maximum-likelihood and several species tree inference methods. The UCEs provide stronger support for phylogenetic relationships compared to the protein-coding genes; however, the UCEs outnumber the protein-coding genes by 10-fold. On average, the protein-coding genes contain over twice the number of informative sites. Based on our phylogenomic analyses, all the groups sampled are polyphyletic. Liolaemus tenuis tenuis is difficult to place in the phylogeny, because only a few loci (nine were recovered for this species. Topologies or support values did not change dramatically upon exclusion of L. t. tenuis from analyses, suggesting that missing data did not had a significant impact on phylogenetic inference in this data set. The phylogenomic analyses provide strong support for sister group relationships between L. fuscus, L. monticola, L. nigroviridis and L. nitidus, and L. platei and L. velosoi. Despite our limited taxon sampling, we have provided a reliable starting hypothesis for

  1. Integrated sequence analysis. Final report

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subprojet 3 `integrated sequence analysis` (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term `methodology` denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA,HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply the in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  2. Sequencing-Based Analysis of the Bacterial and Fungal Composition of Kefir Grains and Milks from Multiple Sources

    Marsh, Alan J.; O’Sullivan, Orla; Hill, Colin; Ross, R. Paul; Cotter, Paul D.

    2013-01-01

    Kefir is a fermented milk-based beverage to which a number of health-promoting properties have been attributed. The microbes responsible for the fermentation of milk to produce kefir consist of a complex association of bacteria and yeasts, bound within a polysaccharide matrix, known as the kefir grain. The consistency of this microbial population, and that present in the resultant beverage, has been the subject of a number of previous, almost exclusively culture-based, studies which have indicated differences depending on geographical location and culture conditions. However, culture-based identification studies are limited by virtue of only detecting species with the ability to grow on the specific medium used and thus culture-independent, molecular-based techniques offer the potential for a more comprehensive analysis of such communities. Here we describe a detailed investigation of the microbial population, both bacterial and fungal, of kefir, using high-throughput sequencing to analyse 25 kefir milks and associated grains sourced from 8 geographically distinct regions. This is the first occasion that this technology has been employed to investigate the fungal component of these populations or to reveal the microbial composition of such an extensive number of kefir grains or milks. As a result several genera and species not previously identified in kefir were revealed. Our analysis shows that the bacterial populations in kefir are dominated by 2 phyla, the Firmicutes and the Proteobacteria. It was also established that the fungal populations of kefir were dominated by the genera Kazachstania, Kluyveromyces and Naumovozyma, but that a variable sub-dominant population also exists. PMID:23894461

  3. Sequencing-based analysis of the bacterial and fungal composition of kefir grains and milks from multiple sources.

    Alan J Marsh

    Full Text Available Kefir is a fermented milk-based beverage to which a number of health-promoting properties have been attributed. The microbes responsible for the fermentation of milk to produce kefir consist of a complex association of bacteria and yeasts, bound within a polysaccharide matrix, known as the kefir grain. The consistency of this microbial population, and that present in the resultant beverage, has been the subject of a number of previous, almost exclusively culture-based, studies which have indicated differences depending on geographical location and culture conditions. However, culture-based identification studies are limited by virtue of only detecting species with the ability to grow on the specific medium used and thus culture-independent, molecular-based techniques offer the potential for a more comprehensive analysis of such communities. Here we describe a detailed investigation of the microbial population, both bacterial and fungal, of kefir, using high-throughput sequencing to analyse 25 kefir milks and associated grains sourced from 8 geographically distinct regions. This is the first occasion that this technology has been employed to investigate the fungal component of these populations or to reveal the microbial composition of such an extensive number of kefir grains or milks. As a result several genera and species not previously identified in kefir were revealed. Our analysis shows that the bacterial populations in kefir are dominated by 2 phyla, the Firmicutes and the Proteobacteria. It was also established that the fungal populations of kefir were dominated by the genera Kazachstania, Kluyveromyces and Naumovozyma, but that a variable sub-dominant population also exists.

  4. Sequencing-based analysis of the bacterial and fungal composition of kefir grains and milks from multiple sources.

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2013-01-01

    Kefir is a fermented milk-based beverage to which a number of health-promoting properties have been attributed. The microbes responsible for the fermentation of milk to produce kefir consist of a complex association of bacteria and yeasts, bound within a polysaccharide matrix, known as the kefir grain. The consistency of this microbial population, and that present in the resultant beverage, has been the subject of a number of previous, almost exclusively culture-based, studies which have indicated differences depending on geographical location and culture conditions. However, culture-based identification studies are limited by virtue of only detecting species with the ability to grow on the specific medium used and thus culture-independent, molecular-based techniques offer the potential for a more comprehensive analysis of such communities. Here we describe a detailed investigation of the microbial population, both bacterial and fungal, of kefir, using high-throughput sequencing to analyse 25 kefir milks and associated grains sourced from 8 geographically distinct regions. This is the first occasion that this technology has been employed to investigate the fungal component of these populations or to reveal the microbial composition of such an extensive number of kefir grains or milks. As a result several genera and species not previously identified in kefir were revealed. Our analysis shows that the bacterial populations in kefir are dominated by 2 phyla, the Firmicutes and the Proteobacteria. It was also established that the fungal populations of kefir were dominated by the genera Kazachstania, Kluyveromyces and Naumovozyma, but that a variable sub-dominant population also exists.

  5. Comparison of base composition analysis and Sanger sequencing of mitochondrial DNA for four U.S. population groups.

    Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M

    2014-01-01

    A set of 711 samples from four U.S. population groups was analyzed using a novel mass spectrometry based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing derived data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays. Published by Elsevier Ireland Ltd.

  6. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucloetide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  7. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

    Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

    2018-04-01

    Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health as well as more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing do not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  8. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L..

    Bin Han

    Full Text Available Microsatellites or simple sequence repeats (SSRs are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat have been announced, the researches about systemic analysis of SSRs for wheat still have not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR; 70,564 (23.9% were found to be monomorphic and 224,703 (76.1% were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3% amplified one locus, 8 (17.8% amplified multiple identical loci, and 13 (28.9% did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising

  9. A proposal for accident management optimization based on the study of accident sequence analysis for a BWR

    Sobajima, M.

    1998-01-01

    The paper describes a proposal for accident management optimization based on the study of accident sequence and source term analyses for a BWR. In Japan, accident management measures are to be implemented in all LWRs by the year 2000 in accordance with the recommendation of the regulatory organization and based on the PSAs carried out by the utilities. Source terms were evaluated by the Japan Atomic Energy Research Institute (JAERI) with the THALES code for all BWR sequences in which loss of decay heat removal resulted in the largest release. Identification of the priority and importance of accident management measures was carried out for the sequences with larger risk contributions. Considerations for optimizing emergency operation guides are believed to be essential for risk reduction. (author)

  10. Phylogenetic analysis of Fusobacterium prausnitzii based upon the 16S rRNA gene sequence and PCR confirmation.

    Wang, R F; Cao, W W; Cerniglia, C E

    1996-01-01

    In order to develop a PCR method to detect Fusobacterium prausnitzii in human feces and to clarify the phylogenetic position of this species, its 16S rRNA gene sequence was determined. The sequence described in this paper is different from the 16S rRNA gene sequence is specific for F. prausnitzii, and the results of this assay confirmed that F. prausnitzii is the most common species in human feces. However, a PCR assay based on the original GenBank sequence was negative when it was performed with two strains of F. prausnitzii obtained from the American Type Culture Collection. A phylogenetic tree based on the new 16S rRNA gene sequence was constructed. On this tree F. prausnitzii was not a member of the Fusobacterium group but was closer to some Eubacterium spp. and located between Clostridium "clusters III and IV" (M.D. Collins, P.A. Lawson, A. Willems, J.J. Cordoba, J. Fernandez-Garayzabal, P. Garcia, J. Cai, H. Hippe, and J.A.E. Farrow, Int. J. Syst. Bacteriol. 44:812-826, 1994).

  11. Preliminary hazard analysis using sequence tree method

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system level PHA using sequence tree method was developed to perform Safety Related digital I and C system SSA. The conventional PHA is a brainstorming session among experts on various portions of the system to identify hazards through discussions. However, this conventional PHA is not a systematic technique, the analysis results strongly depend on the experts' subjective opinions. The analysis quality cannot be appropriately controlled. Thereby, this research developed a system level sequence tree based PHA, which can clarify the relationship among the major digital I and C systems. Two major phases are included in this sequence tree based technique. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety related I and C system, such as RPS. The second phase uses sequence tree to recognize what I and C systems are involved in the event, how the safety related systems work, and how the backup systems can be activated to mitigate the consequence if the primary safety systems fail. In the sequence tree, the defense-in-depth echelons, including Control echelon, Reactor trip echelon, ESFAS echelon, and Indication and display echelon, are arranged to construct the sequence tree structure. All the related I and C systems, include digital system and the analog back-up systems are allocated in their specific echelon. By this system centric sequence tree based analysis, not only preliminary hazard can be identified systematically, the vulnerability of the nuclear power plant can also be recognized. Therefore, an effective simplified D3 evaluation can be performed as well. (author)

  12. Internal tectonic structure of the Central American Wadati-Benioff zone based on analysis of aftershock sequences

    Špičák, Aleš; Hanuš, Václav; Vaněk, Jiří; Běhounková, Marie

    2007-09-01

    Relocated Engdahl et al. (1998) global seismological data for 10 aftershock sequences were used to analyze the internal tectonic structure of the Central American subduction zone; the main shocks of several of these were the most destructive and often referenced earthquakes in the region (e.g., the 1970 Chiapas, 1983 Osa, 1992 Nicaragua, 1999 Quepos, 2001 El Salvador earthquakes). The spatial analysis of aftershock foci distribution was performed in a rotated Cartesian coordinate system (x, y, z) related to the Wadati-Benioff zone, and not in a standard coordinate system ($\\varphi$, λ, h are latitude, longitude, focal depth, respectively). Available fault plane solutions were also transformed into the plane approximating the Wadati-Benioff zone. The spatial distribution of earthquakes in each aftershock sequence was modeled as either a plane fit using a least squares approximation or a volume fit with a minimum thickness rectangular box. The analysis points to a quasi-planar distribution of earthquake foci in all aftershock sequences, manifesting the appurtenance of aftershocks to fracture zones. Geometrical parameters of fracture zones (strike, dip, and dimensions) hosting individual sequences were calculated and compared with the seafloor morphology of the Cocos Plate. The smooth character of the seafloor correlates with the aftershock fracture zones oriented parallel to the trench and commonly subparallel to the subducting slab, whereas subduction of the Cocos Ridge and seamounts around the Quepos Plateau coincides with steeply dipping fracture zones. Transformed focal mechanisms are almost exclusively (>90%) of normal character.

  13. Genetic diversity and population structure analysis in Perilla frutescens from Northern areas of China based on simple sequence repeats.

    Ma, S J; Sa, K J; Hong, T K; Lee, J K

    2017-09-21

    In this study, 21 simple sequence repeat (SSR) markers were used to evaluate the genetic diversity and population structure among 77 Perilla accessions from high-latitude and middle-latitude areas of China. Ninety-five alleles were identified with an average of 4.52 alleles per locus. The average polymorphic information content (PIC) and genetic diversity values were 0.346 and 0.372, respectively. The level of genetic diversity and PIC value for cultivated accessions of Perilla frutescens var. frutescens from middle-latitude areas were higher than accessions from high-latitude areas. Based on the dendrogram of unweighted pair group method with arithmetic mean (UPGMA), all accessions were classified into four major groups with a genetic similarity of 46%. All accessions of the cultivated var. frutescens were discriminated from the cultivated P. frutescens var. crispa. Furthermore, most accessions of the cultivated var. frutescens collected in high-latitude and middle-latitude areas were distinguished depending on their geographical location. However, the geographical locations of several accessions of the cultivated var. frutescens have no relation with their positions in the UPGMA dendrogram and population structure. This result implies that the diffusion of accessions of the cultivated Perilla crop in the northern areas of China might be through multiple routes. On the population structure analysis, 77 Perilla accessions were divided into Group I, Group II, and an admixed group based on a membership probability threshold of 0.8. Finally, the findings in this study can provide useful theoretical knowledge for further study on the population structure and genetic diversity of Perilla and benefit for Perilla crop breeding and germplasm conservation.

  14. Bacterial Pathogens and Community Composition in Advanced Sewage Treatment Systems Revealed by Metagenomics Analysis Based on High-Throughput Sequencing

    Lu, Xin; Zhang, Xu-Xiang; Wang, Zhu; Huang, Kailong; Wang, Yuan; Liang, Weigang; Tan, Yunfei; Liu, Bo; Tang, Junying

    2015-01-01

    This study used 454 pyrosequencing, Illumina high-throughput sequencing and metagenomic analysis to investigate bacterial pathogens and their potential virulence in a sewage treatment plant (STP) applying both conventional and advanced treatment processes. Pyrosequencing and Illumina sequencing consistently demonstrated that Arcobacter genus occupied over 43.42% of total abundance of potential pathogens in the STP. At species level, potential pathogens Arcobacter butzleri, Aeromonas hydrophila and Klebsiella pneumonia dominated in raw sewage, which was also confirmed by quantitative real time PCR. Illumina sequencing also revealed prevalence of various types of pathogenicity islands and virulence proteins in the STP. Most of the potential pathogens and virulence factors were eliminated in the STP, and the removal efficiency mainly depended on oxidation ditch. Compared with sand filtration, magnetic resin seemed to have higher removals in most of the potential pathogens and virulence factors. However, presence of the residual A. butzleri in the final effluent still deserves more concerns. The findings indicate that sewage acts as an important source of environmental pathogens, but STPs can effectively control their spread in the environment. Joint use of the high-throughput sequencing technologies is considered a reliable method for deep and comprehensive overview of environmental bacterial virulence. PMID:25938416

  15. Digital image sequence processing, compression, and analysis

    Reed, Todd R

    2004-01-01

    IntroductionTodd R. ReedCONTENT-BASED IMAGE SEQUENCE REPRESENTATIONPedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, andCharnchai PluempitiwiriyawejTHE COMPUTATION OF MOTIONChristoph Stiller, Sören Kammel, Jan Horn, and Thao DangMOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAINLuca Lucchese and Guido Maria CortelazzoQUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONSGaetano GiuntaERROR CONCEALMENT IN DIGITAL VIDEOFrancesco G.B. De NataleIMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVEAnil KokaramVIDEO SUMMARIZATIONCuneyt M. Taskiran and Edward

  16. Mitochondrial DNA sequence-based phylogenetic relationship ...

    cophaga ranges from 0.037–0.106 and 0.049–0.207 for COI and ND5 genes, respectively (tables 2 and 3). Analysis of genetic distance on the basis of sequence difference for both the mitochondrial genes shows very little genetic difference. The discrepancy in the phylogenetic trees based on individ- ual genes may be due ...

  17. OTU analysis using metagenomic shotgun sequencing data.

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs. Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences that a 16S rRNA gene fragment having a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and making the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  18. Enriching Genomic Resources and Transcriptional Profile Analysis of Miscanthus sinensis under Drought Stress Based on RNA Sequencing

    Gang Nie

    2017-01-01

    Full Text Available Miscanthus × giganteus is wildly cultivated as a potential biofuel feedstock around the world; however, the narrow genetic basis and sterile characteristics have become a limitation for its utilization. As a progenitor of M. × giganteus, M. sinensis is widely distributed around East Asia providing well abiotic stress tolerance. To enrich the M. sinensis genomic databases and resources, we sequenced and annotated the transcriptome of M. sinensis by using an Illumina HiSeq 2000 platform. Approximately 316 million high-quality trimmed reads were generated from 349 million raw reads, and a total of 114,747 unigenes were obtained after de novo assembly. Furthermore, 95,897 (83.57% unigenes were annotated to at least one database including NR, Swiss-Prot, KEGG, COG, GO, and NT, supporting that the sequences obtained were annotated properly. Differentially expressed gene analysis indicates that drought stress 15 days could be a critical period for M. sinensis response to drought stress. The high-throughput transcriptome sequencing of M. sinensis under drought stress has greatly enriched the current genomic available resources. The comparison of DEGs under different periods of drought stress identified a wealth of candidate genes involved in drought tolerance regulatory networks, which will facilitate further genetic improvement and molecular studies of the M. sinensis.

  19. Comparative genomics beyond sequence-based alignments

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  20. Sequence analysis-based characterization and identification of neurovirulence-associated variants of 36 EV71 strains from China.

    Xu, Jun; Wang, Fang; Zhao, Desheng; Liu, Jiang; Su, Hong; Wang, Baolong

    2018-03-30

    Enterovirus 71 (EV71) is the main pathogen of hand-foot-mouth disease (HFMD) and causes several neurological complications. As new strains of EV71 are constantly discovered, it is important to understand the genomic characteristics of the viruses and the mechanism of virulence. Herein, we isolated five strains of EV71 from HFMD patients with or without neurovirulence and sequenced their whole genomes. We then performed whole genome sequence analysis of totally 36 EV71 strains. The phylogenetic analysis of the VP1 region revealed all five isolated strains are clustered into C4a of C4 subgenotype. In addition, by comparing the complete genome sequences of 36 strains, 253 variable amino acid positions were found, 14 of which were identified to be associated with neurovirulence (P < 0.05). Moreover, a similar pattern of amino acid variants combination was identified in four strains without neurovirulence, indicating this type of variant pattern might be associated with avirulence. The strains with neurovirulence appeared to be distinguished from those without neurovirulence by the variants in VP1 and P2 regions, implying VP1 and P2 are the important regions associated with neurovirulence. Indeed, 3-D modeling of VP1 and P2 regions of non-neurovirulent and neurovirulent strains revealed that the different variants resulted in different protein structures and amino acid composition of ligand binding site, which might account for their difference in neurovirulence. In summary, our study reveals 14 variable amino acid positions of VP1, P2 and P3 regions are related to the virulence and that mutations in the capsid proteins of EV71 might contribute to neurovirulence. © 2018 Wiley Periodicals, Inc.

  1. Comparative analysis of transcriptomes in aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing

    Taketo Okada

    2016-12-01

    Full Text Available Ephedra plants are taxonomically classified as gymnosperms, and are medicinally important as the botanical origin of crude drugs and as bioresources that contain pharmacologically active chemicals. Here we show a comparative analysis of the transcriptomes of aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing by RNA-Seq. De novo assembly of short cDNA sequence reads generated 23,358, 13,373, and 28,579 contigs longer than 200 bases from aerial stems, roots, or both aerial stems and roots, respectively. The presumed functions encoded by these contig sequences were annotated by BLAST (blastx. Subsequently, these contigs were classified based on gene ontology slims, Enzyme Commission numbers, and the InterPro database. Furthermore, comparative gene expression analysis was performed between aerial stems and roots. These transcriptome analyses revealed differences and similarities between the transcriptomes of aerial stems and roots in E. sinica. Deep transcriptome sequencing of Ephedra should open the door to molecular biological studies based on the entire transcriptome, tissue- or organ-specific transcriptomes, or targeted genes of interest.

  2. A PC Mathcad-based computational aid for severe accident analysis and its application to a BWR small LOCA sequence

    Wu, Laung-Kuang T.; Lee, S.J.

    2004-01-01

    A PC-based Mathcad program is used to develop a computational aid for analyzing severe accident phenomena. This computational aid uses simple engineering expressions and empirical correlations to estimate key quantities and timings at various stages of accident progressions. In this paper, the computational aid is applied to analyze an early phase of a BWR small LOCA sequence. The accident phenomena analyzed include: break flow rates, boiled-up water level in the core, core uncovery time, depressurization of the reactor pressure vessel, core heat-up, onset of clad oxidation, hydrogen generation, and onset of fuel relocation. The results are compared with those obtained running the MAAP 3.0B code. This PC-based computational aid can be used to train plant personnel in understanding severe accident phenomena and to assist them in managing severe accidents. (author)

  3. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  4. Complete genome sequence of a Chinese isolate of pepper vein yellows virus and evolutionary analysis based on the CP, MP and RdRp coding regions.

    Liu, Maoyan; Liu, Xiangning; Li, Xun; Zhang, Deyong; Dai, Liangyin; Tang, Qianjun

    2016-03-01

    The genome sequence of pepper vein yellows virus (PeVYV) (PeVYV-HN, accession number KP326573), isolated from pepper plants (Capsicum annuum L.) grown at the Hunan Vegetables Institute (Changsha, Hunan, China), was determined by deep sequencing of small RNAs. The PeVYV-HN genome consists of 6244 nucleotides, contains six open reading frames (ORFs), and is similar to that of an isolate (AB594828) from Japan. Its genomic organization is similar to that of members of the genus Polerovirus. Sequence analysis revealed that PeVYV-HN shared 92% sequence identity with the Japanese PeVYV genome at both the nucleotide and amino acid levels. Evolutionary analysis based on the coat protein (CP), movement protein (MP), and RNA-dependent RNA polymerase (RdRP) showed that PeVYV could be divided into two major lineages corresponding to their geographical origins. The Asian isolates have a higher population expansion frequency than the African isolates. Negative selection and genetic drift (founder effect) were found to be the potential drivers of the molecular evolution of PeVYV. Moreover, recombination was not the distinct cause of PeVYV evolution. This is the first report of a complete genomic sequence of PeVYV in China.

  5. [Complete genome sequencing and sequence analysis of BCG Tice].

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, with many contigs and scaffolds obtained. There are many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the end of contigs and performed PCR amplification in order to link these contigs and scaffolds. With various enzymes to perform PCR amplification, adjustment of PCR reaction conditions, and combined with clone construction to sequence, all the gaps were finished. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with GC content 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  6. Computer-aided visualization and analysis system for sequence evaluation

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  7. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies.

    Patrick D Schloss

    Full Text Available Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of beta-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. Taken together, these results

  8. The historical biogeography of Pteroglossus aracaris (Aves, Piciformes, Ramphastidae based on Bayesian analysis of mitochondrial DNA sequences

    Sérgio L. Pereira

    2008-01-01

    Full Text Available Most Neotropical birds, including Pteroglossus aracaris, do not have an adequate fossil record to be used as time constraints in molecular dating. Hence, the evolutionary timeframe of the avian biota can only be inferred using alternative time constraints. We applied a Bayesian relaxed clock approach to propose an alternative interpretation for the historical biogeography of Pteroglossus based on mitochondrial DNA sequences, using different combinations of outgroups and time constraints obtained from outgroup fossils, vicariant barriers and molecular time estimates. The results indicated that outgroup choice has little effect on the Bayesian posterior distribution of divergence times within Pteroglossus , that geological and molecular time constraints seem equally suitable to estimate the Bayesian posterior distribution of divergence times for Pteroglossus , and that the fossil record alone overestimates divergence times within the fossil-lacking ingroup. The Bayesian estimates of divergence times suggest that the radiation of Pteroglossus occurred from the Late Miocene to the Pliocene (three times older than estimated by the “standard” mitochondrial rate of 2% sequence divergence per million years, likely triggered by Andean uplift, multiple episodes of marine transgressions in South America, and formation of present-day river basins. The time estimates are in agreement with other Neotropical taxa with similar geographic distributions.

  9. Construction and Analysis of a Novel 2-D Optical Orthogonal Codes Based on Modified One-coincidence Sequence

    Ji, Jianhua; Wang, Yanfen; Wang, Ke; Xu, Ming; Zhang, Zhipeng; Yang, Shuwen

    2013-09-01

    A new two-dimensional OOC (optical orthogonal codes) named PC/MOCS is constructed, using PC (prime code) for time spreading and MOCS (modified one-coincidence sequence) for wavelength hopping. Compared with PC/PC, the number of wavelengths for PC/MOCS is not limited to a prime number. Compared with PC/OCS, the length of MOCS need not be expanded to the same length of PC. PC/MOCS can be constructed flexibly, and also can use available wavelengths effectively. Theoretical analysis shows that PC/MOCS can reduce the bit error rate (BER) of OCDMA system, and can support more users than PC/PC and PC/OCS.

  10. An integrated genetic data environment (GDE)-based LINUX interface for analysis of HIV-1 and other microbial sequences.

    De Oliveira, T; Miller, R; Tarin, M; Cassol, S

    2003-01-01

    Sequence databases encode a wealth of information needed to develop improved vaccination and treatment strategies for the control of HIV and other important pathogens. To facilitate effective utilization of these datasets, we developed a user-friendly GDE-based LINUX interface that reduces input/output file formatting. GDE was adapted to the Linux operating system, bioinformatics tools were integrated with microbe-specific databases, and up-to-date GDE menus were developed for several clinically important viral, bacterial and parasitic genomes. Each microbial interface was designed for local access and contains Genbank, BLAST-formatted and phylogenetic databases. GDE-Linux is available for research purposes by direct application to the corresponding author. Application-specific menus and support files can be downloaded from (http://www.bioafrica.net).

  11. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  12. Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions

    Debnath Bhattacharyya

    2013-01-01

    Full Text Available We design an Algorithm for bioengine. As a program are enable optimal alignments searching between two sequences, the host sequence (normal plant as well as query sequence (virus. Searching for homologues has become a routine operation of biological sequences in 4 × 4 combination with different subsequence (word size. This program takes the advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. There is a main aim which is to detect the overlapping reading frames. This program also enables to find out the highly infected colones selection highest matching region with minimum gap or mismatch zones and unique virus colones matches. This is a small, portable, interactive, front-end program intended to be used to find out the regions of matching between host sequence and query subsequences. All the operations are carried out in fraction of seconds, depending on the required task and on the sequence length.

  13. Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions

    Mandal, Bijoy Kumar; Kim, Tai-hoon

    2013-01-01

    We design an Algorithm for bioengine. As a program are enable optimal alignments searching between two sequences, the host sequence (normal plant) as well as query sequence (virus). Searching for homologues has become a routine operation of biological sequences in 4 × 4 combination with different subsequence (word size). This program takes the advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. There is a main aim which is to detect the overlapping reading frames. This program also enables to find out the highly infected colones selection highest matching region with minimum gap or mismatch zones and unique virus colones matches. This is a small, portable, interactive, front-end program intended to be used to find out the regions of matching between host sequence and query subsequences. All the operations are carried out in fraction of seconds, depending on the required task and on the sequence length. PMID:24000321

  14. Phylogenetic relationships within the cyst-forming nematodes (Nematoda, Heteroderidae) based on analysis of sequences from the ITS regions of ribosomal DNA.

    Subbotin, S A; Vierstraete, A; De Ley, P; Rowe, J; Waeyenberge, L; Moens, M; Vanfleteren, J R

    2001-10-01

    The ITS1, ITS2, and 5.8S gene sequences of nuclear ribosomal DNA from 40 taxa of the family Heteroderidae (including the genera Afenestrata, Cactodera, Heterodera, Globodera, Punctodera, Meloidodera, Cryphodera, and Thecavermiculatus) were sequenced and analyzed. The ITS regions displayed high levels of sequence divergence within Heteroderinae and compared to outgroup taxa. Unlike recent findings in root knot nematodes, ITS sequence polymorphism does not appear to complicate phylogenetic analysis of cyst nematodes. Phylogenetic analyses with maximum-parsimony, minimum-evolution, and maximum-likelihood methods were performed with a range of computer alignments, including elision and culled alignments. All multiple alignments and phylogenetic methods yielded similar basic structure for phylogenetic relationships of Heteroderidae. The cyst-forming nematodes are represented by six main clades corresponding to morphological characters and host specialization, with certain clades assuming different positions depending on alignment procedure and/or method of phylogenetic inference. Hypotheses of monophyly of Punctoderinae and Heteroderinae are, respectively, strongly and moderately supported by the ITS data across most alignments. Close relationships were revealed between the Avenae and the Sacchari groups and between the Humuli group and the species H. salixophila within Heteroderinae. The Goettingiana group occupies a basal position within this subfamily. The validity of the genera Afenestrata and Bidera was tested and is discussed based on molecular data. We conclude that ITS sequence data are appropriate for studies of relationships within the different species groups and less so for recovery of more ancient speciations within Heteroderidae. Copyright 2001 Academic Press.

  15. Next-generation sequencing-based transcriptome analysis of Helicoverpa armigera Larvae immune-primed with Photorhabdus luminescens TT01.

    Zengyang Zhao

    Full Text Available Although invertebrates are incapable of adaptive immunity, immunal reactions which are functionally similar to the adaptive immunity of vertebrates have been described in many studies of invertebrates including insects. The phenomenon was termed immune priming. In order to understand the molecular mechanism of immune priming, we employed Illumina/Solexa platform to investigate the transcriptional changes of the hemocytes and fat body of Helicoverpa armigera larvae immune-primed with the pathogenic bacteria Photorhabdus luminescens TT01. A total of 43.6 and 65.1 million clean reads with 4.4 and 6.5 gigabase sequence data were obtained from the TT01 (the immune-primed and PBS (non-primed cDNA libraries and assembled into 35,707 all-unigenes (non-redundant transcripts, which has a length varied from 201 to 16,947 bp and a N50 length of 1,997 bp. For 35,707 all-unigenes, 20,438 were functionally annotated and 2,494 were differentially expressed after immune priming. The differentially expressed genes (DEGs are mainly related to immunity, detoxification, development and metabolism of the host insect. Analysis on the annotated immune related DEGs supported a hypothesis that we proposed previously: the immune priming phenomenon observed in H. armigera larvae was achieved by regulation of key innate immune elements. The transcriptome profiling data sets (especially the sequences of 1,022 unannotated DEGs and the clues (such as those on immune-related signal and regulatory pathways obtained from this study will facilitate immune-related novel gene discovery and provide valuable information for further exploring the molecular mechanism of immune priming of invertebrates. All these will increase our understanding of invertebrate immunity which may provide new approaches to control insect pests or prevent epidemic of infectious diseases in economic invertebrates in the future.

  16. Phytoplasma phylogenetics based on analysis of secA and 23S rRNA gene sequences for improved resolution of candidate species of 'Candidatus Phytoplasma'.

    Hodgetts, Jennifer; Boonham, Neil; Mumford, Rick; Harrison, Nigel; Dickinson, Matthew

    2008-08-01

    Phytoplasma phylogenetics has focused primarily on sequences of the non-coding 16S rRNA gene and the 16S-23S rRNA intergenic spacer region (16-23S ISR), and primers that enable amplification of these regions from all phytoplasmas by PCR are well established. In this study, primers based on the secA gene have been developed into a semi-nested PCR assay that results in a sequence of the expected size (about 480 bp) from all 34 phytoplasmas examined, including strains representative of 12 16Sr groups. Phylogenetic analysis of secA gene sequences showed similar clustering of phytoplasmas when compared with clusters resolved by similar sequence analyses of a 16-23S ISR-23S rRNA gene contig or of the 16S rRNA gene alone. The main differences between trees were in the branch lengths, which were elongated in the 16-23S ISR-23S rRNA gene tree when compared with the 16S rRNA gene tree and elongated still further in the secA gene tree, despite this being a shorter sequence. The improved resolution in the secA gene-derived phylogenetic tree resulted in the 16SrII group splitting into two distinct clusters, while phytoplasmas associated with coconut lethal yellowing-type diseases split into three distinct groups, thereby supporting past proposals that they represent different candidate species within 'Candidatus Phytoplasma'. The ability to differentiate 16Sr groups and subgroups by virtual RFLP analysis of secA gene sequences suggests that this gene may provide an informative alternative molecular marker for pathogen identification and diagnosis of phytoplasma diseases.

  17. Genome Sequencing and Analysis Conference IV

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  18. Understanding north-western Mediterranean climate variability: a multi-proxy and multi-sequence approach based on wavelet analysis.

    Azuara, Julien; Lebreton, Vincent; Jalali, Bassem; Sicre, Marie-Alexandrine; Sabatier, Pierre; Dezileau, Laurent; Peyron, Odile; Frigola, Jaime; Combourieu-Nebout, Nathalie

    2017-04-01

    Forcings and physical mechanisms underlying Holocene climate variability still remain poorly understood. Comparison of different paleoclimatic reconstructions using spectral analysis allows to investigate their common periodicities and helps to understand the causes of past climate changes. Wavelet analysis applied on several proxy time series from the Atlantic domain already revealed the first key-issues on the origin of Holocene climate variability. However the differences in duration, resolution and variance between the time-series are important issues for comparing paleoclimatic sequences in the frequency domain. This work compiles 7 paleoclimatic proxy records from 4 time-series from the north-western Mediterranean all ranging from 7000 to 1000 yrs cal BP: -pollen and clay mineral contents from the lagoonal sediment core PB06 recovered in southern France, -Sea Surface Temperatures (SST) derived from alkenones, concentration of terrestrial alkanes and their average chain length (ACL) from core KSGC-31_GolHo-1B recovered in the Gulf of Lion inner-shelf, - δ18O record from speleothems recovered in the Asiul Cave in north-western Spain, -grain size record from the deep basin sediment drift core MD99-2343 north of Minorca island. A comparison of their frequency content is proposed using wavelet analysis and cluster analysis of wavelet power spectra. Common cyclicities are assessed using cross-wavelet analysis. In addition, a new algorithm is used in order to propagate the age model errors within wavelet power spectra. Results are consistents with a non-stationnary Holocene climate variability. The Halstatt cycles (2000-2500 years) depicted in many proxies (ACL, errestrial alkanes and SSTs) demonstrate solar activity influence in the north-western Mediterranean climate. Cluster analysis shows that pollen and ACL proxies, both indicating changes in aridity, are clearly distinct from other proxies and share significant common periodicities around 1000 and 600 years

  19. Grateloupia tenuis Wang et Luan sp. nov. (Halymeniaceae, Rhodophyta: A New Species from South China Sea Based on Morphological Observation and rbcL Gene Sequences Analysis

    Ling Yu

    2013-01-01

    Full Text Available Grateloupia tenuis Wang et Luan sp. nov. is a new species described from Lingshui, Hainan Province, South China Sea. Based on the external form and internal structure, combined with rbcL gene sequence analysis, Grateloupia tenuis is distinct from other Grateloupia species as follows: (1 thalli is slippery and cartilaginous in texture; possess fewer branches, relatively slight main axes, and two or three dichotomous branches; (2 cortex is 5-6 layers; medulla is solid when young, but hollow in old branches; reproductive structures are dispersed in main axes of thalli and lower portions of branchlets; exhibits Grateloupia-type auxiliary cell ampullae; (3 the four studied G. tenuis sequences were positioned in a large Grateloupia clade of Halymeniaceae, which included sister group generitype G. filicina with 68 bp differences; G. tenuis was determined to be a sister taxon to the G. catenata, G. ramosissima, G. orientalis, and G. filiformis subclade. The pairwise distances between G. tenuis and these species were 39 to 50 bp. The sequences of G. tenuis differed by 81–108 bp from the sequences of other samples in Grateloupia; there are 114–133 bp changes between G. tenuis and other genera of Halymeniaceae. In final analysis, we considered Grateloupia tenuis Wang et Luan sp. nov. to be a new species of genus Grateloupia.

  20. Grateloupia tenuis Wang et Luan sp. nov. (Halymeniaceae, Rhodophyta): a new species from South China Sea based on morphological observation and rbcL gene sequences analysis.

    Yu, Ling; Wang, Hongwei; Luan, Rixiao

    2013-01-01

    Grateloupia tenuis Wang et Luan sp. nov. is a new species described from Lingshui, Hainan Province, South China Sea. Based on the external form and internal structure, combined with rbcL gene sequence analysis, Grateloupia tenuis is distinct from other Grateloupia species as follows: (1) thalli is slippery and cartilaginous in texture; possess fewer branches, relatively slight main axes, and two or three dichotomous branches; (2) cortex is 5-6 layers; medulla is solid when young, but hollow in old branches; reproductive structures are dispersed in main axes of thalli and lower portions of branchlets; exhibits Grateloupia-type auxiliary cell ampullae; (3) the four studied G. tenuis sequences were positioned in a large Grateloupia clade of Halymeniaceae, which included sister group generitype G. filicina with 68 bp differences; G. tenuis was determined to be a sister taxon to the G. catenata, G. ramosissima, G. orientalis, and G. filiformis subclade. The pairwise distances between G. tenuis and these species were 39 to 50 bp. The sequences of G. tenuis differed by 81-108 bp from the sequences of other samples in Grateloupia; there are 114-133 bp changes between G. tenuis and other genera of Halymeniaceae. In final analysis, we considered Grateloupia tenuis Wang et Luan sp. nov. to be a new species of genus Grateloupia.

  1. Grateloupia tenuis Wang et Luan sp. nov. (Halymeniaceae, Rhodophyta): A New Species from South China Sea Based on Morphological Observation and rbcL Gene Sequences Analysis

    Wang, Hongwei; Luan, Rixiao

    2013-01-01

    Grateloupia tenuis Wang et Luan sp. nov. is a new species described from Lingshui, Hainan Province, South China Sea. Based on the external form and internal structure, combined with rbcL gene sequence analysis, Grateloupia tenuis is distinct from other Grateloupia species as follows: (1) thalli is slippery and cartilaginous in texture; possess fewer branches, relatively slight main axes, and two or three dichotomous branches; (2) cortex is 5-6 layers; medulla is solid when young, but hollow in old branches; reproductive structures are dispersed in main axes of thalli and lower portions of branchlets; exhibits Grateloupia-type auxiliary cell ampullae; (3) the four studied G. tenuis sequences were positioned in a large Grateloupia clade of Halymeniaceae, which included sister group generitype G. filicina with 68 bp differences; G. tenuis was determined to be a sister taxon to the G. catenata, G. ramosissima, G. orientalis, and G. filiformis subclade. The pairwise distances between G. tenuis and these species were 39 to 50 bp. The sequences of G. tenuis differed by 81–108 bp from the sequences of other samples in Grateloupia; there are 114–133 bp changes between G. tenuis and other genera of Halymeniaceae. In final analysis, we considered Grateloupia tenuis Wang et Luan sp. nov. to be a new species of genus Grateloupia. PMID:24455703

  2. Computational analysis of sequence selection mechanisms.

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  3. Phylogenetic analysis of Thai oyster (Ostreidae) based on partial sequences of the mitochondrial 16S rDNA gene

    Bussarawit, Somchai; Gravlund, Peter; Glenner, Henrik

    2006-01-01

    Ten oyster species of the family Ostreidae (Subfamilies Crassostreinae and Lophinae) from Thailand were studied using morphological data and mitochondrial 16S rDNA gene sequences. Additional sequence data from five specimens of Ostreidae and one specimen of Tridacna gigas were downloaded from Gen...

  4. Analysis of genetic relationships and identification of lily cultivars based on inter-simple sequence repeat markers.

    Cui, G F; Wu, L F; Wang, X N; Jia, W J; Duan, Q; Ma, L L; Jiang, Y L; Wang, J H

    2014-07-29

    Inter-simple sequence repeat (ISSR) markers were used to discriminate 62 lily cultivars of 5 hybrid series. Eight ISSR primers generated 104 bands in total, which all showed 100% polymorphism, and an average of 13 bands were amplified by each primer. Two software packages, POPGENE 1.32 and NTSYSpc 2.1, were used to analyze the data matrix. Our results showed that the observed number of alleles (NA), effective number of alleles (NE), Nei's genetic diversity (H), and Shannon's information index (I) were 1.9630, 1.4179, 0.2606, and 0.4080, respectively. The highest genetic similarity (0.9601) was observed between the Oriental x Trumpet and Oriental lilies, which indicated that the two hybrids had a close genetic relationship. An unweighted pair-group method with arithmetic means dendrogram showed that the 62 lily cultivars clustered into two discrete groups. The first group included the Oriental and OT cultivars, while the Asiatic, LA, and Longiflorum lilies were placed in the second cluster. The distribution of individuals in the principal component analysis was consistent with the clustering of the dendrogram. Fingerprints of all lily cultivars built from 8 primers could be separated completely. This study confirmed the effect and efficiency of ISSR identification in lily cultivars.

  5. Spatiotemporal dynamics and epistatic interaction sites in dengue virus type 1: a comprehensive sequence-based analysis.

    Pei-Yu Chu

    Full Text Available The continuing threat of dengue fever necessitates a comprehensive characterisation of its epidemiological trends. Phylogenetic and recombination events were reconstructed based on 100 worldwide dengue virus (DENV type 1 genome sequences with an outgroup (prototypes of DENV2-4. The phylodynamic characteristics and site-specific variation were then analysed using data without the outgroup. Five genotypes (GI-GV and a ladder-like structure with short terminal branch topology were observed in this study. Apparently, the transmission of DENV1 was geographically random before gradual localising with human activity as GI-GIII in South Asia, GIV in the South Pacific, and GV in the Americas. Genotypes IV and V have recently shown higher population densities compared to older genotypes. All codon regions and all tree branches were skewed toward a negative selection, which indicated that their variation was restricted by protein function. Notably, multi-epistatic interaction sites were found in both PrM 221 and NS3 1730. Recombination events accumulated in regions E, NS3-NS4A, and particularly in region NS5. The estimated coevolution pattern also highlights the need for further study of the biological role of protein PrM 221 and NS3 1730. The recent transmission of emergent GV sublineages into Central America and Europe mandates closely monitoring of genotype interaction and succession.

  6. Array-based FMR1 sequencing and deletion analysis in patients with a fragile X syndrome-like phenotype.

    Stephen C Collins

    2010-03-01

    Full Text Available Fragile X syndrome (FXS is caused by loss of function mutations in the FMR1 gene. Trinucleotide CGG-repeat expansions, resulting in FMR1 gene silencing, are the most common mutations observed at this locus. Even though the repeat expansion mutation is a functional null mutation, few conventional mutations have been identified at this locus, largely due to the clinical laboratory focus on the repeat tract.To more thoroughly evaluate the frequency of conventional mutations in FXS-like patients, we used an array-based method to sequence FMR1 in 51 unrelated males exhibiting several features characteristic of FXS but with normal CGG-repeat tracts of FMR1. One patient was identified with a deletion in FMR1, but none of the patients were found to have other conventional mutations.These data suggest that missense mutations in FMR1 are not a common cause of the FXS phenotype in patients who have normal-length CGG-repeat tracts. However, screening for small deletions of FMR1 may be of clinically utility.

  7. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis.

    Hong, Yanbin; Pandey, Manish K; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. Presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than the EST-SNPs with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. In order to estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. Total 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of the total 33 SNPs, nine SNPs showed polymorphism in the mapping population Y13Zh, and seven SNPs were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut.

  8. FAST: FAST Analysis of Sequences Toolbox

    Travis J. Lawrence

    2015-05-01

    Full Text Available FAST (FAST Analysis of Sequences Toolbox provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU’s Not Unix Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics makes FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format. Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.

  9. Late Life Employment Histories and Their Association With Work and Family Formation During Adulthood: A Sequence Analysis Based on ELSA.

    Wahrendorf, Morten; Zaninotto, Paola; Hoven, Hanno; Head, Jenny; Carr, Ewan

    2017-05-31

    To extend research on workforce participation beyond age 50 by describing entire employment histories in later life and testing their links to prior life course conditions. We use data from the English Longitudinal Study of Ageing, with retrospective information on employment histories between age 50 and 70 for 1,103 men and 1,195 women (n = 2,298). We apply sequence analysis and group respondents into eight clusters with similar histories. Using multinomial regressions, we then test their links to labor market participation, partnership, and parenthood histories during early (age 20-34) and mid-adulthood (age 35-49). Three clusters include histories dominated by full-time employees but with varying age of retirement (before, at, and after age 60). One cluster is dominated by self-employment with comparatively later retirement. Remaining clusters include part-time work (retirement around age 60 or no retirement), continuous domestic work (mostly women), or other forms of nonemployment. Those who had strong attachments to the labor market during adulthood are more likely to have histories of full-time work up until and beyond age 60, especially men. Parenthood in early adulthood is related to later retirement (for men only). Continued domestic work was not linked to parenthood. Partnered women tend to work part-time or do domestic work. The findings remain consistent after adjusting for birth cohort, childhood adversity, life course health, and occupational position. Policies aimed at increasing the proportion of older workers not only need to address later stages of the life course but also early and mid-adulthood. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America.

  10. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  11. Characterization of cereal cyst nematodes (Heterodera spp. in Morocco based on morphology, morphometrics and rDNA-ITS sequence analysis

    Mokrini Fouad

    2017-09-01

    Full Text Available Morphological and molecular diversity among 11 populations of cereal cyst nematodes from different wheat production areas in Morocco was investigated using light microscopy, species-specific primers, complemented by the ITS-rDNA sequences. Morphometrics of cysts and second-stage juveniles (J2s were generally within the expected ranges for Heterodera avenae; only the isolate from Aïn Jmaa showed morphometrics conforming to those of H. latipons. When using species-specific primers for H. avenae and H. latipons, the specific bands of 109 bp and 204 bp, respectively, confirmed the morphological identification. In addition, the internal transcribed spacer (ITS regions were sequenced to study the diversity of the 11 populations. These sequences were compared with those of Heterodera species available in the GenBank database (www.ncbi.nlm.nih.gov and confirmed again the identity of the species. Ten sequences of the ITS-rDNA were similar (99–100% to the sequences of H. avenae published in GenBank and three sequences, corresponding with one population, were similar (97–99% to H. latipons.

  12. Establishing a framework for comparative analysis of genome sequences

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  13. Genetic Diversity and Phylogenetic Analysis of the Iranian Leishmania Parasites Based on HSP70 Gene PCR-RFLP and Sequence Analysis.

    Nemati, Sara; Fazaeli, Asghar; Hajjaran, Homa; Khamesipour, Ali; Anbaran, Mohsen Falahati; Bozorgomid, Arezoo; Zarei, Fatah

    2017-08-01

    Despite the broad distribution of leishmaniasis among Iranians and animals across the country, little is known about the genetic characteristics of the causative agents. Applying both HSP70 PCR-RFLP and sequence analyses, this study aimed to evaluate the genetic diversity and phylogenetic relationships among Leishmania spp. isolated from Iranian endemic foci and available reference strains. A total of 36 Leishmania isolates from almost all districts across the country were genetically analyzed for the HSP70 gene using both PCR-RFLP and sequence analysis. The original HSP70 gene sequences were aligned along with homologous Leishmania sequences retrieved from NCBI, and subjected to the phylogenetic analysis. Basic parameters of genetic diversity were also estimated. The HSP70 PCR-RFLP presented 3 different electrophoretic patterns, with no further intraspecific variation, corresponding to 3 Leishmania species available in the country, L. tropica, L. major, and L. infantum. Phylogenetic analyses presented 5 major clades, corresponding to 5 species complexes. Iranian lineages, including L. major, L. tropica, and L. infantum, were distributed among 3 complexes L. major, L. tropica, and L. donovani. However, within the L. major and L. donovani species complexes, the HSP70 phylogeny was not able to distinguish clearly between the L. major and L. turanica isolates, and between the L. infantum, L. donovani, and L. chagasi isolates, respectively. Our results indicated that both HSP70 PCR-RFLP and sequence analyses are medically applicable tools for identification of Leishmania species in Iranian patients. However, the reduced genetic diversity of the target gene makes it inevitable that its phylogeny only resolves the major groups, namely, the species complexes.

  14. Phylogeny and Taxonomy of Archaea: A Comparison of the Whole-Genome-Based CVTree Approach with 16S rRNA Sequence Analysis

    Guanghong Zuo

    2015-03-01

    Full Text Available A tripartite comparison of Archaea phylogeny and taxonomy at and above the rank order is reported: (1 the whole-genome-based and alignment-free CVTree using 179 genomes; (2 the 16S rRNA analysis exemplified by the All-Species Living Tree with 366 archaeal sequences; and (3 the Second Edition of Bergey’s Manual of Systematic Bacteriology complemented by some current literature. A high degree of agreement is reached at these ranks. From the newly proposed archaeal phyla, Korarchaeota, Thaumarchaeota, Nanoarchaeota and Aigarchaeota, to the recent suggestion to divide the class Halobacteria into three orders, all gain substantial support from CVTree. In addition, the CVTree helped to determine the taxonomic position of some newly sequenced genomes without proper lineage information. A few discrepancies between the CVTree and the 16S rRNA approaches call for further investigation.

  15. Bayesian Correlation Analysis for Sequence Count Data.

    Daniel Sánchez-Taltavull

    Full Text Available Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low-especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  16. Image sequence analysis workstation for multipoint motion analysis

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze motion of objects from video sequences. The system combines the software and hardware environment of a modem graphic-oriented workstation with the digital image acquisition, processing and display techniques. In addition to automation and Increase In throughput of data reduction tasks, the objective of the system Is to provide less invasive methods of measurement by offering the ability to track objects that are more complex than reflective markers. Grey level Image processing and spatial/temporal adaptation of the processing parameters is used for location and tracking of more complex features of objects under uncontrolled lighting and background conditions. The applications of such an automated and noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots, aircraft in flight, etc. The system's key features are: 1) Acquisition and storage of Image sequences by digitizing and storing real-time video; 2) computer-controlled movie loop playback, freeze frame display, and digital Image enhancement; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored Image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body: 5) field-of-view and spatial calibration: 6) Image sequence and measurement data base management; and 7) offline analysis software for trajectory plotting and statistical analysis.

  17. QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

    Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

    2017-08-31

    As a newly emerged research area, RNA epigenetics has drawn increasing attention recently for the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high throughput sequencing techniques, such as, MeRIP-Seq, transcriptome-wide RNA methylation profile is now available in the form of count-based data, with which it is often of interests to study the dynamics at epitranscriptomic layer. However, the sample size of RNA methylation experiment is usually very small due to its costs; and additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as DRME model based on a statistical test covering the IP samples only with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in the way, the input control samples are also properly taken care of. In addition, different from DRME approach, which relies only the input control sample only for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. And the QNB model is also applicable to other datasets related RNA modifications, including but not limited to RNA bisulfite sequencing, m 1 A-Seq, Par-CLIP, RIP-Seq, etc.

  18. Cloning and sequence analysis of benzo-a-pyreneinducible ...

    The phylogenetic tree based on the amino acid sequences clearly shows tilapia CYP1A and killifish CYP1A to be more closely related to each other than to the other CYP1A subfamilies. Sequence analysis of 3727 bp of genomic DNA showed that the clone obtained was the structural gene of CYP1A which consists of ...

  19. Biological sequence analysis: probabilistic models of proteins and nucleic acids

    Durbin, Richard

    1998-01-01

    ... analysis methods are now based on principles of probabilistic modelling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference...

  20. miRDis: a Web tool for endogenous and exogenous microRNA discovery based on deep-sequencing data analysis.

    Zhang, Hanyuan; Vieira Resende E Silva, Bruno; Cui, Juan

    2018-05-01

    Small RNA sequencing is the most widely used tool for microRNA (miRNA) discovery, and shows great potential for the efficient study of miRNA cross-species transport, i.e., by detecting the presence of exogenous miRNA sequences in the host species. Because of the increased appreciation of dietary miRNAs and their far-reaching implication in human health, research interests are currently growing with regard to exogenous miRNAs bioavailability, mechanisms of cross-species transport and miRNA function in cellular biological processes. In this article, we present microRNA Discovery (miRDis), a new small RNA sequencing data analysis pipeline for both endogenous and exogenous miRNA detection. Specifically, we developed and deployed a Web service that supports the annotation and expression profiling data of known host miRNAs and the detection of novel miRNAs, other noncoding RNAs, and the exogenous miRNAs from dietary species. As a proof-of-concept, we analyzed a set of human plasma sequencing data from a milk-feeding study where 225 human miRNAs were detected in the plasma samples and 44 show elevated expression after milk intake. By examining the bovine-specific sequences, data indicate that three bovine miRNAs (bta-miR-378, -181* and -150) are present in human plasma possibly because of the dietary uptake. Further evaluation based on different sets of public data demonstrates that miRDis outperforms other state-of-the-art tools in both detection and quantification of miRNA from either animal or plant sources. The miRDis Web server is available at: http://sbbi.unl.edu/miRDis/index.php.

  1. Sequence-Based Analysis of Structural Organization and Composition of the Cultivated Sunflower (Helianthus annuus L. Genome

    Navdeep Gill

    2014-04-01

    Full Text Available Sunflower is an important oilseed crop, as well as a model system for evolutionary studies, but its 3.6 gigabase genome has proven difficult to assemble, in part because of the high repeat content of its genome. Here we report on the sequencing, assembly, and analyses of 96 randomly chosen BACs from sunflower to provide additional information on the repeat content of the sunflower genome, assess how repetitive elements in the sunflower genome are organized relative to genes, and compare the genomic distribution of these repeats to that found in other food crops and model species. We also examine the expression of transposable element-related transcripts in EST databases for sunflower to determine the representation of repeats in the transcriptome and to measure their transcriptional activity. Our data confirm previous reports in suggesting that the sunflower genome is >78% repetitive. Sunflower repeats share very little similarity to other plant repeats such as those of Arabidopsis, rice, maize and wheat; overall 28% of repeats are “novel” to sunflower. The repetitive sequences appear to be randomly distributed within the sequenced BACs. Assuming the 96 BACs are representative of the genome as a whole, then approximately 5.2% of the sunflower genome comprises non TE-related genic sequence, with an average gene density of 18kbp/gene. Expression levels of these transposable elements indicate tissue specificity and differential expression in vegetative and reproductive tissues, suggesting that expressed TEs might contribute to sunflower development. The assembled BACs will also be useful for assessing the quality of several different draft assemblies of the sunflower genome and for annotating the reference sequence.

  2. Sequence-Based Analysis of Structural Organization and Composition of the Cultivated Sunflower (Helianthus annuus L.) Genome

    Gill, Navdeep; Buti, Matteo; Kane, Nolan; Bellec, Arnaud; Helmstetter, Nicolas; Berges, Hélène; Rieseberg, Loren H.

    2014-01-01

    Sunflower is an important oilseed crop, as well as a model system for evolutionary studies, but its 3.6 gigabase genome has proven difficult to assemble, in part because of the high repeat content of its genome. Here we report on the sequencing, assembly, and analyses of 96 randomly chosen BACs from sunflower to provide additional information on the repeat content of the sunflower genome, assess how repetitive elements in the sunflower genome are organized relative to genes, and compare the genomic distribution of these repeats to that found in other food crops and model species. We also examine the expression of transposable element-related transcripts in EST databases for sunflower to determine the representation of repeats in the transcriptome and to measure their transcriptional activity. Our data confirm previous reports in suggesting that the sunflower genome is >78% repetitive. Sunflower repeats share very little similarity to other plant repeats such as those of Arabidopsis, rice, maize and wheat; overall 28% of repeats are “novel” to sunflower. The repetitive sequences appear to be randomly distributed within the sequenced BACs. Assuming the 96 BACs are representative of the genome as a whole, then approximately 5.2% of the sunflower genome comprises non TE-related genic sequence, with an average gene density of 18kbp/gene. Expression levels of these transposable elements indicate tissue specificity and differential expression in vegetative and reproductive tissues, suggesting that expressed TEs might contribute to sunflower development. The assembled BACs will also be useful for assessing the quality of several different draft assemblies of the sunflower genome and for annotating the reference sequence. PMID:24833511

  3. Genetic analysis of Fasciola isolates from cattle in Korea based on second internal transcribed spacer (ITS-2) sequence of nuclear ribosomal DNA.

    Choe, Se-Eun; Nguyen, Thuy Thi-Dieu; Kang, Tae-Gyu; Kweon, Chang-Hee; Kang, Seung-Won

    2011-09-01

    Nuclear ribosomal DNA sequence of the second internal transcribed spacer (ITS-2) has been used efficiently to identify the liver fluke species collected from different hosts and various geographic regions. ITS-2 sequences of 19 Fasciola samples collected from Korean native cattle were determined and compared. Sequence comparison including ITS-2 sequences of isolates from this study and reference sequences from Fasciola hepatica and Fasciola gigantica and intermediate Fasciola in Genbank revealed seven identical variable sites of investigated isolates. Among 19 samples, 12 individuals had ITS-2 sequences completely identical to that of pure F. hepatica, five possessed the sequences identical to F. gigantica type, whereas two shared the sequence of both F. hepatica and F. gigantica. No variations in length and nucleotide composition of ITS-2 sequence were observed within isolates that belonged to F. hepatica or F. gigantica. At the position of 218, five Fasciola containing a single-base substitution (C>T) formed a distinct branch inside the F. gigantica-type group which was similar to those of Asian-origin isolates. The phylogenetic tree of the Fasciola spp. based on complete ITS-2 sequences from this study and other representative isolates in different locations clearly showed that pure F. hepatica, F. gigantica type and intermediate Fasciola were observed. The result also provided additional genetic evidence for the existence of three forms of Fasciola isolated from native cattle in Korea by genetic approach using ITS-2 sequence.

  4. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

    Tanase Koji; Nishitani Chikako; Hirakawa Hideki; Isobe Sachiko; Tabata Satoshi; Ohmiya Akemi; Onozaki Takashi

    2012-01-01

    Abstract Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, ...

  5. Probabilistic accident sequence recovery analysis

    Stutzke, Martin A.; Cooper, Susan E.

    2004-01-01

    Recovery analysis is a method that considers alternative strategies for preventing accidents in nuclear power plants during probabilistic risk assessment (PRA). Consideration of possible recovery actions in PRAs has been controversial, and there seems to be a widely held belief among PRA practitioners, utility staff, plant operators, and regulators that the results of recovery analysis should be skeptically viewed. This paper provides a framework for discussing recovery strategies, thus lending credibility to the process and enhancing regulatory acceptance of PRA results and conclusions. (author)

  6. Diagnosis of Fanconi Anemia: Mutation Analysis by Multiplex Ligation-Dependent Probe Amplification and PCR-Based Sanger Sequencing

    Gille, Johan J. P.; Floor, Karijn; Kerkhoven, Lianne; Ameziane, Najim; Joenje, Hans; de Winter, Johan P.

    2012-01-01

    Fanconi anemia (FA) is a rare inherited disease characterized by developmental defects, short stature, bone marrow failure, and a high risk of malignancies. FA is heterogeneous: 15 genetic subtypes have been distinguished so far. A clinical diagnosis of FA needs to be confirmed by testing cells for sensitivity to cross-linking agents in a chromosomal breakage test. As a second step, DNA testing can be employed to elucidate the genetic subtype of the patient and to identify the familial mutations. This knowledge allows preimplantation genetic diagnosis (PGD) and enables prenatal DNA testing in future pregnancies. Although simultaneous testing of all FA genes by next generation sequencing will be possible in the near future, this technique will not be available immediately for all laboratories. In addition, in populations with strong founder mutations, a limited test using Sanger sequencing and MLPA will be a cost-effective alternative. We describe a strategy and optimized conditions for the screening of FANCA, FANCB, FANCC, FANCE, FANCF, and FANCG and present the results obtained in a cohort of 54 patients referred to our diagnostic service since 2008. In addition, the follow up with respect to genetic counseling and carrier screening in the families is discussed. PMID:22778927

  7. Diagnosis of Fanconi Anemia: Mutation Analysis by Multiplex Ligation-Dependent Probe Amplification and PCR-Based Sanger Sequencing

    Johan J. P. Gille

    2012-01-01

    Full Text Available Fanconi anemia (FA is a rare inherited disease characterized by developmental defects, short stature, bone marrow failure, and a high risk of malignancies. FA is heterogeneous: 15 genetic subtypes have been distinguished so far. A clinical diagnosis of FA needs to be confirmed by testing cells for sensitivity to cross-linking agents in a chromosomal breakage test. As a second step, DNA testing can be employed to elucidate the genetic subtype of the patient and to identify the familial mutations. This knowledge allows preimplantation genetic diagnosis (PGD and enables prenatal DNA testing in future pregnancies. Although simultaneous testing of all FA genes by next generation sequencing will be possible in the near future, this technique will not be available immediately for all laboratories. In addition, in populations with strong founder mutations, a limited test using Sanger sequencing and MLPA will be a cost-effective alternative. We describe a strategy and optimized conditions for the screening of FANCA, FANCB, FANCC, FANCE, FANCF, and FANCG and present the results obtained in a cohort of 54 patients referred to our diagnostic service since 2008. In addition, the follow up with respect to genetic counseling and carrier screening in the families is discussed.

  8. Analysis and prediction of translation rate based on sequence and functional features of the mRNA.

    Tao Huang

    Full Text Available Protein concentrations depend not only on the mRNA level, but also on the translation rate and the degradation rate. Prediction of mRNA's translation rate would provide valuable information for in-depth understanding of the translation mechanism and dynamic proteome. In this study, we developed a new computational model to predict the translation rate, featured by (1 integrating various sequence-derived and functional features, (2 applying the maximum relevance & minimum redundancy method and incremental feature selection to select features to optimize the prediction model, and (3 being able to predict the translation rate of RNA into high or low translation rate category. The prediction accuracies under rich and starvation condition were 68.8% and 70.0%, respectively, evaluated by jackknife cross-validation. It was found that the following features were correlated with translation rate: codon usage frequency, some gene ontology enrichment scores, number of RNA binding proteins known to bind its mRNA product, coding sequence length, protein abundance and 5'UTR free energy. These findings might provide useful information for understanding the mechanisms of translation and dynamic proteome. Our translation rate prediction model might become a high throughput tool for annotating the translation rate of mRNAs in large-scale.

  9. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  10. Analysis of genetic diversity and population structure of oil palm (Elaeis guineensis) from China and Malaysia based on species-specific simple sequence repeat markers.

    Zhou, L X; Xiao, Y; Xia, W; Yang, Y D

    2015-12-08

    Genetic diversity and patterns of population structure of the 94 oil palm lines were investigated using species-specific simple sequence repeat (SSR) markers. We designed primers for 63 SSR loci based on their flanking sequences and conducted amplification in 94 oil palm DNA samples. The amplification result showed that a relatively high level of genetic diversity was observed between oil palm individuals according a set of 21 polymorphic microsatellite loci. The observed heterozygosity (Ho) was 0.3683 and 0.4035, with an average of 0.3859. The Ho value was a reliable determinant of the discriminatory power of the SSR primer combinations. The principal component analysis and unweighted pair-group method with arithmetic averaging cluster analysis showed the 94 oil palm lines were grouped into one cluster. These results demonstrated that the oil palm in Hainan Province of China and the germplasm introduced from Malaysia may be from the same source. The SSR protocol was effective and reliable for assessing the genetic diversity of oil palm. Knowledge of the genetic diversity and population structure will be crucial for establishing appropriate management stocks for this species.

  11. Tools for integrated sequence-structure analysis with UCSF Chimera

    Huang Conrad C

    2006-07-01

    Full Text Available Abstract Background Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit; (c can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. UCSF Chimera is free for non-commercial use and is

  12. Sequence analysis by iterated maps, a review.

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.

  13. Reconstruction of the Evolutionary History of Saccharomyces cerevisiae x S. kudriavzevii Hybrids Based on Multilocus Sequence Analysis

    Peris, David; Lopes, Christian A.; Arias, Armando; Barrio, Eladio

    2012-01-01

    In recent years, interspecific hybridization and introgression are increasingly recognized as significant events in the evolution of Saccharomyces yeasts. These mechanisms have probably been involved in the origin of novel yeast genotypes and phenotypes, which in due course were to colonize and predominate in the new fermentative environments created by human manipulation. The particular conditions in which hybrids arose are still unknown, as well as the number of possible hybridization events that generated the whole set of natural hybrids described in the literature during recent years. In this study, we could infer at least six different hybridization events that originated a set of 26 S. cerevisiae x S. kudriavzevii hybrids isolated from both fermentative and non-fermentative environments. Different wine S. cerevisiae strains and European S. kudriavzevii strains were probably involved in the hybridization events according to gene sequence information, as well as from previous data on their genome composition and ploidy. Finally, we postulate that these hybrids may have originated after the introduction of vine growing and winemaking practices by the Romans to the present Northern vine-growing limits and spread during the expansion of improved viticulture and enology practices that occurred during the Late Middle Ages. PMID:23049811

  14. Reconstruction of the evolutionary history of Saccharomyces cerevisiae x S. kudriavzevii hybrids based on multilocus sequence analysis.

    David Peris

    Full Text Available In recent years, interspecific hybridization and introgression are increasingly recognized as significant events in the evolution of Saccharomyces yeasts. These mechanisms have probably been involved in the origin of novel yeast genotypes and phenotypes, which in due course were to colonize and predominate in the new fermentative environments created by human manipulation. The particular conditions in which hybrids arose are still unknown, as well as the number of possible hybridization events that generated the whole set of natural hybrids described in the literature during recent years. In this study, we could infer at least six different hybridization events that originated a set of 26 S. cerevisiae x S. kudriavzevii hybrids isolated from both fermentative and non-fermentative environments. Different wine S. cerevisiae strains and European S. kudriavzevii strains were probably involved in the hybridization events according to gene sequence information, as well as from previous data on their genome composition and ploidy. Finally, we postulate that these hybrids may have originated after the introduction of vine growing and winemaking practices by the Romans to the present Northern vine-growing limits and spread during the expansion of improved viticulture and enology practices that occurred during the Late Middle Ages.

  15. Deep sequencing-based transcriptome analysis of chicken spleen in response to avian pathogenic Escherichia coli (APEC infection.

    Qinghua Nie

    Full Text Available Avian pathogenic Escherichia coli (APEC leads to economic losses in poultry production and is also a threat to human health. The goal of this study was to characterize the chicken spleen transcriptome and to identify candidate genes for response and resistance to APEC infection using Solexa sequencing. We obtained 14422935, 14104324, and 14954692 Solexa read pairs for non-challenged (NC, challenged-mild pathology (MD, and challenged-severe pathology (SV, respectively. A total of 148197 contigs and 98461 unigenes were assembled, of which 134949 contigs and 91890 unigenes match the chicken genome. In total, 12272 annotated unigenes take part in biological processes (11664, cellular components (11927, and molecular functions (11963. Summing three specific contrasts, 13650 significantly differentially expressed unigenes were found in NC Vs. MD (6844, NC Vs. SV (7764, and MD Vs. SV (2320. Some unigenes (e.g. CD148, CD45 and LCK were involved in crucial pathways, such as the T cell receptor (TCR signaling pathway and microbial metabolism in diverse environments. This study facilitates understanding of the genetic architecture of the chicken spleen transcriptome, and has identified candidate genes for host response to APEC infection.

  16. SNAD: sequence name annotation-based designer

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually that may be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer that mediates automatic conversion of sequence UIDs (associated with multiple alignment or phylogenetic tree, or supplied as plain text list into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.

  17. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.

  18. Frequency characteristic measurement of a fiber optic gyroscope using a correlation spectrum analysis method based on a pseudo-random sequence

    Li, Yang; Chen, Xingfan; Liu, Cheng

    2015-01-01

    The frequency characteristic is an important indicator of a system’s dynamic performance. The identification of a fiber optic gyroscope (FOG)’s frequency characteristic using a correlation spectrum analysis method based on a pseudo-random sequence is proposed. Taking the angle vibrator as the source of the test rotation stimulation and a pseudo-random sequence as the test signal, the frequency characteristic of a FOG is calculated according to the power spectral density of the rotation rate signal and the cross-power spectral density of the FOG’s output signal and rotation rate signal. A theoretical simulation is done to confirm the validity of this method. An experiment system is built and the test results indicate that the measurement error of the normalized amplitude–frequency response is less than 0.01, that the error of the phase–frequency response is less than 0.3 rad, and the overall measurement accuracy is superior to the traditional frequency-sweep method. By using this method, the FOG’s amplitude–frequency response and phase–frequency response can be measured simultaneously, quickly, accurately, and with a high frequency resolution. The described method meets the requirements of engineering applications. (paper)

  19. Phylogenetic Analysis of a ?Jewel Orchid? Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions

    Hu, Chao; Tian, Huaizhen; Li, Hongqing; Hu, Aiqun; Xing, Fuwu; Bhattacharjee, Avishek; Hsu, Tianchuan; Kumar, Pankaj; Chung, Shihwen

    2016-01-01

    A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal th...

  20. Phylogenetic Analysis of a 'Jewel Orchid' Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions.

    Hu, Chao; Tian, Huaizhen; Li, Hongqing; Hu, Aiqun; Xing, Fuwu; Bhattacharjee, Avishek; Hsu, Tianchuan; Kumar, Pankaj; Chung, Shihwen

    2016-01-01

    A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal the intrageneric relationships of Goodyera and its intergeneric relationships to related genera. The results indicate that: 1) Goodyera is not monophyletic; 2) Goodyera could be divided into four sections, viz., Goodyera, Otosepalum, Reticulum and a new section; 3) sect. Reticulum can be further divided into two subsections, viz., Reticulum and Foliosum, whereas sect. Goodyera can in turn be divided into subsections Goodyera and a new subsection.

  1. Phylogenetic Analysis of a 'Jewel Orchid' Genus Goodyera (Orchidaceae Based on DNA Sequence Data from Nuclear and Plastid Regions.

    Chao Hu

    Full Text Available A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae based on the nuclear ribosomal internal transcribed spacer (ITS region and two chloroplast loci (matK and trnL-F was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal the intrageneric relationships of Goodyera and its intergeneric relationships to related genera. The results indicate that: 1 Goodyera is not monophyletic; 2 Goodyera could be divided into four sections, viz., Goodyera, Otosepalum, Reticulum and a new section; 3 sect. Reticulum can be further divided into two subsections, viz., Reticulum and Foliosum, whereas sect. Goodyera can in turn be divided into subsections Goodyera and a new subsection.

  2. Molecular analysis of the genus Asparagus based on matK sequences and its application to identify A. racemosus, a medicinally phytoestrogenic species.

    Boonsom, Teerawat; Waranuch, Neti; Ingkaninan, Kornkanok; Denduangboripant, Jessada; Sukrong, Suchada

    2012-07-01

    The plant Asparagus racemosus is one of the most widely used sources of phytoestrogens because of its high content of the steroidal saponins, shatavarins I-IV, in roots. The dry root of A. racemosus, known as "Rak-Sam-Sip" in Thai, is one of the most popular herbal medicines, used as an anti-inflammatory, an aphrodisiac and a galactagogue. Recently, the interest in plant-derived estrogens has increased tremendously, making A. racemosus particularly important and a possible target for fraudulent labeling. However, the identification of A. racemosus is generally difficult due to its similar morphology to other Asparagus spp. Thus, accurate authentication of A. racemosus is essential. In this study, 1557-bp nucleotide sequences of the maturase K (matK) gene of eight Asparagus taxa were analyzed. A phylogenetic relationship based on the matK gene was also constructed. Ten polymorphic sites of nucleotide substitutions were found within the matK sequences. A. racemosus showed different nucleotide substitutions to the other species. A polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis of the matK gene was developed to discriminate A. racemosus from others. Only the 650-bp PCR product from A. racemosus could be digested with BssKI into two fragments of 397 and 253-bp while the products of other species remained undigested. Ten commercially crude drugs were analyzed and revealed that eight samples were derived from A. racemosus while two samples of that were not. Thus, the PCR-RFLP analysis of matK gene was shown to be an effective method for authentication of the medicinally phytoestrogenic species, A. racemosus. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. DSAP: deep-sequencing small RNA analysis pipeline.

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.

  4. Sequence comparison and phylogenetic analysis of core gene of ...

    Phylogenetic analysis suggests that our sequences are clustered with sequences reported from Japan. This is the first phylogenetic analysis of HCV core gene from Pakistani population. Our sequences and sequences from Japan are grouped into same cluster in the phylogenetic tree. Sequence comparison and ...

  5. Isolation of Exosome-Like Nanoparticles and Analysis of MicroRNAs Derived from Coconut Water Based on Small RNA High-Throughput Sequencing.

    Zhao, Zhehao; Yu, Siran; Li, Min; Gui, Xin; Li, Ping

    2018-03-21

    In this study, the presence of microRNAs in coconut water was identified by real-time polymerase chain reaction (PCR) based on the results of high-throughput small RNA sequencing. In addition, the differences in microRNA content between immature and mature coconut water were compared. A total of 47 known microRNAs belonging to 25 families and 14 new microRNAs were identified in coconut endosperm. Through analysis using a target gene prediction software, potential microRNA target genes were identified in the human genome. Real-time PCR showed that the level of most microRNAs was higher in mature coconut water than in immature coconut water. Then, exosome-like nanoparticles were isolated from coconut water. After ultracentrifugation, some particle structures were seen in coconut water samples using 1,1'-dioctadecyl-3,3,3',3'-tetramethylindocarbocyanine perchlorate fluorescence staining. Subsequent scanning electron microscopy observation and dynamic light scattering analysis also revealed some exosome-like nanoparticles in coconut water, and the mean diameters of the particles detected by the two methods were 13.16 and 59.72 nm, respectively. In conclusion, there are extracellular microRNAs in coconut water, and their levels are higher in mature coconut water than in immature coconut water. Some exosome-like nanoparticles were isolated from coconut water, and the diameter of these particles was smaller than that of animal-derived exosomes.

  6. An optimum analysis sequence for environmental gamma-ray spectrometry

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L., E-mail: fta777@hotmail.co [Universidad Autonoma de Zacatecas, Centro Regional de Estudis Nucleares, Calle Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)

    2010-10-15

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library -based on evaluated nuclear data- of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems originated by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced {chi}{sup 2} criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. Particularly, the combination of second derivative peak locate with simple peak area integration algorithms provides the greater accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  7. An optimum analysis sequence for environmental gamma-ray spectrometry

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library -based on evaluated nuclear data- of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems originated by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ 2 criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. Particularly, the combination of second derivative peak locate with simple peak area integration algorithms provides the greater accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  8. Sequence Matching Analysis for Curriculum Development

    Liem Yenny Bendatu

    2015-06-01

    Full Text Available Many organizations apply information technologies to support their business processes. Using the information technologies, the actual events are recorded and utilized to conform with predefined model. Conformance checking is an approach to measure the fitness and appropriateness between process model and actual events. However, when there are multiple events with the same timestamp, the traditional approach unfit to result such measures. This study attempts to develop a sequence matching analysis. Considering conformance checking as the basis of this approach, this proposed approach utilizes the current control flow technique in process mining domain. A case study in the field of educational process has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it results some measurements for curriculum development. Finally, the result of the proposed approach has been verified by relevant instructors for further development.

  9. Morphological character evolution of Amorphophallus (Araceae) based on a combined phylogenetic analysis of trnL, rbcL, and LEAFY second intron sequences

    Sedayu, A.; Eurlings, M.C.M.; Gravendeel, B.; Hetterscheid, W.L.A.

    2010-01-01

    Sequences of three different genes in 69 taxa of Amorphophallus were combined to reconstruct the molecular phylogeny of this species-rich Aroid genus. The data set was analyzed by three different methods, Maximum Parsimony, Maximum Likelihood and Bayesian analysis, producing slightly different tree

  10. Routine DNA analysis based on 12S rRNA gene sequencing as a tool in the management of captive primates

    van der Kuyl, A. C.; van Gennep, D. R.; Dekker, J. T.; Goudsmit, J.

    2000-01-01

    Automated DNA sequencing of a fragment of the relatively slowly evolving mitochondrial 12S rRNA gene was used to distinguish primate species, and the method was compared with species determination based upon classical taxonomy. DNA from blood from 53 monkeys housed at the Stichting AAP Shelter for

  11. Construction of an integrated database to support genomic sequence analysis

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  12. Sequence comparison and phylogenetic analysis of core gene of ...

    STORAGESEVER

    2010-07-19

    Jul 19, 2010 ... and antisense primers, a single band of 573 base pairs .... Amino acid sequence alignment of Cluster I and Cluster II of phylogenetic tree. First ten sequences ... sequence weighting, postion-spiecific gap penalties and weight.

  13. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence or user can upload the sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various application for the sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides the tools for routinely performed sequence analysis by the biological scientists such as DNA replication, reverse compliment generation, transcription, translation, sequence specific information as total number of nucleotide bases, ATGC base contents along with their respective percentages and sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides user friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.

  14. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Rebecca M. Davidson

    2011-11-01

    Full Text Available Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize ( L. reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq for expression determination and differentiation of paralagous genes (∼85% of maize genes. Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs to 48% (pollen vs. ovule of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.

  15. Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution.

    Pausch, Hubert; Emmerling, Reiner; Gredler-Grandl, Birgit; Fries, Ruedi; Daetwyler, Hans D; Goddard, Michael E

    2017-11-09

    Genotyping and whole-genome sequencing data have been generated for hundreds of thousands of cattle. International consortia used these data to compile imputation reference panels that facilitate the imputation of sequence variant genotypes for animals that have been genotyped using dense microarrays. Association studies with imputed sequence variant genotypes allow for the characterization of quantitative trait loci (QTL) at nucleotide resolution particularly when individuals from several breeds are included in the mapping populations. We imputed genotypes for 28 million sequence variants in 17,229 cattle of the Braunvieh, Fleckvieh and Holstein breeds in order to compile large mapping populations that provide high power to identify QTL for milk production traits. Association tests between imputed sequence variant genotypes and fat and protein percentages in milk uncovered between six and thirteen QTL (P < 1e-8) per breed. Eight of the detected QTL were significant in more than one breed. We combined the results across breeds using meta-analysis and identified a total of 25 QTL including six that were not significant in the within-breed association studies. Two missense mutations in the ABCG2 (p.Y581S, rs43702337, P = 4.3e-34) and GHR (p.F279Y, rs385640152, P = 1.6e-74) genes were the top variants at QTL on chromosomes 6 and 20. Another known causal missense mutation in the DGAT1 gene (p.A232K, rs109326954, P = 8.4e-1436) was the second top variant at a QTL on chromosome 14 but its allelic substitution effects were inconsistent across breeds. It turned out that the conflicting allelic substitution effects resulted from flaws in the imputed genotypes due to the use of a multi-breed reference population for genotype imputation. Many QTL for milk production traits segregate across breeds and across-breed meta-analysis has greater power to detect such QTL than within-breed association testing. Association testing between imputed sequence variant genotypes and

  16. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  17. High-Throughput Analysis With 96-Capillary Array Electrophoresis and Integrated Sample Preparation for DNA Sequencing Based on Laser Induced Fluorescence Detection

    Xue, Gang [Iowa State Univ., Ames, IA (United States)

    2001-01-01

    The purpose of this research was to improve the fluorescence detection for the multiplexed capillary array electrophoresis, extend its use beyond the genomic analysis, and to develop an integrated micro-sample preparation system for high-throughput DNA sequencing. The authors first demonstrated multiplexed capillary zone electrophoresis (CZE) and micellar electrokinetic chromatography (MEKC) separations in a 96-capillary array system with laser-induced fluorescence detection. Migration times of four kinds of fluoresceins and six polyaromatic hydrocarbons (PAHs) are normalized to one of the capillaries using two internal standards. The relative standard deviations (RSD) after normalization are 0.6-1.4% for the fluoresceins and 0.1-1.5% for the PAHs. Quantitative calibration of the separations based on peak areas is also performed, again with substantial improvement over the raw data. This opens up the possibility of performing massively parallel separations for high-throughput chemical analysis for process monitoring, combinatorial synthesis, and clinical diagnosis. The authors further improved the fluorescence detection by step laser scanning. A computer-controlled galvanometer scanner is adapted for scanning a focused laser beam across a 96-capillary array for laser-induced fluorescence detection. The signal at a single photomultiplier tube is temporally sorted to distinguish among the capillaries. The limit of detection for fluorescein is 3 x 10-11 M (S/N = 3) for 5-mW of total laser power scanned at 4 Hz. The observed cross-talk among capillaries is 0.2%. Advantages include the efficient utilization of light due to the high duty-cycle of step scan, good detection performance due to the reduction of stray light, ruggedness due to the small mass of the galvanometer mirror, low cost due to the simplicity of components, and flexibility due to the independent paths for excitation and emission.

  18. A New Images Hiding Scheme Based on Chaotic Sequences

    LIU Nian-sheng; GUO Dong-hui; WU Bo-xi; Parr G

    2005-01-01

    We propose a data hidding technique in a still image. This technique is based on chaotic sequence in the transform domain of covert image. We use different chaotic random sequences multiplied by multiple sensitive images, respectively, to spread the spectrum of sensitive images. Multiple sensitive images are hidden in a covert image as a form of noise. The results of theoretical analysis and computer simulation show the new hiding technique have better properties with high security, imperceptibility and capacity for hidden information in comparison with the conventional scheme such as LSB (Least Significance Bit).

  19. HIV-1 transmission patterns in antiretroviral therapy-naive, HIV-infected North Americans based on phylogenetic analysis by population level and ultra-deep DNA sequencing.

    Lisa L Ross

    Full Text Available Factors that contribute to the transmission of human immunodeficiency virus type 1 (HIV-1, especially drug-resistant HIV-1 variants remain a significant public health concern. In-depth phylogenetic analyses of viral sequences obtained in the screening phase from antiretroviral-naïve HIV-infected patients seeking enrollment in EPZ108859, a large open-label study in the USA, Canada and Puerto Rico (ClinicalTrials.gov NCT00440947 were examined for insights into the roles of drug resistance and epidemiological factors that could impact disease dissemination. Viral transmission clusters (VTCs were initially predicted from a phylogenetic analysis of population level HIV-1 pol sequences obtained from 690 antiretroviral-naïve subjects in 2007. Subsequently, the predicted VTCs were tested for robustness by ultra deep sequencing (UDS using pyrosequencing technology and further phylogenetic analyses. The demographic characteristics of clustered and non-clustered subjects were then compared. From 690 subjects, 69 were assigned to 1 of 30 VTCs, each containing 2 to 5 subjects. Race composition of VTCs were significantly more likely to be white (72% vs. 60%; p = 0.04. VTCs had fewer reverse transcriptase and major PI resistance mutations (9% vs. 24%; p = 0.002 than non-clustered sequences. Both men-who-have-sex-with-men (MSM (68% vs. 48%; p = 0.001 and Canadians (29% vs. 14%; p = 0.03 were significantly more frequent in VTCs than non-clustered sequences. Of the 515 subjects who initiated antiretroviral therapy, 33 experienced confirmed virologic failure through 144 weeks while only 3/33 were from VTCs. Fewer VTCs subjects (as compared to those with non-clustering virus had HIV-1 with resistance-associated mutations or experienced virologic failure during the course of the study. Our analysis shows specific geographical and drug resistance trends that correlate well with transmission clusters defined by HIV sequences of similarity

  20. A basic analysis toolkit for biological sequences

    Siragusa Enrico

    2007-09-01

    Full Text Available Abstract This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks. Namely, local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.

  1. DNA Fingerprinting of Lactobacillus crispatus Strain CTV-05 by Repetitive Element Sequence-Based PCR Analysis in a Pilot Study of Vaginal Colonization

    Antonio, May A. D.; Hillier, Sharon L.

    2003-01-01

    Lactobacillus crispatus is one of the predominant hydrogen peroxide (H2O2)-producing species found in the vagina and is under development as a probiotic for the treatment of bacterial vaginosis. In this study, we assessed whether DNA fingerprinting by repetitive element sequence-based PCR (rep-PCR) can be used to distinguish the capsule strain of L. crispatus (CTV-05) from other endogenous strains as well as other species of vaginal lactobacilli. Vaginal and rectal lactobacilli were identifie...

  2. Network clustering coefficient approach to DNA sequence analysis

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main bias: the guanine-cytosine (GC) content and 3-bp (base pairs) periodicity of DNA sequence. We perform the test constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in 3-bp periodicity neither in the variable GC content.

  3. Whole genome sequence analysis of Mycobacterium suricattae

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  4. Whole genome sequence analysis of Mycobacterium suricattae

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  5. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM. Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage

  6. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains▿

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering

  7. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the

  8. Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

    Xiang Li-xin

    2010-08-01

    Full Text Available Abstract Background Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host

  9. DNA sequence modeling based on context trees

    Kusters, C.J.; Ignatenko, T.; Roland, J.; Horlin, F.

    2015-01-01

    Genomic sequences contain instructions for protein and cell production. Therefore understanding and identification of biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling of DNA sequences in its turn can help to better understand and identify such

  10. Skeleton-based human action recognition using multiple sequence alignment

    Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong

    2015-05-01

    Human action recognition and analysis is an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. This method first decompose action into a sequence of meaningful atomic actions (actionlets), and then label actionlets with English alphabets according to the Davies-Bouldin index value. Therefore, an action can be represented using a sequence of actionlet symbols, which will preserve the temporal order of occurrence of each of the actionlets. Finally, we employ sequence comparison to classify multiple actions through using string matching algorithms (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments of the proposed method on three challenging 3D action datasets show promising results.

  11. Comparative analysis of sequences from PT 2013

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  12. Characterization of Fusobacterium varium Fv113-g1 isolated from a patient with ulcerative colitis based on complete genome sequence and transcriptome analysis.

    Tsuyoshi Sekizuka

    Full Text Available Fusobacterium spp. present in the oral and gut flora is carcinogenic and is associated with the risk of pancreatic and colorectal cancers. Fusobacterium spp. is also implicated in a broad spectrum of human pathologies, including Crohn's disease and ulcerative colitis (UC. Here we report the complete genome sequence of Fusobacterium varium Fv113-g1 (genome size, 3.96 Mb isolated from a patient with UC. Comparative genome analyses totally suggested that Fv113-g1 is basically assigned as F. varium, in particular, it could be reclassified as notable F. varium subsp. similar to F. ulcerans because of partial shared orthologs. Compared with the genome sequences of F. varium ATCC 27725 (genome size, 3.30 Mb and other strains of Fusobacterium spp., Fv113-g1 possesses many accessary pan-genome sequences with noteworthy multiple virulence factors, including 44 autotransporters (type V secretion system, T5SS and 13 Fusobacterium adhesion (FadA paralogs involved in potential mucosal inflammation. Indeed, transcriptome analysis demonstrated that Fv113-g1-specific accessary genes, such as multiple T5SS and fadA paralogs, showed notably increased expression with D-MEM cultivation than with brain heart infusion broth. This implied that growth condition may enhance the expression of such potential virulence factors, leading to remarkable survival against other gut microorganisms and to the pathogenicity to human intestinal epithelium.

  13. Time fluctuation analysis of forest fire sequences

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (i.e. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that helps understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value

  14. SVAMP: Sequence variation analysis, maps and phylogeny

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima\\'s D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.

  15. Statistical analysis of next generation sequencing data

    Nettleton, Dan

    2014-01-01

    Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...

  16. Genetic diversity analysis of sugarcane germplasm based on fluorescence-labeled simple sequence repeat markers and a capillary electrophoresis-based genotyping platform

    Genetic diversity analysis, which refers to the elaboration of total extent of genetic characteristics in the genetic makeup of a certain species, constitutes a classical strategy for the study of diversity, population genetic structure, and breeding practices. In this study, fluorescence-labeled se...

  17. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Marta Brozynska

    Full Text Available Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina and Ion Torrent (Life Technology sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare. Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  18. Sirius PSB: a generic system for analysis of biological sequences.

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  19. Amino Acids Sequence Based in Silico Analysis of RuBisCO (Ribulose-1,5 Bisphosphate Carboxylase Oxygenase Proteins in Some Carthamus L. ssp.

    Emre SEVİNDİK

    2017-06-01

    Full Text Available RuBisCO is an important enzyme for plants to photosynthesize and balance carbon dioxide in the atmosphere. This study aimed to perform sequence, physicochemical, phylogenetic and 3D (three-dimensional comparative analyses of RuBisCO proteins in the Carthamus ssp. using various bioinformatics tools. The sequence lengths of the RuBisCO proteins were between 166 and 477 amino acids, with an average length of 411.8 amino acids. Their molecular weights (Mw ranged from 18711.47 to 52843.09 Da; the most acidic and basic protein sequences were detected in C. tinctorius (pI = 5.99 and in C. tenuis (pI = 6.92, respectively. The extinction coefficients of RuBisCO proteins at 280 nm ranged from 17,670 to 69,830 M-1 cm-1, the instability index (II values for RuBisCO proteins ranged from 33.31 to 39.39, while the GRAVY values of RuBisCO proteins ranged from -0.313 to -0.250. The most abundant amino acid in the RuBisCO protein was Gly (9.7%, while the least amino acid ratio was Trp (1.6 %. The putative phosphorylation sites of RuBisCO proteins were determined by NetPhos 2.0. Phylogenetic analysis revealed that RuBisCO proteins formed two main clades. A RAMPAGE analysis revealed that 96.3%-97.6% of residues were located in the favoured region of RuBisCO proteins. To predict the three dimensional (3D structure of the RuBisCO proteins PyMOL was used. The results of the current study provide insights into fundamental characteristic of RuBisCO proteins in Carthamus ssp.

  20. Systematic relationships among Lutzomyia sand flies (Diptera: Psychodidae) of Peru and Colombia based on the analysis of 12S and 28S ribosomal DNA sequences.

    Beati, Lorenza; Cáceres, Abraham G; Lee, Jamie A; Munstermann, Leonard E

    2004-02-01

    Lutzomyia spp. are New World phlebotomine sand flies, many of which are involved in the transmission of human diseases, such as leishmaniases and bartonellosis. The systematic classification of the approximately 400 species in the genus has been based on morphological characters, but the relationships within the genus are still very much in question. We have inferred phylogenies of 32 species of phlebotomine sand flies belonging to seven sub-genera and two species groups, by using fragments of the mitochondrial small subunit (12SrRNA) and of the nuclear large subunit (28SrRNA) ribosomal gene sequences. The subgenus Helcocyrtomyia and the Verrucarum species group, prominent representatives of the Peruvian sand fly fauna, were represented by 11 and 7 species, respectively. Although based on a limited number of taxa, the resulting phylogenies, based on 837 characters, provide an initial phylogenetic backbone for the progressive reconstruction of infrageneric relationships within Lutzomyia.

  1. Illumina sequencing-based analysis of a microbial community enriched under anaerobic methane oxidation condition coupled to denitrification revealed coexistence of aerobic and anaerobic methanotrophs.

    Siniscalchi, Luciene Alves Batista; Leite, Laura Rabelo; Oliveira, Guilherme; Chernicharo, Carlos Augusto Lemos; de Araújo, Juliana Calabria

    2017-07-01

    Methane is produced in anaerobic environments, such as reactors used to treat wastewaters, and can be consumed by methanotrophs. The composition and structure of a microbial community enriched from anaerobic sewage sludge under methane-oxidation condition coupled to denitrification were investigated. Denaturing gradient gel electrophoresis (DGGE) analysis retrieved sequences of Methylocaldum and Chloroflexi. Deep sequencing analysis revealed a complex community that changed over time and was affected by methane concentration. Methylocaldum (8.2%), Methylosinus (2.3%), Methylomonas (0.02%), Methylacidiphilales (0.45%), Nitrospirales (0.18%), and Methanosarcinales (0.3%) were detected. Despite denitrifying conditions provided, Nitrospirales and Methanosarcinales, known to perform anaerobic methane oxidation coupled to denitrification (DAMO) process, were in very low abundance. Results demonstrated that aerobic and anaerobic methanotrophs coexisted in the reactor together with heterotrophic microorganisms, suggesting that a diverse microbial community was important to sustain methanotrophic activity. The methanogenic sludge was a good inoculum to enrich methanotrophs, and cultivation conditions play a selective role in determining community composition.

  2. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data.

    He, Zihuai; Xu, Bin; Lee, Seunggeun; Ionita-Laza, Iuliana

    2017-09-07

    Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  4. An Imaging And Graphics Workstation For Image Sequence Analysis

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of the modern graphic-oriented workstations with the digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missile, stores and other flying objects in various flight regimes including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) Acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence data base generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interface to a variety of video players and film transport sub-systems.

  5. Molecular characterization and phylogenetic analysis of Explanatum explanatum in India based on nucleotide sequences of ribosomal ITS2 and the mitochondrial gene nad1.

    Hayashi, Kei; Mohanta, Uday K; Ohari, Yuma; Neeraja, Tambireddy; Singh, T Shantikumar; Sugiyama, Hiromu; Itagaki, Tadashi

    2016-12-01

    The aim of this study was to analyze the phylogenetic relationship between Explanatum explanatum populations in India and other countries of the Indian subcontinent. Seventy liver amphistomes collected from four localities in India were identified as E. explanatum based on the nucleotide sequences of ribosomal ITS2. The flukes were then analyzed phylogenetically based on the nucleotide sequence of the mitochondrial gene nad1 in comparison with flukes from Bangladesh and Nepal. In the resulting phylogenetic tree, the nad1 haplotypes from India were divided into four clades, and the flukes showing the haplotypes of clades A and C were predominant in India. The haplotypes of the clades A and C have also been detected in Bangladesh and Nepal, and therefore, it seems they occur commonly throughout the Indian subcontinent. The results of AMOVA suggested that gene flow was likely to occur between E. explanatum populations in these countries. These countries are geographically close and have been historically and culturally connected to each other, and therefore, the movements of host ruminants among these countries might have been involved in the migration of the flukes and their gene flow.

  6. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Charalambos Chrysostomou

    2015-01-01

    Full Text Available Complex informational spectrum analysis for protein sequences (CISAPS and its web-based server are developed and presented. As recent studies show, only the use of the absolute spectrum in the analysis of protein sequences using the informational spectrum analysis is proven to be insufficient. Therefore, CISAPS is developed to consider and provide results in three forms including absolute, real, and imaginary spectrum. Biologically related features to the analysis of influenza A subtypes as presented as a case study in this study can also appear individually either in the real or imaginary spectrum. As the results presented, protein classes can present similarities or differences according to the features extracted from CISAPS web server. These associations are probable to be related with the protein feature that the specific amino acid index represents. In addition, various technical issues such as zero-padding and windowing that may affect the analysis are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices where each one represents a different property to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply and include complex informational spectrum analysis to their work.

  7. Centroid based clustering of high throughput sequencing reads based on n-mer counts.

    Solovyov, Alexander; Lipkin, W Ian

    2013-09-08

    Many problems in computational biology require alignment-free sequence comparisons. One of the common tasks involving sequence comparison is sequence clustering. Here we apply methods of alignment-free comparison (in particular, comparison using sequence composition) to the challenge of sequence clustering. We study several centroid based algorithms for clustering sequences based on word counts. Study of their performance shows that using k-means algorithm with or without the data whitening is efficient from the computational point of view. A higher clustering accuracy can be achieved using the soft expectation maximization method, whereby each sequence is attributed to each cluster with a specific probability. We implement an open source tool for alignment-free clustering. It is publicly available from github: https://github.com/luscinius/afcluster. We show the utility of alignment-free sequence clustering for high throughput sequencing analysis despite its limitations. In particular, it allows one to perform assembly with reduced resources and a minimal loss of quality. The major factor affecting performance of alignment-free read clustering is the length of the read.

  8. Protein sequencing via nanopore based devices: a nanofluidics perspective

    Chinappi, Mauro; Cecconi, Fabio

    2018-05-01

    Proteins perform a huge number of central functions in living organisms, thus all the new techniques allowing their precise, fast and accurate characterization at single-molecule level certainly represent a burst in proteomics with important biomedical impact. In this review, we describe the recent progresses in the developing of nanopore based devices for protein sequencing. We start with a critical analysis of the main technical requirements for nanopore protein sequencing, summarizing some ideas and methodologies that have recently appeared in the literature. In the last sections, we focus on the physical modelling of the transport phenomena occurring in nanopore based devices. The multiscale nature of the problem is discussed and, in this respect, some of the main possible computational approaches are illustrated.

  9. Quantitative risk trends deriving from PSA-based event analyses. Analysis of results from U.S.NRC's accident sequence precursor program

    Watanabe, Norio

    2004-01-01

    The United States Nuclear Regulatory Commission (U.S.NRC) has been carrying out the Accident Sequence Precursor (ASP) Program to identify and categorize precursors to potential severe core damage accident sequences using the probabilistic safety assessment (PSA) technique. The ASP Program has identified a lot of risk significant events as precursors that occurred at U.S. nuclear power plants. Although the results from the ASP Program include valuable information that could be useful for obtaining and characterizing risk significant insights and for monitoring risk trends in nuclear power industry, there are only a few attempts to determine and develop the trends using the ASP results. The present study examines and discusses quantitative risk trends for the industry level, using two indicators, that is, the occurrence frequency of precursors and the annual core damage probability, deriving from the results of the ASP analysis. It is shown that the core damage risk at U.S. nuclear power plants has been lowered and the likelihood of risk significant events has been remarkably decreasing. As well, the present study demonstrates that two risk indicators used here can provide quantitative information useful for examining and monitoring the risk trends and/or risk characteristics in nuclear power industry. (author)

  10. Novel algorithms for protein sequence analysis

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology”s paradigm is that this order of amino acids determines the protein”s architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  11. Pig genome sequence - analysis and publication strategy

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  12. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    Brown, Judith K.

    2015-04-18

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses are proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91 % and 94 % are proposed as the demarcation threshold for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.

  13. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    Brown, Judith K.; Zerbini, F. Murilo; Navas-Castillo, Jesú s; Moriones, Enrique; Ramos-Sobrinho, Roberto; Silva, José C. F.; Fiallo-Olivé , Elvira; Briddon, Rob W.; Herná ndez-Zepeda, Cecilia; Idris, Ali; Malathi, V. G.; Martin, Darren P.; Rivera-Bustamante, Rafael; Ueda, Shigenori; Varsani, Arvind

    2015-01-01

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses are proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91 % and 94 % are proposed as the demarcation threshold for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.

  14. Characterization and sequence analysis of cysteine and glycine-rich ...

    Primers specific for CSRP3 were designed using known cDNA sequences of Bos taurus published in database with different accession numbers. Polymerase chain reaction (PCR) was performed and products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).

  15. Phylogeny reconstruction and hybrid analysis of populus (Salicaceae) based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments.

    Wang, Zhaoshan; Du, Shuhui; Dayanandan, Selvadurai; Wang, Dongsheng; Zeng, Yanfei; Zhang, Jianguo

    2014-01-01

    Populus (Salicaceae) is one of the most economically and ecologically important genera of forest trees. The complex reticulate evolution and lack of highly variable orthologous single-copy DNA markers have posed difficulties in resolving the phylogeny of this genus. Based on a large data set of nuclear and plastid DNA sequences, we reconstructed robust phylogeny of Populus using parsimony, maximum likelihood and Bayesian inference methods. The resulting phylogenetic trees showed better resolution at both inter- and intra-sectional level than previous studies. The results revealed that (1) the plastid-based phylogenetic tree resulted in two main clades, suggesting an early divergence of the maternal progenitors of Populus; (2) three advanced sections (Populus, Aigeiros and Tacamahaca) are of hybrid origin; (3) species of the section Tacamahaca could be divided into two major groups based on plastid and nuclear DNA data, suggesting a polyphyletic nature of the section; and (4) many species proved to be of hybrid origin based on the incongruence between plastid and nuclear DNA trees. Reticulate evolution may have played a significant role in the evolution history of Populus by facilitating rapid adaptive radiations into different environments.

  16. Phylogeny reconstruction and hybrid analysis of populus (Salicaceae based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments.

    Zhaoshan Wang

    Full Text Available Populus (Salicaceae is one of the most economically and ecologically important genera of forest trees. The complex reticulate evolution and lack of highly variable orthologous single-copy DNA markers have posed difficulties in resolving the phylogeny of this genus. Based on a large data set of nuclear and plastid DNA sequences, we reconstructed robust phylogeny of Populus using parsimony, maximum likelihood and Bayesian inference methods. The resulting phylogenetic trees showed better resolution at both inter- and intra-sectional level than previous studies. The results revealed that (1 the plastid-based phylogenetic tree resulted in two main clades, suggesting an early divergence of the maternal progenitors of Populus; (2 three advanced sections (Populus, Aigeiros and Tacamahaca are of hybrid origin; (3 species of the section Tacamahaca could be divided into two major groups based on plastid and nuclear DNA data, suggesting a polyphyletic nature of the section; and (4 many species proved to be of hybrid origin based on the incongruence between plastid and nuclear DNA trees. Reticulate evolution may have played a significant role in the evolution history of Populus by facilitating rapid adaptive radiations into different environments.

  17. Stenostomum cf. leucops (Platyhelminthes in Thailand: a surface observation using scanning electron microscopy and phylogenetic analysis based on 18S ribosomal DNA sequences

    Arin Ngamniyom

    2016-02-01

    Full Text Available The genus Stenostomum contains small turbellaria that are widely distributed in freshwater environments worldwide. However, there are only rare reports or studies of this genus from Thailand. Therefore, the objective of this study was to report S. cf. leucops in Thailand collected from Pathum Thani Province. The worm morphology and surface topography using scanning electron microscopy were determined. Moreover, the phylogenetic tree of S. cf. leucops was analysed with 17 flatworms based on the 18S ribosomal DNA sequences. The phylogenetic relationship shared a common ancestry of Catenulida species, and S. cf. leucops displayed a monophyletic pattern within Stenostomum spp. The results of the morphological and molecular data are discussed. These results may increase the knowledge of freshwater microturbellarians in Thailand.

  18. Putative recombination events and evolutionary history of five economically important viruses of fruit trees based on coat protein-encoding gene sequence analysis.

    Boulila, Moncef

    2010-06-01

    To enhance the knowledge of recombination as an evolutionary process, 267 accessions retrieved from GenBank were investigated, all belonging to five economically important viruses infecting fruit crops (Plum pox, Apple chlorotic leaf spot, Apple mosaic, Prune dwarf, and Prunus necrotic ringspot viruses). Putative recombinational events were detected in the coat protein (CP)-encoding gene using RECCO and RDP version 3.31beta algorithms. Based on RECCO results, all five viruses were shown to contain potential recombination signals in the CP gene. Reconstructed trees with modified topologies were proposed. Furthermore, RECCO performed better than the RDP package in detecting recombination events and exhibiting their evolution rate along the sequences of the five viruses. RDP, however, provided the possible major and minor parents of the recombinants. Thus, the two methods should be considered complementary.

  19. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if with >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or 15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.

  20. Incident sequence analysis; event trees, methods and graphical symbols

    1980-11-01

    When analyzing incident sequences, unwanted events resulting from a certain cause are looked for. Graphical symbols and explanations of graphical representations are presented. The method applies to the analysis of incident sequences in all types of facilities. By means of the incident sequence diagram, incident sequences, i.e. the logical and chronological course of repercussions initiated by the failure of a component or by an operating error, can be presented and analyzed simply and clearly

  1. Profiling of secondary metabolite gene clusters regulated by LaeA in Aspergillus niger FGSC A1279 based on genome sequencing and transcriptome analysis.

    Wang, Bin; Lv, Yangyong; Li, Xuejie; Lin, Yiying; Deng, Hai; Pan, Li

    The global regulator LaeA controls the production of many fungal secondary metabolites, possibly via chromatin remodeling. Here we aimed to survey the secondary metabolite profile regulated by LaeA in Aspergillus niger FGSC A1279 by genome sequencing and comparative transcriptomics between the laeA deletion (ΔlaeA) and overexpressing (OE-laeA) mutants. Genome sequencing revealed four putative polyketide synthase genes specific to FGSC A1279, suggesting that the corresponding polyketide compounds might be unique to FGSC A1279. RNA-seq data revealed 281 putative secondary metabolite genes upregulated in the OE-laeA mutants, including 22 secondary metabolite backbone genes. LC-MS chemical profiling illustrated that many secondary metabolites were produced in OE-laeA mutants compared to wild type and ΔlaeA mutants, providing potential resources for drug discovery. KEGG analysis annotated 16 secondary metabolite clusters putatively linked to metabolic pathways. Furthermore, 34 of 61 Zn 2 Cys 6 transcription factors located in secondary metabolite clusters were differentially expressed between ΔlaeA and OE-laeA mutants. Three secondary metabolite clusters (cluster 18, 30 and 33) containing Zn 2 Cys 6 transcription factors that were upregulated in OE-laeA mutants were putatively linked to KEGG pathways, suggesting that Zn 2 Cys 6 transcription factors might play an important role in synthesizing secondary metabolites regulated by LaeA. Taken together, LaeA dramatically influences the secondary metabolite profile in FGSC A1279. Copyright © 2017 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  2. Model-based quality assessment and base-calling for second-generation sequencing data.

    Bravo, Héctor Corrada; Irizarry, Rafael A

    2010-09-01

    Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads-strings of A,C,G, or T's, between 30 and 100 characters long-which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in

  3. Linear discriminant analysis of character sequences using occurrences of words

    Dutta, Subhajit; Chaudhuri, Probal; Ghosh, Anil

    2014-01-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  4. Linear discriminant analysis of character sequences using occurrences of words

    Dutta, Subhajit

    2014-02-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  5. Function-Based Algorithms for Biological Sequences

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  6. Sub-grouping of Plasmodium falciparum 3D7 var genes based on sequence analysis of coding and non-coding regions

    Lavstsen, Thomas; Salanti, Ali; Jensen, Anja T R

    2003-01-01

    and organization of the 3D7 PfEMP1 repertoire was investigated on the basis of the complete genome sequence. METHODS: Using two tree-building methods we analysed the coding and non-coding sequences of 3D7 var and rif genes as well as var genes of other parasite strains. RESULTS: var genes can be sub...

  7. The mechanism of enhanced wastewater nitrogen removal by photo-sequencing batch reactors based on comprehensive analysis of system dynamics within a cycle.

    Ye, Jianfeng; Liang, Junyu; Wang, Liang; Markou, Giorgos

    2018-07-01

    To understand the mechanism of enhanced nitrogen removal by photo-sequencing batch reactors (photo-SBRs), which incorporated microalgal photosynthetic oxygenation into the aerobic phases of a conventional cycle, this study performed comprehensive analysis of one-cycle dynamics. Under a low aeration intensity (about 0.02 vvm), a photo-SBR, illuminated with light at 92.27 μ·mol·m -2 ·s -1 , could remove 99.45% COD, 99.93% NH 4 + -N, 90.39% TN, and 95.17% TP, while the control SBR could only remove 98.36% COD, 83.51% NH 4 + -N, 78.96% TN, and 97.75% TP, for a synthetic domestic sewage. The specific oxygen production rate (SOPR) of microalgae in the photo-SBR could reach 6.63 fmol O 2 ·cell -1 ·h -1 . One-cycle dynamics shows that the enhanced nitrogen removal by photo-SBRs is related to photosynthetic oxygenation, resulting in strengthened nitrification, instead of direct nutrient uptake by microalgae. A too high light or aeration intensity could deteriorate anoxic conditions and thus adversely affect the removal of TN and TP in photo-SBRs. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  9. Recurrence plot analysis of DNA sequences

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    Recurrence plot technique of DNA sequences is established on metric representation and employed to analyze correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, but a yeast chromosome's correlation distance has a constant increasing.

  10. Accident Sequence Precursor Analysis for SGTR by Using Dynamic PSA Approach

    Lee, Han Sul; Heo, Gyun Young; Kim, Tae Wan

    2016-01-01

    In order to address this issue, this study suggests the sequence tree model to analyze accident sequence systematically. Using the sequence tree model, all possible scenarios which need a specific safety action to prevent the core damage can be identified and success conditions of safety action under complicated situation such as combined accident will be also identified. Sequence tree is branch model to divide plant condition considering the plant dynamics. Since sequence tree model can reflect the plant dynamics, arising from interaction of different accident timing and plant condition and from the interaction between the operator action, mitigation system, and the indicators for operation, sequence tree model can be used to develop the dynamic event tree model easily. Target safety action for this study is a feed-and-bleed (F and B) operation. A F and B operation directly cools down the reactor cooling system (RCS) using the primary cooling system when residual heat removal by the secondary cooling system is not available. In this study, a TLOFW accident and a TLOFW accident with LOCA were the target accidents. Based on the conventional PSA model and indicators, the sequence tree model for a TLOFW accident was developed. Based on the results of a sampling analysis and data from the conventional PSA model, the CDF caused by Sequence no. 26 can be realistically estimated. For a TLOFW accident with LOCA, second accident timings were categorized according to plant condition. Indicators were selected as branch point using the flow chart and tables, and a corresponding sequence tree model was developed. If sampling analysis is performed, practical accident sequences can be identified based on the sequence analysis. If a realistic distribution for the variables can be obtained for sampling analysis, much more realistic accident sequences can be described. Moreover, if the initiating event frequency under a combined accident can be quantified, the sequence tree model

  11. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; 123) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  12. Analysis of Neuronal Sequences Using Pairwise Biases

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike ). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful in regards to reducing the frequency and...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order

  13. Google matrix analysis of DNA sequences.

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  14. Google matrix analysis of DNA sequences.

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW. At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  15. POU4F3 mutation screening in Japanese hearing loss patients: Massively parallel DNA sequencing-based analysis identified novel variants associated with autosomal dominant hearing loss.

    Tomohiro Kitano

    Full Text Available A variant in a transcription factor gene, POU4F3, is responsible for autosomal dominant nonsyndromic hereditary hearing loss, DFNA15. To date, 14 variants, including a whole deletion of POU4F3, have been reported to cause HL in various ethnic groups. In the present study, genetic screening for POU4F3 variants was carried out for a large series of Japanese hearing loss (HL patients to clarify the prevalence and clinical characteristics of DFNA15 in the Japanese population. Massively parallel DNA sequencing of 68 target candidate genes was utilized in 2,549 unrelated Japanese HL patients (probands to identify genomic variations responsible for HL. The detailed clinical features in patients with POU4F3 variants were collected from medical charts and analyzed. Novel 12 POU4F3 likely pathogenic variants (six missense variants, three frameshift variants, and three nonsense variants were successfully identified in 15 probands (2.5% among 602 families exhibiting autosomal dominant HL, whereas no variants were detected in the other 1,947 probands with autosomal recessive or inheritance pattern unknown HL. To obtain the audiovestibular configuration of the patients harboring POU4F3 variants, we collected audiograms and vestibular symptoms of the probands and their affected family members. Audiovestibular phenotypes in a total of 24 individuals from the 15 families possessing variants were characterized by progressive HL, with a large variation in the onset age and severity with or without vestibular symptoms observed. Pure-tone audiograms indicated the most prevalent configuration as mid-frequency HL type followed by high-frequency HL type, with asymmetry observed in approximately 20% of affected individuals. Analysis of the relationship between age and pure-tone average suggested that individuals with truncating variants showed earlier onset and slower progression of HL than did those with non-truncating variants. The present study showed that variants

  16. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Wadim L. Matochko

    2013-01-01

    Full Text Available Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N×1 frequency vector n=ni, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N×N matrix and a stochastic sampling operator (Sa. The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq. Sequencing without any bias and errors is Seq=Sa IN, where IN is a N×N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (CEN, which describes elimination or statistically significant downsampling, of specific reads during the sequencing process.

  17. Analysis of correlations between sites in models of protein sequences

    Giraud, B.G.; Lapedes, A.; Liu, L.C.

    1998-01-01

    A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from 2, illustrating the relation to familiar classical spin systems, to 20 states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces nonindependence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: The more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor. copyright 1998 The American Physical Society

  18. Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

    Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

    2018-04-01

    The N-end rule pathway is a protein degradation pathway that relates the protein half-life with the identity of its N-terminal residues. A destabilizing N-terminal residues is created by enzymatic reaction or chemical modifications. This destabilized substrate will be recognized by PROTEOLYSIS 6 (PRT6) protein, which encodes an E3 ligase enzyme and resulted in substrate degradation by proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but not yet been studied in fleshy fruit plants. Hence, this study was carried out in tomato that is known as the model for fleshy fruit plants. BLASTX analysis identified that Solyc09g010830 which encodes for a PRT6 gene in tomato based on its sequence similarity with PRT6 in A. thaliana. In silico gene expression analysis shows that PRT6 gene was highly expressed in tomato fruits breaker +5. Co-expression analysis shows that PRT6 may not only involved in abiotic stresses but also in biotic stresses. The objective is to analyze the sequence and characterize PRT6 gene in tomato.

  19. Genome-Wide Analysis of Simple Sequence Repeats and Efficient Development of Polymorphic SSR Markers Based on Whole Genome Re-Sequencing of Multiple Isolates of the Wheat Stripe Rust Fungus.

    Huaiyong Luo

    Full Text Available The biotrophic parasitic fungus Puccinia striiformis f. sp. tritici (Pst causes stripe rust, a devastating disease of wheat, endangering global food security. Because the Pst population is highly dynamic, it is difficult to develop wheat cultivars with durable and highly effective resistance. Simple sequence repeats (SSRs are widely used as molecular markers in genetic studies to determine population structure in many organisms. However, only a small number of SSR markers have been developed for Pst. In this study, a total of 4,792 SSR loci were identified using the whole genome sequences of six isolates from different regions of the world, with a marker density of one SSR per 22.95 kb. The majority of the SSRs were di- and tri-nucleotide repeats. A database containing 1,113 SSR markers were established. Through in silico comparison, the previously reported SSR markers were found mainly in exons, whereas the SSR markers in the database were mostly in intergenic regions. Furthermore, 105 polymorphic SSR markers were confirmed in silico by their identical positions and nucleotide variations with INDELs identified among the six isolates. When 104 in silico polymorphic SSR markers were used to genotype 21 Pst isolates, 84 produced the target bands, and 82 of them were polymorphic and revealed the genetic relationships among the isolates. The results show that whole genome re-sequencing of multiple isolates provides an ideal resource for developing SSR markers, and the newly developed SSR markers are useful for genetic and population studies of the wheat stripe rust fungus.

  20. Genome-Wide Analysis of Simple Sequence Repeats and Efficient Development of Polymorphic SSR Markers Based on Whole Genome Re-Sequencing of Multiple Isolates of the Wheat Stripe Rust Fungus.

    Luo, Huaiyong; Wang, Xiaojie; Zhan, Gangming; Wei, Guorong; Zhou, Xinli; Zhao, Jing; Huang, Lili; Kang, Zhensheng

    2015-01-01

    The biotrophic parasitic fungus Puccinia striiformis f. sp. tritici (Pst) causes stripe rust, a devastating disease of wheat, endangering global food security. Because the Pst population is highly dynamic, it is difficult to develop wheat cultivars with durable and highly effective resistance. Simple sequence repeats (SSRs) are widely used as molecular markers in genetic studies to determine population structure in many organisms. However, only a small number of SSR markers have been developed for Pst. In this study, a total of 4,792 SSR loci were identified using the whole genome sequences of six isolates from different regions of the world, with a marker density of one SSR per 22.95 kb. The majority of the SSRs were di- and tri-nucleotide repeats. A database containing 1,113 SSR markers were established. Through in silico comparison, the previously reported SSR markers were found mainly in exons, whereas the SSR markers in the database were mostly in intergenic regions. Furthermore, 105 polymorphic SSR markers were confirmed in silico by their identical positions and nucleotide variations with INDELs identified among the six isolates. When 104 in silico polymorphic SSR markers were used to genotype 21 Pst isolates, 84 produced the target bands, and 82 of them were polymorphic and revealed the genetic relationships among the isolates. The results show that whole genome re-sequencing of multiple isolates provides an ideal resource for developing SSR markers, and the newly developed SSR markers are useful for genetic and population studies of the wheat stripe rust fungus.

  1. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  2. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Yunyun Liang

    2015-01-01

    Full Text Available Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM. Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS, segmented PsePSSM, and segmented autocovariance transformation (ACT based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640 are adopted in this paper. Then a 700-dimensional (700D feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA. To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  3. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments.

    Dietrich, Susanne; Borst, Nadine; Schlee, Sandra; Schneider, Daniel; Janda, Jan-Oliver; Sterner, Reinhard; Merkl, Rainer

    2012-07-17

    The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies (k(cat)/K(M)) were largely unaltered. In contrast, the apparent thermal unfolding temperature (T(M)(app)) was lowered in most proteins. Remarkably, the strongest observed destabilization (ΔT(M)(app) = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The k(cat)/K(M) values of four H2r mutant proteins were reduced between 13- and 120-fold, and their T(M)(app) values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe. Our findings demonstrate that positions with high conn(k) values have an

  4. Analysis of common SHOX gene sequence variants and ∼4.9-kb ...

    [Solc R., Hirschfeldova K., Kebrdlova V. and Baxova A. 2014 Analysis of common SHOX gene sequence variants ... based on a Gibbs sampling strategy were done using .... SHOX (short stature homeobox) are an important cause of growth.

  5. Sequence memory based on coherent spin-interaction neural networks.

    Xia, Min; Wong, W K; Wang, Zhijie

    2014-12-01

    Sequence information processing, for instance, the sequence memory, plays an important role on many functions of brain. In the workings of the human brain, the steady-state period is alterable. However, in the existing sequence memory models using heteroassociations, the steady-state period cannot be changed in the sequence recall. In this work, a novel neural network model for sequence memory with controllable steady-state period based on coherent spininteraction is proposed. In the proposed model, neurons fire collectively in a phase-coherent manner, which lets a neuron group respond differently to different patterns and also lets different neuron groups respond differently to one pattern. The simulation results demonstrating the performance of the sequence memory are presented. By introducing a new coherent spin-interaction sequence memory model, the steady-state period can be controlled by dimension parameters and the overlap between the input pattern and the stored patterns. The sequence storage capacity is enlarged by coherent spin interaction compared with the existing sequence memory models. Furthermore, the sequence storage capacity has an exponential relationship to the dimension of the neural network.

  6. Genotyping by sequencing for SNP-based linkage analysis and identification of QTLs linked to fruit quality traits in Japanese plum (Prunus salicina Lindl.)

    Marker-assisted selection (MAS) in stone fruit (Prunus species) breeding is currently difficult to achieve due to the polygenic nature of themost relevant agronomic traits linked to fruit quality. Genotyping by sequencing (GBS), however, provides a large quantity of useful data suitable for finemapp...

  7. Parametric inference for biological sequence analysis.

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.

  8. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.

  9. Molecular identification and phylogenetic analysis of important medicinal plant species in genus Paeonia based on rDNA-ITS, matK, and rbcL DNA barcode sequences.

    Kim, W J; Ji, Y; Choi, G; Kang, Y M; Yang, S; Moon, B C

    2016-08-05

    This study was performed to identify and analyze the phylogenetic relationship among four herbaceous species of the genus Paeonia, P. lactiflora, P. japonica, P. veitchii, and P. suffruticosa, using DNA barcodes. These four species, which are commonly used in traditional medicine as Paeoniae Radix and Moutan Radicis Cortex, are pharmaceutically defined in different ways in the national pharmacopoeias in Korea, Japan, and China. To authenticate the different species used in these medicines, we evaluated rDNA-internal transcribed spacers (ITS), matK and rbcL regions, which provide information capable of effectively distinguishing each species from one another. Seventeen samples were collected from different geographic regions in Korea and China, and DNA barcode regions were amplified using universal primers. Comparative analyses of these DNA barcode sequences revealed species-specific nucleotide sequences capable of discriminating the four Paeonia species. Among the entire sequences of three barcodes, marker nucleotides were identified at three positions in P. lactiflora, eleven in P. japonica, five in P. veitchii, and 25 in P. suffruticosa. Phylogenetic analyses also revealed four distinct clusters showing homogeneous clades with high resolution at the species level. The results demonstrate that the analysis of these three DNA barcode sequences is a reliable method for identifying the four Paeonia species and can be used to authenticate Paeoniae Radix and Moutan Radicis Cortex at the species level. Furthermore, based on the assessment of amplicon sizes, inter/intra-specific distances, marker nucleotides, and phylogenetic analysis, rDNA-ITS was the most suitable DNA barcode for identification of these species.

  10. Editorial: Special Issue on Algorithms for Sequence Analysis and Storage

    Veli Mäkinen

    2014-03-01

    Full Text Available This special issue of Algorithms is dedicated to approaches to biological sequence analysis that have algorithmic novelty and potential for fundamental impact in methods used for genome research.

  11. Next-generation sequence analysis of cancer xenograft models.

    Fernando J Rossello

    Full Text Available Next-generation sequencing (NGS studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC, a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assign reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.

  12. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  13. Sequencing and Analysis of Neanderthal Genomic DNA

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith,Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo,Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomalsequences using a metagenomic approach reveals that modern humans andNeanderthals split ~;400,000 years ago, without significant evidence ofsubsequent admixture.

  14. SVAMP: Sequence variation analysis, maps and phylogeny

    Naeem, Raeece; Hidayah, Lailatul; Preston, Mark D.; Clark, Taane G.; Pain, Arnab

    2014-01-01

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis

  15. Swarm-based Sequencing Recommendations in E-learning

    Van den Berg, Bert; Tattersall, Colin; Janssen, José; Brouns, Francis; Kurvers, Hub; Koper, Rob

    2005-01-01

    Van den Berg, B., Tattersall, C., Janssen, J., Brouns, F., Kurvers, H., & Koper, R. (2006). Swarm-based Sequencing Recommendations in E-learning. International Journal of Computer Science & Applications, III(III), 1-11.

  16. Protein Function Prediction Based on Sequence and Structure Information

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  17. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We develop a software SUGAR (subtile-based GUI-assisted refiner) which can handle ultra-high-throughput data with user-friendly graphical user interface (GUI) and interactive analysis capability. The SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in the SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to a public whole human genome sequencing data and we proved the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analysis that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls to the low population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.

  18. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties defines a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where large number of sequences generated by the model can be compared to actually observed sequences.

  19. Characterization of the Pathogenicity of Streptococcus intermedius TYG1620 Isolated from a Human Brain Abscess Based on the Complete Genome Sequence with Transcriptome Analysis and Transposon Mutagenesis in a Murine Subcutaneous Abscess Model.

    Hasegawa, Noriko; Sekizuka, Tsuyoshi; Sugi, Yutaka; Kawakami, Nobuhiro; Ogasawara, Yumiko; Kato, Kengo; Yamashita, Akifumi; Takeuchi, Fumihiko; Kuroda, Makoto

    2017-02-01

    Streptococcus intermedius is known to cause periodontitis and pyogenic infections in the brain and liver. Here we report the complete genome sequence of strain TYG1620 (genome size, 2,006,877 bp; GC content, 37.6%; 2,020 predicted open reading frames [ORFs]) isolated from a brain abscess in an infant. Comparative analysis of S. intermedius genome sequences suggested that TYG1620 carries a notable type VII secretion system (T7SS), two long repeat regions, and 19 ORFs for cell wall-anchored proteins (CWAPs). To elucidate the genes responsible for the pathogenicity of TYG1620, transcriptome analysis was performed in a murine subcutaneous abscess model. The results suggest that the levels of expression of small hypothetical proteins similar to phenol-soluble modulin β1 (PSMβ1), a staphylococcal virulence factor, significantly increased in the abscess model. In addition, an experiment in a murine subcutaneous abscess model with random transposon (Tn) mutant attenuation suggested that Tn mutants with mutations in 212 ORFs in the Tn mutant library were attenuated in the murine abscess model (629 ORFs were disrupted in total); the 212 ORFs are putatively essential for abscess formation. Transcriptome analysis identified 37 ORFs, including paralogs of the T7SS and a putative glucan-binding CWAP in long repeat regions, to be upregulated and attenuated in vivo This study provides a comprehensive characterization of S. intermedius pathogenicity based on the complete genome sequence and a murine subcutaneous abscess model with transcriptome and Tn mutagenesis, leading to the identification of pivotal targets for vaccines or antimicrobial agents for the control of S. intermedius infections. Copyright © 2017 American Society for Microbiology.

  20. Development of Sequence-Based Microsatellite Marker for Phalaenopsis Orchid

    FATIMAH

    2011-06-01

    Full Text Available Phalaenopsis is one of the most interesting genera of orchids due to the members are often used as parents to produce hybrids. The establishment and development of highly reliable and discriminatory methods for identifying species and cultivars has become increasingly more important to plant breeders and members of the nursery industry. The aim of this research was to develop sequence-based microsatellite (eSSR markers for the Phalaenopsis orchid designed from the sequence of GenBank NCBI. Seventeen primers were designed and thirteen primers pairs could amplify the DNA giving the expected PCR product with polymorphism. A total of 51 alleles, with an average of 3 alleles per locus and polymorphism information content (PIC values at 0.674, were detected at the 16 SSR loci. Therefore, these markers could be used for identification of the Phalaenopsis orchid used in this study. Genetic similarity and principle coordinate analysis identified five major groups of Phalaenopsis sp. the first group consisted of P. amabilis, P. fuscata, P. javanica, and P. zebrine. The second group consisted of P. amabilis, P. amboinensis, P. bellina, P. floresens, and P. mannii. The third group consisted of P. bellina, P. cornucervi, P. cornucervi, P. violaceae sumatra, P. modesta. The forth group consisted of P. cornucervi and P. lueddemanniana, and the fifth group was P. amboinensis.

  1. RNA sequencing based analysis of the spleen transcriptome following the infectious bronchitis virus infection of chickens selected for different mannose-binding lectin serum concentrations

    Hamzic, Edin; Kjærup, Rikke Brødsgaard; Mach, Núria

    2016-01-01

    in strategies to control IB. To this end, two chicken lines, selected for high and low serum concentration of mannose-binding lectin (MBL), a soluble pattern recognition receptor, were studied. In total, 32 animals from each line (designated L10H for high and L10L for low MBL serum concentration) were used....... Sixteen birds from each line were infected with IBV on day 1 and birds were euthanized at 1 week and 3 weeks post infection, 8 uninfected controls and 8 infected birds from each line at each occasion. RNA sequencing was performed on spleen samples from all 64 birds used in the experiment. Differential...

  2. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  3. COI (cytochrome oxidase-I) sequence based studies of Carangid fishes from Kakinada coast, India.

    Persis, M; Chandra Sekhar Reddy, A; Rao, L M; Khedkar, G D; Ravinder, K; Nasruddin, K

    2009-09-01

    Mitochondrial DNA, cytochrome oxidase-1 gene sequences were analyzed for species identification and phylogenetic relationship among the very high food value and commercially important Indian carangid fish species. Sequence analysis of COI gene very clearly indicated that all the 28 fish species fell into five distinct groups, which are genetically distant from each other and exhibited identical phylogenetic reservation. All the COI gene sequences from 28 fishes provide sufficient phylogenetic information and evolutionary relationship to distinguish the carangid species unambiguously. This study proves the utility of mtDNA COI gene sequence based approach in identifying fish species at a faster pace.

  4. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  5. PHARMACOGENETIC TESTING OPPORTUNITIES IN CARDIOLOGY BASED ON EXOME SEQUENCING

    N. V. Shcherbakova

    2014-01-01

    Full Text Available Aim. To study what cardiac drugs currently have any comments on biomarkers and what information can be obtained by pharmacogenetic testing using data exome sequencing in patients with cardiac diseases.Material and methods. Exome sequencing in random participant of the ATEROGEN IVANOVO study and bioinformatics analysis of the data were performed. Point mutations were annotated using ANNOVAR program, as well as comparison with a number of specialized databases was done on the basis of user protocols.Results. 11 cardiac drugs and 7 genes which variants can influence cardiac drug metabolism were analyzed. According to exome sequencing of the participant we did not reveal allelic variants that require dose regime correction and careful efficacy control.Conclusion. The exome sequencing application is the next step to a wide range of personalized therapy. Future opportunities for improvement of the risk-benefit ratio in each patient are the main purpose of the collection and analysis of pharmacogenetic data.

  6. Human factors review for Severe Accident Sequence Analysis (SASA)

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitates a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure

  7. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Jason D Thompson

    Full Text Available Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  8. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  9. Nonlinear analysis of river flow time sequences

    Porporato, Amilcare; Ridolfi, Luca

    1997-06-01

    Within the field of chaos theory several methods for the analysis of complex dynamical systems have recently been proposed. In light of these ideas we study the dynamics which control the behavior over time of river flow, investigating the existence of a low-dimension deterministic component. The present article follows the research undertaken in the work of Porporato and Ridolfi [1996a] in which some clues as to the existence of chaos were collected. Particular emphasis is given here to the problem of noise and to nonlinear prediction. With regard to the latter, the benefits obtainable by means of the interpolation of the available time series are reported and the remarkable predictive results attained with this nonlinear method are shown.

  10. An assembly sequence planning method based on composite algorithm

    Enfu LIU

    2016-02-01

    Full Text Available To solve the combination explosion problem and the blind searching problem in assembly sequence planning of complex products, an assembly sequence planning method based on composite algorithm is proposed. In the composite algorithm, a sufficient number of feasible assembly sequences are generated using formalization reasoning algorithm as the initial population of genetic algorithm. Then fuzzy knowledge of assembly is integrated into the planning process of genetic algorithm and ant algorithm to get the accurate solution. At last, an example is conducted to verify the feasibility of composite algorithm.

  11. An optical CDMA system based on chaotic sequences

    Liu, Xiao-lei; En, De; Wang, Li-guo

    2014-03-01

    In this paper, a coherent asynchronous optical code division multiple access (OCDMA) system is proposed, whose encoder/decoder is an all-optical generator. This all-optical generator can generate analog and bipolar chaotic sequences satisfying the logistic maps. The formula of bit error rate (BER) is derived, and the relationship of BER and the number of simultaneous transmissions is analyzed. Due to the good property of correlation, this coherent OCDMA system based on these bipolar chaotic sequences can support a large number of simultaneous users, which shows that these chaotic sequences are suitable for asynchronous OCDMA system.

  12. Automation tools for accelerator control a network based sequencer

    Clout, P.; Geib, M.; Westervelt, R.

    1991-01-01

    In conjunction with a major client, Vista Control Systems has developed a sequencer for control systems which works in conjunction with its realtime, distributed Vsystem database. Vsystem is a network-based data acquisition, monitoring and control system which has been applied successfully to both accelerator projects and projects outside this realm of research. The network-based sequencer allows a user to simply define a thread of execution in any supported computer on the network. The script defining a sequence has a simple syntax designed for non-programmers, with facilities for selectively abbreviating the channel names for easy reference. The semantics of the script contains most of the familiar capabilities of conventional programming languages, including standard stream I/O and the ability to start other processes with parameters passed. The script is compiled to threaded code for execution efficiency. The implementation is described in some detail and examples are given of applications for which the sequencer has been used

  13. High Interlaboratory Reprocucibility of DNA Sequence-based Typing of Bacteria in a Multicenter Study

    Sousa, MA de; Boye, Kit; Lencastre, H de

    2006-01-01

    Current DNA amplification-based typing methods for bacterial pathogens often lack interlaboratory reproducibility. In this international study, DNA sequence-based typing of the Staphylococcus aureus protein A gene (spa, 110 to 422 bp) showed 100% intra- and interlaboratory reproducibility without...... extensive harmonization of protocols for 30 blind-coded S. aureus DNA samples sent to 10 laboratories. Specialized software for automated sequence analysis ensured a common typing nomenclature....

  14. Accident sequence analysis of human-computer interface design

    Fan, C.-F.; Chen, W.-H.

    2000-01-01

    It is important to predict potential accident sequences of human-computer interaction in a safety-critical computing system so that vulnerable points can be disclosed and removed. We address this issue by proposing a Multi-Context human-computer interaction Model along with its analysis techniques, an Augmented Fault Tree Analysis, and a Concurrent Event Tree Analysis. The proposed augmented fault tree can identify the potential weak points in software design that may induce unintended software functions or erroneous human procedures. The concurrent event tree can enumerate possible accident sequences due to these weak points

  15. Food Fish Identification from DNA Extraction through Sequence Analysis

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd and 4th y undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  16. An efficient binomial model-based measure for sequence comparison and its application.

    Liu, Xiaoqing; Dai, Qi; Li, Lihua; He, Zerong

    2011-04-01

    Sequence comparison is one of the major tasks in bioinformatics, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remains. This paper presented a binomial model-based measure to analyze biological sequences. With help of a random indicator, the occurrence of a word at any position of sequence can be regarded as a random Bernoulli variable, and the distribution of a sum of the word occurrence is well known to be a binomial one. By using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on binomial model is more efficient.

  17. Compilation and analysis of Escherichia coli promoter DNA sequences.

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequence of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter ...

  18. Genotyping by Sequencing for SNP-Based Linkage Map Construction and QTL Analysis of Chilling Requirement and Bloom Date in Peach [Prunus persica (L. Batsch].

    Douglas Gary Bielenberg

    Full Text Available Low-cost, high throughput genotyping methods are crucial to marker discovery and marker-assisted breeding efforts, but have not been available for many 'specialty crops' such as fruit and nut trees. Here we apply the Genotyping-By-Sequencing (GBS method developed for cereals to the discovery of single nucleotide polymorphisms (SNPs in a peach F2 mapping population. Peach is a genetic and genomic model within the Rosaceae and will provide a template for the use of this method with other members of this family. Our F2 mapping population of 57 genotypes segregates for bloom time (BD and chilling requirement (CR and we have extensively phenotyped this population. The population derives from a selfed F1 progeny of a cross between 'Hakuho' (high CR and 'UFGold' (low CR. We were able to successfully employ GBS and the TASSEL GBS pipeline without modification of the original methodology using the ApeKI restriction enzyme and multiplexing at an equivalent of 96 samples per Illumina HiSeq 2000 lane. We obtained hundreds of SNP markers which were then used to construct a genetic linkage map and identify quantitative trait loci (QTL for BD and CR.

  19. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Hideyuki Tamaki

    Full Text Available BACKGROUND: 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1, after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  20. Inferences of population structure and demographic history for Taxodium distichum, a coniferous tree in North America, based on amplicon sequencing analysis.

    Ikezaki, Yuka; Suyama, Yoshihisa; Middleton, Beth A; Tsumura, Yoshihiko; Teshima, Kousuke; Tachida, Hidenori; Kusumi, Junko

    2016-11-01

    Studies of natural genetic variation can elucidate the genetic basis of phenotypic variation and the past population structure of species. Our study species, Taxodium distichum, is a unique conifer that inhabits the flood plains and swamps of North America. Morphological and ecological differences in two varieties, T. distichum var. distichum (bald cypress) and T. distichum var. imbricarium (pond cypress), are well known, but little is known about the level of genetic differentiation between the varieties and the demographic history of local populations. We analyzed nucleotide polymorphisms at 47 nuclear loci from 96 individuals collected from the Mississippi River Alluvial Valley (MRAV), and Gulf Coastal populations in Texas, Louisiana, and Florida using high-throughput DNA sequencing. Standard population genetic statistics were calculated, and demographic parameters were estimated using a composite-likelihood approach. Taxodium distichum in North America can be divided into at least three genetic groups, bald cypress in the MRAV and Texas, bald cypress in Florida, and pond cypress in Florida. The levels of genetic differentiation among the groups were low but significant. Several loci showed the signatures of positive selection, which might be responsible for local adaptation or varietal differentiation. Bald cypress was genetically differentiated into two geographical groups, and the boundary was located between the MRAV and Florida. This differentiation could be explained by population expansion from east to west. Despite the overlap of the two varieties' ranges, they were genetically differentiated in Florida. The estimated demographic parameters suggested that pond cypress split from bald cypress during the late Miocene. © 2016 Botanical Society of America.

  1. Inference of population structure and demographic history in Taxodium distichum, a coniferous tree in North America, based on amplicon sequence analysis

    Ikezaki, Yuka; Suyama, Yoshihisa; Middleton, Beth A.; Tsumura, Yoshihiko; Teshima, Kousuke; Tachida, Hidenori; Kusumi, Junko

    2016-01-01

    PREMISE OF THE STUDY: Studies of natural genetic variation can elucidate the genetic basis of phenotypic variation and the past population structure of species. Our study species, Taxodium distichum, is a unique conifer that inhabits the flood plains and swamps of North America. Morphological and ecological differences in two varieties, T. distichum var. distichum (bald cypress) and T. distichum var. imbricarium (pond cypress), are well known, but little is known about the level of genetic differentiation between the varieties and the demographic history of local populations.METHODS: We analyzed nucleotide polymorphisms at 47 nuclear loci from 96 individuals collected from the Mississippi River Alluvial Valley (MRAV), and Gulf Coastal populations in Texas, Louisiana, and Florida using high-throughput DNA sequencing. Standard population genetic statistics were calculated, and demographic parameters were estimated using a composite-likelihood approach.KEY RESULTS: Taxodium distichum in North America can be divided into at least three genetic groups, bald cypress in the MRAV and Texas, bald cypress in Florida, and pond cypress in Florida. The levels of genetic differentiation among the groups were low but significant. Several loci showed the signatures of positive selection, which might be responsible for local adaptation or varietal differentiation.CONCLUSIONS: Bald cypress was genetically differentiated into two geographical groups, and the boundary was located between the MRAV and Florida. This differentiation could be explained by population expansion from east to west. Despite the overlap of the two varieties’ ranges, they were genetically differentiated in Florida. The estimated demographic parameters suggested that pond cypress split from bald cypress during the late Miocene.

  2. Thermodynamics-based models of transcriptional regulation with gene sequence.

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity which is derived based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Comparing with the previous models, the proposed model can reveal more biological sense.

  3. Probabilistic topic modeling for the analysis and classification of genomic sequences

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in a extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734

  4. A molecular phylogenetic and fruit evolutionary analysis of the major groups of the paleotropical woody bamboos (Gramineae: Bambusoideae) based on nuclear ITS, GBSSI gene and plastid trnL-F DNA sequences.

    Yang, Han-Qi; Yang, Jun-Bo; Peng, Zhen-Hua; Gao, Jian; Yang, Yu-Ming; Peng, Sheng; Li, De-Zhu

    2008-09-01

    This study presented the first molecular phylogenetic analysis of the major clades of woody bamboos of the Old World tropics based on nuclear and chloroplast sequences (ITS, GBSSI and trnL-F). Sequence data from 53 species, representing 17 paleotropical woody bamboo genera, were analyzed using the maximum parsimony and Bayesian inference methods. All examined ingroup taxa were clustered into two clades, i.e., the Bambusinae+Dinochloa clade and the Melocanninae clade. The former clade included Bambusa, Bonia, Dendrocalamus, Dendrocalamopsis, Dinochloa, Gigantochloa, Molecalamus, Neomicrocalamus, Neosinocalamus, Oxytenanthera s. str. (sensu stricto), Racemobambos and Thyrsostachys. The Melocanninae clade consisted of Cephalostachyum, Leptocanna (better treated as part of Cephalostachyum), Melocanna, Pseudostachyum and Schizostachyum s. str. The subtribe Racemobambosinae and tribes Dendrocalameae and Oxytenanthereae were not supported and may be better placed in subtribe Bambusinae. The ovary characters seemed to be good criteria to distinguish these two clades. The reconstruction of ancestral fruit characters indicated that the bacoid caryopsis, namely, fleshy or berry-like fruits, was found to be scattered in three lineages of the examined paleotropical woody bamboos. Fruit characters are thus not reliable indicators of phylogeny and bacoid caryopsis likely represents a specialization for particular ecological conditions.

  5. MultiLocus Sequence Analysis- and Amplified Fragment Length Polymorphism-based characterization of xanthomonads associated with bacterial spot of tomato and pepper and their relatedness to Xanthomonas species.

    Hamza, A A; Robene-Soustrade, I; Jouen, E; Lefeuvre, P; Chiroleu, F; Fisher-Le Saux, M; Gagnevin, L; Pruvost, O

    2012-05-01

    MultiLocus Sequence Analysis (MLSA) and Amplified Fragment Length Polymorphism (AFLP) were used to measure the genetic relatedness of a comprehensive collection of xanthomonads pathogenic to solaneous hosts to Xanthomonas species. The MLSA scheme was based on partial sequences of four housekeeping genes (atpD, dnaK, efp and gyrB). Globally, MLSA data unambiguously identified strains causing bacterial spot of tomato and pepper at the species level and was consistent with AFLP data. Genetic distances derived from both techniques showed a close relatedness of (i) X. euvesicatoria, X. perforans and X. alfalfae and (ii) X. gardneri and X. cynarae. Maximum likelihood tree topologies derived from each gene portion and the concatenated data set for species in the X. campestris 16S rRNA core (i.e. the species cluster comprising all strains causing bacterial spot of tomato and pepper) were not congruent, consistent with the detection of several putative recombination events in our data sets by several recombination search algorithms. One recombinant region in atpD was identified in most strains of X. euvesicatoria including the type strain. Copyright © 2012 Elsevier GmbH. All rights reserved.

  6. Phylogenetic relationships of Salmonella based on rRNA sequences

    Christensen, H.; Nordentoft, Steen; Olsen, J.E.

    1998-01-01

    separated by 16S rRNA analysis and found to be closely related to the Escherichia coli and Shigella complex by both 16S and 23S rRNA analyses. The diphasic serotypes S. enterica subspp. I and VI were separated from the monophasic serotypes subspp. IIIa and IV, including S. bongori, by 23S rRNA sequence...

  7. Isolation and sequence analysis of a cDNA clone encoding the fifth complement component

    Lundwall, Åke B; Wetsel, Rick A; Kristensen, Torsten

    1985-01-01

    DNA clone of 1.85 kilobase pairs was isolated. Hybridization of the mixed-sequence probe to the complementary strand of the plasmid insert and sequence analysis by the dideoxy method predicted the expected protein sequence of C5a (positions 1-12), amino-terminal to the anticipated priming site. The sequence......, subcloned into M13 mp8, and sequenced at random by the dideoxy technique, thereby generating a contiguous sequence of 1703 base pairs. This clone contained coding sequence for the C-terminal 262 amino acid residues of the beta-chain, the entire C5a fragment, and the N-terminal 98 residues of the alpha......'-chain. The 3' end of the clone had a polyadenylated tail preceded by a polyadenylation recognition site, a 3'-untranslated region, and base pairs homologous to the human Alu concensus sequence. Comparison of the derived partial human C5 protein sequence with that previously determined for murine C3 and human...

  8. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-08-01

    RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.

  9. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-01-01

    Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465

  10. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

    Hironobu Yanagisawa

    2016-03-01

    Full Text Available The presence of high molecular weight double-stranded RNA (dsRNA within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV, a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses.

  11. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  12. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    2014-01-01

    Background Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods To provide a more comprehensive and complete transcriptome of An. sinensis, eggs, larvae, pupae, male adults and female adults RNA were pooled together for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed in their genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified certain numbers of An. sinensis unigenes that showed homology or being putative 1:1 orthologues with genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic, genomic studies, and further vector control of An. sinensis. PMID:25000941

  13. Sequence analysis corresponding to the PPE and PE proteins in ...

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. | Vol. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid- ...... Vega Lopez F, Brooks L A, Dockrell H M, De Smet K A,. Thompson ...

  14. Molecular cloning, expression analysis and sequence prediction of ...

    CCAAT/enhancer-binding protein beta as an essential transcriptional factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  15. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  16. Sequence symmetry analysis in pharmacovigilance and pharmacoepidemiologic studies

    Lai, Edward Chia Cheng; Pratt, Nicole; Hsieh, Cheng Yang

    2017-01-01

    Sequence symmetry analysis (SSA) is a method for detecting adverse drug events by utilizing computerized claims data. The method has been increasingly used to investigate safety concerns of medications and as a pharmacovigilance tool to identify unsuspected side effects. Validation studies have i...

  17. DNAApp: a mobile application for sequencing data analysis.

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing file on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.

  18. DNAApp: a mobile application for sequencing data analysis

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing file on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882

  19. Long-read sequencing data analysis for yeasts.

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  20. Protein Function Prediction Based on Sequence and Structure Information

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  1. MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

    Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund

    2017-01-01

    number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next generation sequence data and perform reference based sequence assignment, followed by a post...... pipeline is freely available as a bitbucked package (https://bitbucket.org/genomicepidemiology/mgmapper). A web-version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets....

  2. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Roberts Richard J

    2008-05-01

    Full Text Available Abstract Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360, cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.

  3. Phylogenetic Analysis of a ‘Jewel Orchid’ Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions

    Hu, Chao; Tian, Huaizhen; Li, Hongqing; Hu, Aiqun; Xing, Fuwu; Bhattacharjee, Avishek; Hsu, Tianchuan; Kumar, Pankaj; Chung, Shihwen

    2016-01-01

    A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal the intrageneric relationships of Goodyera and its intergeneric relationships to related genera. The results indicate that: 1) Goodyera is not monophyletic; 2) Goodyera could be divided into four sections, viz., Goodyera, Otosepalum, Reticulum and a new section; 3) sect. Reticulum can be further divided into two subsections, viz., Reticulum and Foliosum, whereas sect. Goodyera can in turn be divided into subsections Goodyera and a new subsection. PMID:26927946

  4. (Brassicaceae) based on nuclear ribosomal ITS DNA sequences

    Home; Journals; Journal of Genetics; Volume 93; Issue 2. Phylogeny and biogeography of Alyssum (Brassicaceae) based on nuclear ribosomal ITS DNA sequences. Yan Li Yan Kong Zhe Zhang Yanqiang Yin Bin Liu Guanghui Lv Xiyong Wang. Research Article Volume 93 Issue 2 August 2014 pp 313-323 ...

  5. Targeted DNA Methylation Analysis by High Throughput Sequencing in Porcine Peri-attachment Embryos

    MORRILL, Benson H.; COX, Lindsay; WARD, Anika; HEYWOOD, Sierra; PRATHER, Randall S.; ISOM, S. Clay

    2013-01-01

    Abstract The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx plat...

  6. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S.; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M.; Tettelin, Herv?; White, Owen; Angiuoli, Samuel V.; Mahurkar, Anup; Fricke, W. Florian

    2017-01-01

    Background The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. Results CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. ...

  7. Sequence-based analysis of the Vitis vinifera L. cv Cabernet Sauvignon grape must mycobiome in three South African vineyards employing distinct agronomic systems

    MATHABATHA EVODIA SETATI

    2015-11-01

    Full Text Available Recent microbiomic research of agricultural habitats has highlighted tremendous microbial biodiversity associated with such ecosystems. Data generated in vineyards have furthermore highlighted significant regional differences in vineyard biodiversity, hinting at the possibility that such differences might be responsible for regional differences in wine style and character, a hypothesis referred to as microbial terroir. The current study further contributes to this body of work by comparing the mycobiome associated with South African (SA Cabernet Sauvignon grapes in three neighboring vineyards that employ different agronomic approaches, and comparing the outcome with similar data sets from Californian vineyards. The aim of this study was to fully characterize the mycobiomes associated with the grapes from these vineyards. The data revealed approximately 10 times more fungal diversity than what is typically retrieved from culture-based studies. The Biodynamic vineyard was found to harbor a more diverse fungal community (H = 2.6 than the conventional (H = 2.1 and integrated (H = 1.8 vineyards. The data show that ascomycota are the most abundant phylum in the three vineyards, with Aureobasidium pullulans and its close relative Kabatiella microsticta being the most dominant fungi. This is the first report to reveal a high incidence of K. microsticta in the grape/wine ecosystem. Different common wine yeast species, such as Metschnikowia pulcherrima and Starmerella bacillaris dominated the mycobiome in the three vineyards. The data show that the filamentous fungi are the most abundant community in grape must although they are not regarded as relevant during wine fermentation. Comparison of metagenomic datasets from the three SA vineyards and previously published data from Californian vineyards revealed only 25% of the fungi in the SA dataset was also present in the Californian dataset, with greater variation evident amongst ubiquitous epiphytic fungi.

  8. Sequence-based Analysis of the Vitis vinifera L. cv Cabernet Sauvignon Grape Must Mycobiome in Three South African Vineyards Employing Distinct Agronomic Systems.

    Setati, Mathabatha E; Jacobson, Daniel; Bauer, Florian F

    2015-01-01

    Recent microbiomic research of agricultural habitats has highlighted tremendous microbial biodiversity associated with such ecosystems. Data generated in vineyards have furthermore highlighted significant regional differences in vineyard biodiversity, hinting at the possibility that such differences might be responsible for regional differences in wine style and character, a hypothesis referred to as "microbial terroir." The current study further contributes to this body of work by comparing the mycobiome associated with South African (SA) Cabernet Sauvignon grapes in three neighboring vineyards that employ different agronomic approaches, and comparing the outcome with similar data sets from Californian vineyards. The aim of this study was to fully characterize the mycobiomes associated with the grapes from these vineyards. The data revealed approximately 10 times more fungal diversity than what is typically retrieved from culture-based studies. The Biodynamic vineyard was found to harbor a more diverse fungal community (H = 2.6) than the conventional (H = 2.1) and integrated (H = 1.8) vineyards. The data show that ascomycota are the most abundant phylum in the three vineyards, with Aureobasidium pullulans and its close relative Kabatiella microsticta being the most dominant fungi. This is the first report to reveal a high incidence of K. microsticta in the grape/wine ecosystem. Different common wine yeast species, such as Metschnikowia pulcherrima and Starmerella bacillaris dominated the mycobiome in the three vineyards. The data show that the filamentous fungi are the most abundant community in grape must although they are not regarded as relevant during wine fermentation. Comparison of metagenomic datasets from the three SA vineyards and previously published data from Californian vineyards revealed only 25% of the fungi in the SA dataset was also present in the Californian dataset, with greater variation evident amongst ubiquitous epiphytic fungi.

  9. Analyzing Plasmodium falciparum erythrocyte membrane protein 1 gene expression by a next generation sequencing based method

    Jespersen, Jakob S.; Petersen, Bent; Seguin-Orlando, Andaine

    2013-01-01

    at identifying PfEMP1 features associated with high virulence. Here we present the first effective method for sequence analysis of var genes expressed in field samples: a sequential PCR and next generation sequencing based technique applied on expressed var sequence tags and subsequently on long range PCR......, encoded by ~60 highly variable 'var' genes per haploid genome. PfEMP1 is exported to the surface of infected erythrocytes and is thought to be fundamental to immune evasion by adhesion to host and parasite factors. The highly variable nature has constituted a roadblock in var expression studies aimed...

  10. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Comparing to the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package and is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  11. Molecular cloning, sequence analysis and structure prediction of the ...

    AJL

    2012-04-19

    Apr 19, 2012 ... The primers were based on the rBAT sequences of other animals deposited in GenBank. .... fragment; M1, 2000 bp DNA ladder; M2, 1000 bp DNA ladder. spliced to obtain the ..... A traffic signal for heterodimeric amino acid.

  12. Quality Control of the Traditional Patent Medicine Yimu Wan Based on SMRT Sequencing and DNA Barcoding

    Jia, Jing; Xu, Zhichao; Xin, Tianyi; Shi, Linchun; Song, Jingyuan

    2017-01-01

    Substandard traditional patent medicines may lead to global safety-related issues. Protecting consumers from the health risks associated with the integrity and authenticity of herbal preparations is of great concern. Of particular concern is quality control for traditional patent medicines. Here, we establish an effective approach for verifying the biological composition of traditional patent medicines based on single-molecule real-time (SMRT) sequencing and DNA barcoding. Yimu Wan (YMW), a classical herbal prescription recorded in the Chinese Pharmacopoeia, was chosen to test the method. Two reference YMW samples were used to establish a standard method for analysis, which was then applied to three different batches of commercial YMW samples. A total of 3703 and 4810 circular-consensus sequencing (CCS) reads from two reference and three commercial YMW samples were mapped to the ITS2 and psbA-trnH regions, respectively. Moreover, comparison of intraspecific genetic distances based on SMRT sequencing data with reference data from Sanger sequencing revealed an ITS2 and psbA-trnH intergenic spacer that exhibited high intraspecific divergence, with the sites of variation showing significant differences within species. Using the CCS strategy for SMRT sequencing analysis was adequate to guarantee the accuracy of identification. This study demonstrates the application of SMRT sequencing to detect the biological ingredients of herbal preparations. SMRT sequencing provides an affordable way to monitor the legality and safety of traditional patent medicines. PMID:28620408

  13. Analysis of Sequence Diagram Layout in Advanced UML Modelling Tools

    Ņikiforova Oksana

    2016-05-01

    Full Text Available System modelling using Unified Modelling Language (UML is the task that should be solved for software development. The more complex software becomes the higher requirements are stated to demonstrate the system to be developed, especially in its dynamic aspect, which in UML is offered by a sequence diagram. To solve this task, the main attention is devoted to the graphical presentation of the system, where diagram layout plays the central role in information perception. The UML sequence diagram due to its specific structure is selected for a deeper analysis on the elements’ layout. The authors research represents the abilities of modern UML modelling tools to offer automatic layout of the UML sequence diagram and analyse them according to criteria required for the diagram perception.

  14. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

    Morgan Kullberg

    Full Text Available BACKGROUND: We investigate the usefulness of expressed sequence tags, ESTs, for establishing divergences within the tree of placental mammals. This is done on the example of the established relationships among primates (human, lagomorphs (rabbit, rodents (rat and mouse, artiodactyls (cow, carnivorans (dog and proboscideans (elephant. METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 mega bases from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable costs, a fast extension of data sampling from species outside the genome projects.

  15. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759

  16. antaRNA: ant colony-based RNA sequence design.

    Kleinkauf, Robert; Mann, Martin; Backofen, Rolf

    2015-10-01

    RNA sequence design is studied at least as long as the classical folding problem. Although for the latter the functional fold of an RNA molecule is to be found ,: inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology ,: reliable RNA sequence design becomes a crucial step to generate novel biochemical components. In this article ,: the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that comply in addition with an adjustable full range objective GC-content distribution ,: specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics and its superior performance is shown on a biological datasets. http://www.bioinf.uni-freiburg.de/Software/antaRNA CONTACT: backofen@informatik.uni-freiburg.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  17. Sequence determination and analysis of the NSs genes of two tospoviruses.

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequence of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequence of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.

  18. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  19. QTL analysis by sequencing of Water Use Efficiency (WUE) in potato

    Kaminski, Kacper Piotr; Sønderkær, Mads; Sørensen, Kirsten Kørup

    2013-01-01

    The traditional approach to potato breeding, the classical “mate and phenotype” approach is relatively costly and because phenotyping and growth capacity is limited, this are being slowly replaced by Marker Assisted Selection (MAS) breeding schemes. MAS is based on the presence of DNA polymorphic.......sparsipilum), phenotyped for water use efficiency. This population has also previously been phenotyped for the total glycoalkaloid (TGA) content....... and time consuming process. Here, a novel method for Quantitative Trait Locus (QTL) analysis has been developed, that allows for development of specific markers by use of genomic sequence reads and the recently published reference genome sequence for potato. Prior to sequencing the mapping population...

  20. First fungal genome sequence from Africa: A preliminary analysis

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ’tsunami‘ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  1. Automatic analysis of the 2015 Gorkha earthquake aftershock sequence.

    Baillard, C.; Lyon-Caen, H.; Bollinger, L.; Rietbrock, A.; Letort, J.; Adhikari, L. B.

    2016-12-01

    The Mw 7.8 Gorkha earthquake, that partially ruptured the Main Himalayan Thrust North of Kathmandu on the 25th April 2015, was the largest and most catastrophic earthquake striking Nepal since the great M8.4 1934 earthquake. This mainshock was followed by multiple aftershocks, among them, two notable events that occurred on the 12th May with magnitudes of 7.3 Mw and 6.3 Mw. Due to these recent events it became essential for the authorities and for the scientific community to better evaluate the seismic risk in the region through a detailed analysis of the earthquake catalog, amongst others, the spatio-temporal distribution of the Gorkha aftershock sequence. Here we complement this first study by doing a microseismic study using seismic data coming from the eastern part of the Nepalese Seismological Center network associated to one broadband station in Everest. Our primary goal is to deliver an accurate catalog of the aftershock sequence. Due to the exceptional number of events detected we performed an automatic picking/locating procedure which can be splitted in 4 steps: 1) Coarse picking of the onsets using a classical STA/LTA picker, 2) phase association of picked onsets to detect and declare seismic events, 3) Kurtosis pick refinement around theoretical arrival times to increase picking and location accuracy and, 4) local magnitude calculation based amplitude of waveforms. This procedure is time efficient ( 1 sec/event), reduces considerably the location uncertainties ( 2 to 5 km errors) and increases the number of events detected compared to manual processing. Indeed, the automatic detection rate is 10 times higher than the manual detection rate. By comparing to the USGS catalog we were able to give a new attenuation law to compute local magnitudes in the region. A detailed analysis of the seismicity shows a clear migration toward the east of the region and a sudden decrease of seismicity 100 km east of Kathmandu which may reveal the presence of a tectonic

  2. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum transcriptome.

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata, as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest for the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode. Around 1.4 million of reads were produced and assembled into 70,530 contigs (average length of 490 bp. Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  3. galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

    Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

    2004-06-12

    The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se

  4. Now And Next Generation Sequencing Techniques: Future of Sequence Analysis using Cloud Computing

    Radhe Shyam Thakur

    2012-12-01

    Full Text Available Advancements in the field of sequencing techniques resulted in the huge sequenced data to be produced at a very faster rate. It is going cumbersome for the datacenter to maintain the databases. Data mining and sequence analysis approaches needs to analyze the databases several times to reach any efficient conclusion. To cope with such overburden on computer resources and to reach efficient and effective conclusions quickly, the virtualization of the resources and computation on pay as you go concept was introduced and termed as cloud computing. The datacenter’s hardware and software is collectively known as cloud which when available publicly is termed as public cloud. The datacenter’s resources are provided in a virtual mode to the clients via a service provider like Amazon, Google and Joyent which charges on pay as you go manner. The workload is shifted to the provider which is maintained by the required hardware and software upgradation. The service provider manages it by upgrading the requirements in the virtual mode. Basically a virtual environment is created according to the need of the user by taking permission from datacenter via internet, the task is performed and the environment is deleted after the task is over. In this discussion, we are focusing on the basics of cloud computing, the prerequisites and overall working of clouds. Furthermore, briefly the applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection with reference to traditional workflow are discussed.

  5. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  6. SEQUENCING AND SEQUENCE ANALYSIS OF MYOSTATIN GENE IN THE EXON 1 OF THE CAMEL (CAMELUS DROMEDARIUS

    M. G. SHAH, A. S. QURESHI1, M. REISSMANN2 AND H. J. SCHWARTZ3

    2006-10-01

    Full Text Available Myostatin, also called growth differentiation factor-8 (GDF-8, is a member of the mammalian growth transforming family (TGF-beta superfamily, which is expressed specifically in developing an adult skeletal muscle. Muscular hypertrophy allele (mh allele in the double muscle breeds involved mutation within the myostatin gene. Genomic DNA was isolated from the camel hair using NucleoSpin Tissue kit. Two animals of each of the six breeds namely, Marecha, Dhatti, Larri, Kohi, Sakrai and Cambelpuri were used for sequencing. For PCR amplification of the gene, a primer pair was designed from homolog regions of already published sequences of farm animals from GenBank. Results showed that camel myostatin possessed more than 90% homology with that of cattle, sheep and pig. Camel formed separate cluster from the pig in spite of having high homology (98% and showed 94% homology with cattle and sheep as reported in literature. Sequence analysis of the PCR amplified part of exon 1 (256 bp of the camel myostatin was identical among six camel breeds.

  7. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley.

    Martin Mascher

    Full Text Available The rapid development of next-generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS, a low-cost, reduced representation sequencing method, is becoming a common approach for whole-genome marker profiling in many species. With quickly developing sequencing technologies, adapting current GBS methodologies to new platforms will leverage these advancements for future studies. To test new semiconductor sequencing platforms for GBS, we genotyped a barley recombinant inbred line (RIL population. Based on a previous GBS approach, we designed bar code and adapter sets for the Ion Torrent platforms. Four sets of 24-plex libraries were constructed consisting of 94 RILs and the two parents and sequenced on two Ion platforms. In parallel, a 96-plex library of the same RILs was sequenced on the Illumina HiSeq 2000. We applied two different computational pipelines to analyze sequencing data; the reference-independent TASSEL pipeline and a reference-based pipeline using SAMtools. Sequence contigs positioned on the integrated physical and genetic map were used for read mapping and variant calling. We found high agreement in genotype calls between the different platforms and high concordance between genetic and reference-based marker order. There was, however, paucity in the number of SNP that were jointly discovered by the different pipelines indicating a strong effect of alignment and filtering parameters on SNP discovery. We show the utility of the current barley genome assembly as a framework for developing very low-cost genetic maps, facilitating high resolution genetic mapping and negating the need for developing de novo genetic maps for future studies in barley. Through demonstration of GBS on semiconductor sequencing platforms, we conclude that the GBS approach is amenable to a range of platforms and can easily be modified as new

  8. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Mo Sisu

    2013-02-01

    Full Text Available Abstract Background The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA of 20 of the most highly studied reference strains and clinical isolates of T. denticola; which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results The sequences of the 16S ribosomal RNA gene, and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC were successfully determined for each strain. Sequence data was analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA to 8.9% (dnaN. Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic, and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established; but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic

  9. A sequence-dependent rigid-base model of DNA

    Gonzalez, O.; Petkevičiutė, D.; Maddocks, J. H.

    2013-02-01

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can

  10. A sequence-dependent rigid-base model of DNA.

    Gonzalez, O; Petkevičiūtė, D; Maddocks, J H

    2013-02-07

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can

  11. Crash sequence based risk matrix for motorcycle crashes.

    Wu, Kun-Feng; Sasidharan, Lekshmi; Thor, Craig P; Chen, Sheng-Yin

    2018-04-05

    Considerable research has been conducted related to motorcycle and other powered-two-wheeler (PTW) crashes; however, it always has been controversial among practitioners concerning with types of crashes should be first targeted and how to prioritize resources for the implementation of mitigating actions. Therefore, there is a need to identify types of motorcycle crashes that constitute the greatest safety risk to riders - most frequent and most severe crashes. This pilot study seeks exhibit the efficacy of a new approach for prioritizing PTW crash causation sequences as they relate to injury severity to better inform the application of mitigating countermeasures. To accomplish this, the present study constructed a crash sequence-based risk matrix to identify most frequent and most severe motorcycle crashes in an attempt to better connect causes and countermeasures of PTW crashes. Although the frequency of each crash sequence can be computed from crash data, a crash severity model is needed to compare the levels of crash severity among different crash sequences, while controlling for other factors that also have effects on crash severity such drivers' age, use of helmet, etc. The construction of risk matrix based on crash sequences involve two tasks: formulation of crash sequence and the estimation of a mixed-effects (ME) model to adjust the levels of severities for each crash sequence to account for other crash contributing factors that would have an effect on the maximum level of crash severity in a crash. Three data elements from the National Automotive Sampling System - General Estimating System (NASS-GES) data were utilized to form a crash sequence: critical event, crash types, and sequence of events. A mixed-effects model was constructed to model the severity levels for each crash sequence while accounting for the effects of those crash contributing factors on crash severity. A total of 8039 crashes involving 8208 motorcycles occurred during 2011 and 2013 were

  12. Phylogeny of the Serrasalmidae (Characiformes based on mitochondrial DNA sequences

    Guillermo Ortí

    2008-01-01

    Full Text Available Previous studies based on DNA sequences of mitochondrial (mt rRNA genes showed three main groups within the subfamily Serrasalminae: (1 a "pacu" clade of herbivores (Colossoma, Mylossoma, Piaractus; (2 the "Myleus" clade (Myleus, Mylesinus, Tometes, Ossubtus; and (3 the "piranha" clade (Serrasalmus, Pygocentrus, Pygopristis, Pristobrycon, Catoprion, Metynnis. The genus Acnodon was placed as the sister taxon of clade (2+3. However, poor resolution within each clade was obtained due to low levels of variation among rRNA gene sequences. Complete sequences of the hypervariable mtDNA control region for a total of 45 taxa, and additional sequences of 12S and 16S rRNA from a total of 74 taxa representing all genera in the family are now presented to address intragroup relationships. Control region sequences of several serrasalmid species exhibit tandem repeats of short motifs (12 to 33 bp in the 3' end of this region, accounting for substantial length variation. Bayesian inference and maximum parsimony analyses of these sequences identify the same groupings as before and provide further evidence to support the following observations: (a Serrasalmus gouldingi and species of Pristobrycon (non-striolatus form a monophyletic group that is the sister group to other species of Serrasalmus and Pygocentrus; (b Catoprion, Pygopristis, and Pristobrycon striolatus form a well supported clade, sister to the group described above; (c some taxa assigned to the genus Myloplus (M. asterias, M tiete, M ternetzi, and M rubripinnis form a well supported group whereas other Myloplus species remain with uncertain affinities (d Mylesinus, Tometes and Myleus setiger form a monophyletic group.

  13. Regularized rare variant enrichment analysis for case-control exome sequencing data.

    Larson, Nicholas B; Schaid, Daniel J

    2014-02-01

    Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.

  14. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library.

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analysis of complete gene sequences of PLE family by screening from a Rongchang pig BAC library and third-generation PacBio gene sequencing. After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome used as a template for Polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found that, not only did the PLE exon sequences of the four genes show a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720bps reverse complement sequences that may have an important function in the regulation of PLE gene expression. This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes provides the necessary foundation for

  15. Speeding disease gene discovery by sequence based candidate prioritization

    Porteous David J

    2005-03-01

    Full Text Available Abstract Background Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. Results We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. Conclusion PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.

  16. Disk-based compression of data from genome sequencing.

    Grabowski, Szymon; Deorowicz, Sebastian; Roguski, Łukasz

    2015-05-01

    High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk based, where the better of these two, from Cox et al. (2012), is based on the Burrows-Wheeler transform (BWT) and achieves 0.518 bits per base for a 134.0 Gbp human genome sequencing collection with almost 45-fold coverage. We propose overlapping reads compression with minimizers, a compression algorithm dedicated to sequencing reads (DNA only). Our method makes use of a conceptually simple and easily parallelizable idea of minimizers, to obtain 0.317 bits per base as the compression ratio, allowing to fit the 134.0 Gbp dataset into only 5.31 GB of space. http://sun.aei.polsl.pl/orcom under a free license. sebastian.deorowicz@polsl.pl Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulates) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. PMS2 gene mutational analysis: direct cDNA sequencing to circumvent pseudogene interference.

    Wimmer, Katharina; Wernstedt, Annekatrin

    2014-01-01

    The presence of highly homologous pseudocopies can compromise the mutation analysis of a gene of interest. In particular, when using PCR-based strategies, pseudogene co-amplification has to be effectively prevented. This is often achieved by using primers designed to be parental gene specific according to the reference sequence and by applying stringent PCR conditions. However, there are cases in which this approach is of limited utility. For example, it has been shown that the PMS2 gene exchanges sequences with one of its pseudogenes, named PMS2CL. This results in functional PMS2 alleles containing pseudogene-derived sequences at their 3'-end and in nonfunctional PMS2CL pseudogene alleles that contain gene-derived sequences. Hence, the paralogues cannot be distinguished according to the reference sequence. This shortcoming can be effectively circumvented by using direct cDNA sequencing. This approach is based on the selective amplification of PMS2 transcripts in two overlapping 1.6-kb RT-PCR products. In addition to avoiding pseudogene co-amplification and allele dropout, this method has also the advantage that it allows to effectively identify deletions, splice mutations, and de novo retrotransposon insertions that escape the detection of most DNA-based mutation analysis protocols.

  19. Comparison of sequence-based and structure-based phylogenetic ...

    Prakash

    phylogenetic tree construction methods, has been considered as an equivalent of .... Further detailed analysis described is restricted to the first two groups only. ..... Aspartate-ammonia ligase. Plant virus ..... enzymatic activities?; Trends ...

  20. PseudoMLSA: a database for multigenic sequence analysis of Pseudomonas species

    Lalucat Jorge

    2010-04-01

    Full Text Available Abstract Background The genus Pseudomonas comprises more than 100 species of environmental, clinical, agricultural, and biotechnological interest. Although, the recommended method for discriminating bacterial species is DNA-DNA hybridisation, alternative techniques based on multigenic sequence analysis are becoming a common practice in bacterial species discrimination studies. Since there is not a general criterion for determining which genes are more useful for species resolution; the number of strains and genes analysed is increasing continuously. As a result, sequences of different genes are dispersed throughout several databases. This sequence information needs to be collected in a common database, in order to be useful for future identification-based projects. Description The PseudoMLSA Database is a comprehensive database of multiple gene sequences from strains of Pseudomonas species. The core of the database is composed of selected gene sequences from all Pseudomonas type strains validly assigned to the genus through 2008. The database is aimed to be useful for MultiLocus Sequence Analysis (MLSA procedures, for the identification and characterisation of any Pseudomonas bacterial isolate. The sequences are available for download via a direct connection to the National Center for Biotechnology Information (NCBI. Additionally, the database includes an online BLAST interface for flexible nucleotide queries and similarity searches with the user's datasets, and provides a user-friendly output for easily parsing, navigating, and analysing BLAST results. Conclusions The PseudoMLSA database amasses strains and sequence information of validly described Pseudomonas species, and allows free querying of the database via a user-friendly, web-based interface available at http://www.uib.es/microbiologiaBD/Welcome.html. The web-based platform enables easy retrieval at strain or gene sequence information level; including references to published peer

  1. Spike-Based Bayesian-Hebbian Learning of Temporal Sequences.

    Philip J Tully

    2016-05-01

    Full Text Available Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN learning rule. We find that the formation of distributed memories, embodied by increased periods of firing in pools of excitatory neurons, together with asymmetrical associations between these distinct network states, can be acquired through plasticity. The model's feasibility is demonstrated using simulations of adaptive exponential integrate-and-fire model neurons (AdEx. We show that the learning and speed of sequence replay depends on a confluence of biophysically relevant parameters including stimulus duration, level of background noise, ratio of synaptic currents, and strengths of short-term depression and adaptation. Moreover, sequence elements are shown to flexibly participate multiple times in the sequence, suggesting that spiking attractor networks of this type can support an efficient combinatorial code. The model provides a principled approach towards understanding how multiple interacting plasticity mechanisms can coordinate hetero-associative learning in unison.

  2. Digital chaotic sequence generator based on coupled chaotic systems

    Shu-Bo, Liu; Jing, Sun; Jin-Shuo, Liu; Zheng-Quan, Xu

    2009-01-01

    Chaotic systems perform well as a new rich source of cryptography and pseudo-random coding. Unfortunately their digital dynamical properties would degrade due to the finite computing precision. Proposed in this paper is a modified digital chaotic sequence generator based on chaotic logistic systems with a coupling structure where one chaotic subsystem generates perturbation signals to disturb the control parameter of the other one. The numerical simulations show that the length of chaotic orbits, the output distribution of chaotic system, and the security of chaotic sequences have been greatly improved. Moreover the chaotic sequence period can be extended at least by one order of magnitude longer than that of the uncoupled logistic system and the difficulty in decrypting increases 2 128 *2 128 times indicating that the dynamical degradation of digital chaos is effectively improved. A field programmable gate array (FPGA) implementation of an algorithm is given and the corresponding experiment shows that the output speed of the generated chaotic sequences can reach 571.4 Mbps indicating that the designed generator can be applied to the real-time video image encryption. (general)

  3. Genetic mutation analysis of human gastric adenocarcinomas using ion torrent sequencing platform.

    Zhi Xu

    Full Text Available Gastric cancer is the one of the major causes of cancer-related death, especially in Asia. Gastric adenocarcinoma, the most common type of gastric cancer, is heterogeneous and its incidence and cause varies widely with geographical regions, gender, ethnicity, and diet. Since unique mutations have been observed in individual human cancer samples, identification and characterization of the molecular alterations underlying individual gastric adenocarcinomas is a critical step for developing more effective, personalized therapies. Until recently, identifying genetic mutations on an individual basis by DNA sequencing remained a daunting task. Recent advances in new next-generation DNA sequencing technologies, such as the semiconductor-based Ion Torrent sequencing platform, makes DNA sequencing cheaper, faster, and more reliable. In this study, we aim to identify genetic mutations in the genes which are targeted by drugs in clinical use or are under development in individual human gastric adenocarcinoma samples using Ion Torrent sequencing. We sequenced 737 loci from 45 cancer-related genes in 238 human gastric adenocarcinoma samples using the Ion Torrent Ampliseq Cancer Panel. The sequencing analysis revealed a high occurrence of mutations along the TP53 locus (9.7% in our sample set. Thus, this study indicates the utility of a cost and time efficient tool such as Ion Torrent sequencing to screen cancer mutations for the development of personalized cancer therapy.

  4. Sequence-based classification and identification of Fungi.

    Hibbett, David; Abarenkov, Kessy; Kõljalg, Urmas; Öpik, Maarja; Chai, Benli; Cole, James; Wang, Qiong; Crous, Pedro; Robert, Vincent; Helgason, Thorunn; Herr, Joshua R; Kirk, Paul; Lueschow, Shiloh; O'Donnell, Kerry; Nilsson, R Henrik; Oono, Ryoko; Schoch, Conrad; Smyth, Christopher; Walker, Donald M; Porras-Alfaro, Andrea; Taylor, John W; Geiser, David M

    Fungal taxonomy and ecology have been revolutionized by the application of molecular methods and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable validPUBLICation of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.

  5. Teaching Research Methodology Using a Project-Based Three Course Sequence Critical Reflections on Practice

    Braguglia, Kay H.; Jackson, Kanata A.

    2012-01-01

    This article presents a reflective analysis of teaching research methodology through a three course sequence using a project-based approach. The authors reflect critically on their experiences in teaching research methods courses in an undergraduate business management program. The introduction of a range of specific techniques including student…

  6. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    Arias Covadonga

    2007-06-01

    Full Text Available Abstract Background The ciliate protozoan Ichthyophthirius multifiliis (Ich is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate. Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of Plasmodium falciparum genome, a phylogenetically related protozoan. BLASTX searches produced 2,518 significant (E-value -5 hits and further Gene Ontology (GO analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289. Gene discovery and annotations are presented and discussed. Conclusion This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence.

  7. Environmental impact analysis for the main accidental sequences of ignitor

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-01-01

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalent (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low enviromental impact of the machine. 13 refs., 1 fig., 3 tabs

  8. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Masking as an effective quality control method for next-generation sequencing data analysis.

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases that results in a shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, both of the preprocessing methods did not affect the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate although trimming is more commonly used currently in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).

  10. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  11. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  12. Using Behavior Sequence Analysis to Map Serial Killers' Life Histories.

    Keatley, David A; Golightly, Hayley; Shephard, Rebecca; Yaksic, Enzo; Reid, Sasha

    2018-03-01

    The aim of the current research was to provide a novel method for mapping the developmental sequences of serial killers' life histories. An in-depth biographical account of serial killers' lives, from birth through to conviction, was gained and analyzed using Behavior Sequence Analysis. The analyses highlight similarities in behavioral events across the serial killers' lives, indicating not only which risk factors occur, but the temporal order of these factors. Results focused on early childhood environment, indicating the role of parental abuse; behaviors and events surrounding criminal histories of serial killers, showing that many had previous convictions and were known to police for other crimes; behaviors surrounding their murders, highlighting differences in victim choice and modus operandi; and, finally, trial pleas and convictions. The present research, therefore, provides a novel approach to synthesizing large volumes of data on criminals and presenting results in accessible, understandable outcomes.

  13. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.

    2005-01-01

    sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human...

  14. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  15. Novel DNA sequence detection method based on fluorescence energy transfer

    Kobayashi, S.; Tamiya, E.; Karube, I.

    1987-01-01

    Recently the detection of specific DNA sequence, DNA analysis, has been becoming more important for diagnosis of viral genomes causing infections disease and human sequences related to inherited disorders. These methods typically involve electrophoresis, the immobilization of DNA on a solid support, hybridization to a complementary probe, the detection using labeled with /sup 32/P or nonisotopically with a biotin-avidin-enzyme system, and so on. These techniques are highly effective, but they are very time-consuming and expensive. A principle of fluorescene energy transfer is that the light energy from an excited donor (fluorophore) is transferred to an acceptor (fluorophore), if the acceptor exists in the vicinity of the donor and the excitation spectrum of donor overlaps the emission spectrum of acceptor. In this study, the fluorescence energy transfer was applied to the detection of specific DNA sequence using the hybridization method. The analyte, single-stranded DNA labeled with the donor fluorophore is hybridized to a probe DNA labeled with the acceptor. Because of the complementary DNA duplex formation, two fluorophores became to be closed to each other, and the fluorescence energy transfer was occurred

  16. Nucleotide and Predicted Amino Acid Sequence-Based Analysis of the Avian Metapneumovirus Type C Cell Attachment Glycoprotein Gene: Phylogenetic Analysis and Molecular Epidemiology of U.S. Pneumoviruses

    Alvarez, Rene; Lwamba, Humphrey M.; Kapczynski, Darrell R.; Njenga, M. Kariuki; Seal, Bruce S.

    2003-01-01

    A serologically distinct avian metapneumovirus (aMPV) was isolated in the United States after an outbreak of turkey rhinotracheitis (TRT) in February 1997. The newly recognized U.S. virus was subsequently demonstrated to be genetically distinct from European subtypes and was designated aMPV serotype C (aMPV/C). We have determined the nucleotide sequence of the gene encoding the cell attachment glycoprotein (G) of aMPV/C (Colorado strain and three Minnesota isolates) and predicted amino acid sequence by sequencing cloned cDNAs synthesized from intracellular RNA of aMPV/C-infected cells. The nucleotide sequence comprised 1,321 nucleotides with only one predicted open reading frame encoding a protein of 435 amino acids, with a predicted Mr of 48,840. The structural characteristics of the predicted G protein of aMPV/C were similar to those of the human respiratory syncytial virus (hRSV) attachment G protein, including two mucin-like regions (heparin-binding domains) flanking both sides of a CX3C chemokine motif present in a conserved hydrophobic pocket. Comparison of the deduced G-protein amino acid sequence of aMPV/C with those of aMPV serotypes A, B, and D, as well as hRSV revealed overall predicted amino acid sequence identities ranging from 4 to 16.5%, suggesting a distant relationship. However, G-protein sequence identities ranged from 72 to 97% when aMPV/C was compared to other members within the aMPV/C subtype or 21% for the recently identified human MPV (hMPV) G protein. Ratios of nonsynonymous to synonymous nucleotide changes were greater than one in the G gene when comparing the more recent Minnesota isolates to the original Colorado isolate. Epidemiologically, this indicates positive selection among U.S. isolates since the first outbreak of TRT in the United States. PMID:12682171

  17. Discrepancy between Hepatitis C Virus Genotypes and NS4-Based Serotypes: Association with Their Subgenomic Sequences

    Nan Nwe Win

    2017-01-01

    Full Text Available Determination of hepatitis C virus (HCV genotypes plays an important role in the direct-acting agent era. Discrepancies between HCV genotyping and serotyping assays are occasionally observed. Eighteen samples with discrepant results between genotyping and serotyping methods were analyzed. HCV serotyping and genotyping were based on the HCV nonstructural 4 (NS4 region and 5′-untranslated region (5′-UTR, respectively. HCV core and NS4 regions were chosen to be sequenced and were compared with the genotyping and serotyping results. Deep sequencing was also performed for the corresponding HCV NS4 regions. Seventeen out of 18 discrepant samples could be sequenced by the Sanger method. Both HCV core and NS4 sequences were concordant with that of genotyping in the 5′-UTR in all 17 samples. In cloning analysis of the HCV NS4 region, there were several amino acid variations, but each sequence was much closer to the peptide with the same genotype. Deep sequencing revealed that minor clones with different subgenotypes existed in two of the 17 samples. Genotyping by genome amplification showed high consistency, while several false reactions were detected by serotyping. The deep sequencing method also provides accurate genotyping results and may be useful for analyzing discrepant cases. HCV genotyping should be correctly determined before antiviral treatment.

  18. Prediction of potential drug targets based on simple sequence properties

    Lai Luhua

    2007-09-01

    Full Text Available Abstract Background During the past decades, research and development in drug discovery have attracted much attention and efforts. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and pharmaceutical industry. If a protein can be predicted in advance for its potential application as a drug target, the drug discovery process targeting this protein will be greatly speeded up. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets. Results Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non drug targets at an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research. Conclusion We have developed a drug target prediction method based solely on protein sequence information without the knowledge of family/domain annotation, or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as genome scale drug target predictions.

  19. Taxonomy and phylogeny of the genus citrus based on the nuclear ribosomal dna its region sequence

    Sun, Y.L.

    2015-01-01

    The genus Citrus (Aurantioideae, Rutaceae) is the sole source of the citrus fruits of commerce showing high economic values. In this study, the taxonomy and phylogeny of Citrus species is evaluated using sequence analysis of the ITS region of nrDNA. This study is based on 26 plants materials belonging to 22 Citrus species having wild, domesticated, and cultivated species. Through DNA alignment of the ITS sequence, ITS1 and ITS2 regions showed relatively high variations of sequence length and nucleotide among these Citrus species. According to previous six-tribe discrimination theory by Swingle and Reece, the grouping in our ITS phylogenetic tree reconstructed by ITS sequences was not related to tribe discrimination but species discrimination. However, the molecular analysis could provide more information on citrus taxonomy. Combined with ITS sequences of other subgenera in then true citrus fruit tree group, the ITS phylogenetic tree indicated subgenera Citrus was monophyletic and nearer to Fortunella, Poncirus, and Clymenia compared to Microcitrus and Eremocitrus. Abundant sequence variations of the ITS region shown in this study would help species identification and tribe differentiation of the genus Citrus. (author)

  20. HLA class I sequence-based typing using DNA recovered from frozen plasma.

    Cotton, Laura A; Abdur Rahman, Manal; Ng, Carmond; Le, Anh Q; Milloy, M-J; Mo, Theresa; Brumme, Zabrina L

    2012-08-31

    We describe a rapid, reliable and cost-effective method for intermediate-to-high-resolution sequence-based HLA class I typing using frozen plasma as a source of genomic DNA. The plasma samples investigated had a median age of 8.5 years. Total nucleic acids were isolated from matched frozen PBMC (~2.5 million) and plasma (500 μl) samples from a panel of 25 individuals using commercial silica-based kits. Extractions yielded median [IQR] nucleic acid concentrations of 85.7 [47.0-130.0]ng/μl and 2.2 [1.7-2.6]ng/μl from PBMC and plasma, respectively. Following extraction, ~1000 base pair regions spanning exons 2 and 3 of HLA-A, -B and -C were amplified independently via nested PCR using universal, locus-specific primers and sequenced directly. Chromatogram analysis was performed using commercial DNA sequence analysis software and allele interpretation was performed using a free web-based tool. HLA-A, -B and -C amplification rates were 100% and chromatograms were of uniformly high quality with clearly distinguishable mixed bases regardless of DNA source. Concordance between PBMC and plasma-derived HLA types was 100% at the allele and protein levels. At the nucleotide level, a single partially discordant base (resulting from a failure to call both peaks in a mixed base) was observed out of >46,975 bases sequenced (>99.9% concordance). This protocol has previously been used to perform HLA class I typing from a variety of genomic DNA sources including PBMC, whole blood, granulocyte pellets and serum, from specimens up to 30 years old. This method provides comparable specificity to conventional sequence-based approaches and could be applied in situations where cell samples are unavailable or DNA quantities are limiting. Copyright © 2012 Elsevier B.V. All rights reserved.

  1. Base Sequence Context Effects on Nucleotide Excision Repair

    Yuqin Cai

    2010-01-01

    Full Text Available Nucleotide excision repair (NER plays a critical role in maintaining the integrity of the genome when damaged by bulky DNA lesions, since inefficient repair can cause mutations and human diseases notably cancer. The structural properties of DNA lesions that determine their relative susceptibilities to NER are therefore of great interest. As a model system, we have investigated the major mutagenic lesion derived from the environmental carcinogen benzo[a]pyrene (B[a]P, 10S (+-trans-anti-B[a]P-2-dG in six different sequence contexts that differ in how the lesion is positioned in relation to nearby guanine amino groups. We have obtained molecular structural data by NMR and MD simulations, bending properties from gel electrophoresis studies, and NER data obtained from human HeLa cell extracts for our six investigated sequence contexts. This model system suggests that disturbed Watson-Crick base pairing is a better recognition signal than a flexible bend, and that these can act in concert to provide an enhanced signal. Steric hinderance between the minor groove-aligned lesion and nearby guanine amino groups determines the exact nature of the disturbances. Both nearest neighbor and more distant neighbor sequence contexts have an impact. Regardless of the exact distortions, we hypothesize that they provide a local thermodynamic destabilization signal for repair.

  2. DNA Barcoding: Amplification and sequence analysis of rbcl and matK genome regions in three divergent plant species

    Javed Iqbal Wattoo

    2016-11-01

    Full Text Available Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matk+rbcl of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families; Solanaceae, Euphorbiaceae and Fabaceae respectively. Biological sequence homology and divergence of amplified sequences was studied using Basic Local Alignment Tool (BLAST. Results: Both primers (matk+rbcl showed good amplification in three species. The sequenced regions reveled conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in medicinal industry to establish DNA based identification of different medicinal plant species to monitor adulteration.

  3. Molecular characterization, sequence analysis and tissue expression of a porcine gene – MOSPD2

    Yang Jie

    2017-01-01

    Full Text Available The full-length cDNA sequence of a porcine gene, MOSPD2, was amplified using the rapid amplification of cDNA ends method based on a pig expressed sequence tag sequence which was highly homologous to the coding sequence of the human MOSPD2 gene. Sequence prediction analysis revealed that the open reading frame of this gene encodes a protein of 491 amino acids that has high homology with the motile sperm domain-containing protein 2 (MOSPD2 of five species: horse (89%, human (90%, chimpanzee (89%, rhesus monkey (89% and mouse (85%; thus, it could be defined as a porcine MOSPD2 gene. This novel porcine gene was assigned GeneID: 100153601. This gene is structured in 15 exons and 14 introns as revealed by computer-assisted analysis. The phylogenetic analysis revealed that the porcine MOSPD2 gene has a closer genetic relationship with the MOSPD2 gene of horse. Tissue expression analysis indicated that the porcine MOSPD2 gene is generally and differentially expressed in the spleen, muscle, skin, kidney, lung, liver, fat and heart. Our experiment is the first to establish the primary foundation for further research on the porcine MOSPD2 gene.

  4. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we have isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to data. Similarity with most homeobox sequences is low (30-50%), except with Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. The deduced amino acid sequences from the cDNAs contain, in addition to the homeodomain, other domains also present in various homeobox-containing genes. The expression of both genes, detected by Northern blot analysis, appear slightly higher in cephalic regions than in the rest of the intact organism, while a slight increase is detected in the central period (5 days) or regeneration. Images PMID:1714599

  5. Studies of base pair sequence effects on DNA solvation based on all

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization ...

  6. Single-base resolution and long-coverage sequencing based on single-molecule nanomanipulation

    An Hongjie; Huang Jiehuan; Lue Ming; Li Xueling; Lue Junhong; Li Haikuo; Zhang Yi; Li Minqian; Hu Jun

    2007-01-01

    We show new approaches towards a novel single-molecule sequencing strategy which consists of high-resolution positioning isolation of overlapping DNA fragments with atomic force microscopy (AFM), subsequent single-molecule PCR amplification and conventional Sanger sequencing. In this study, a DNA labelling technique was used to guarantee the accuracy in positioning the target DNA. Single-molecule multiplex PCR was carried out to test the contamination. The results showed that the two overlapping DNA fragments isolated by AFM could be successfully sequenced with high quality and perfect contiguity, indicating that single-base resolution and long-coverage sequencing have been achieved simultaneously

  7. Determining physical constraints in transcriptional initiationcomplexes using DNA sequence analysis

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen,Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control ofcooperatively acting transcription factors whose binding is limited bystructural constraints. By determining these structural constraints, wecan understand the "rules" that define functional cooperativity.Conversely, by understanding the rules of binding, we can inferstructural characteristics. We have developed an information theory basedmethod for approximating the physical limitations of cooperativeinteractions by comparing sequence analysis to microarray expressiondata. When applied to the coordinated binding of the sulfur amino acidregulatory protein Met4 by Cbf1 and Met31, we were able to create acombinatorial model that can correctly identify Met4 regulatedgenes.

  8. Cloning and sequence analysis of cDNA coding for rat nucleolar protein C23

    Ghaffari, S.H.; Olson, M.O.J.

    1986-01-01

    Using synthetic oligonucleotides as primers and probes, the authors have isolated and sequenced cDNA clones encoding protein C23, a putative nucleolus organizer protein. Poly(A + ) RNA was isolated from rat Novikoff hepatoma cells and enriched in C23 mRNA by sucrose density gradient ultracentrifugation. Two deoxyoligonuleotides, a 48- and a 27-mer, were synthesized on the basis of amino acid sequence from the C-terminal half of protein C23 and cDNA sequence data from CHO cell protein. The 48-mer was used a primer for synthesis of cDNA which was then inserted into plasmid pUC9. Transformed bacterial colonies were screened by hybridization with 32 P labeled 27-mer. Two clones among 5000 gave a strong positive signal. Plasmid DNAs from these clones were purified and characterized by blotting and nucleotide sequence analysis. The length of C23 mRNA was estimated to be 3200 bases in a northern blot analysis. The sequence of a 267 b.p. insert shows high homology with the CHO cDNA with only 9 nucleotide differences and an identical amino acid sequence. These studies indicate that this region of the protein is highly conserved

  9. Spike-Based Bayesian-Hebbian Learning of Temporal Sequences

    Tully, Philip J; Lindén, Henrik; Hennig, Matthias H

    2016-01-01

    Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed...... in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN) learning rule. We find that the formation of distributed memories, embodied by increased periods...

  10. Complete Deletion of the Fucose Operon in Haemophilus influenzae Is Associated with a Cluster in Multilocus Sequence Analysis-Based Phylogenetic Group II Related to Haemophilus haemolyticus: Implications for Identification and Typing.

    de Gier, Camilla; Kirkham, Lea-Ann S; Nørskov-Lauritsen, Niels

    2015-12-01

    Nonhemolytic variants of Haemophilus haemolyticus are difficult to differentiate from Haemophilus influenzae despite a wide difference in pathogenic potential. A previous investigation characterized a challenging set of 60 clinical strains using multiple PCRs for marker genes and described strains that could not be unequivocally identified as either species. We have analyzed the same set of strains by multilocus sequence analysis (MLSA) and near-full-length 16S rRNA gene sequencing. MLSA unambiguously allocated all study strains to either of the two species, while identification by 16S rRNA sequence was inconclusive for three strains. Notably, the two methods yielded conflicting identifications for two strains. Most of the "fuzzy species" strains were identified as H. influenzae that had undergone complete deletion of the fucose operon. Such strains, which are untypeable by the H. influenzae multilocus sequence type (MLST) scheme, have sporadically been reported and predominantly belong to a single branch of H. influenzae MLSA phylogenetic group II. We also found evidence of interspecies recombination between H. influenzae and H. haemolyticus within the 16S rRNA genes. Establishing an accurate method for rapid and inexpensive identification of H. influenzae is important for disease surveillance and treatment. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  11. Dynamic Assessment of Microbial Ecology (DAME): A web app for interactive analysis and visualization of microbial sequencing data

    Dynamic Assessment of Microbial Ecology (DAME) is a shiny-based web application for interactive analysis and visualization of microbial sequencing data. DAME provides researchers not familiar with R programming the ability to access the most current R functions utilized for ecology and gene sequenci...

  12. Exact combinatorial reliability analysis of dynamic systems with sequence-dependent failures

    Xing Liudong; Shrestha, Akhilesh; Dai Yuanshun

    2011-01-01

    Many real-life fault-tolerant systems are subjected to sequence-dependent failure behavior, in which the order in which the fault events occur is important to the system reliability. Such systems can be modeled by dynamic fault trees (DFT) with priority-AND (pAND) gates. Existing approaches for the reliability analysis of systems subjected to sequence-dependent failures are typically state-space-based, simulation-based or inclusion-exclusion-based methods. Those methods either suffer from the state-space explosion problem or require long computation time especially when results with high degree of accuracy are desired. In this paper, an analytical method based on sequential binary decision diagrams is proposed. The proposed approach can analyze the exact reliability of non-repairable dynamic systems subjected to the sequence-dependent failure behavior. Also, the proposed approach is combinatorial and is applicable for analyzing systems with any arbitrary component time-to-failure distributions. The application and advantages of the proposed approach are illustrated through analysis of several examples. - Highlights: → We analyze the sequence-dependent failure behavior using combinatorial models. → The method has no limitation on the type of time-to-failure distributions. → The method is analytical and based on sequential binary decision diagrams (SBDD). → The method is computationally more efficient than existing methods.

  13. Analysis of quality raw data of second generation sequencers with Quality Assessment Software.

    Ramos, Rommel Tj; Carneiro, Adriana R; Baumbach, Jan; Azevedo, Vasco; Schneider, Maria Pc; Silva, Artur

    2011-04-18

    Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated. We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allow us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and estimated coverage after applying the quality filter, providing acceptable sequence coverage for genome construction from short reads. Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.

  14. Comparative analysis of catfish BAC end sequences with the zebrafish genome

    Abernathy Jason

    2009-12-01

    Full Text Available Abstract Background Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish. Results We reported the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource along with linkage mapping and existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3% had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5, of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds. Conclusion BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified to exist between catfish and zebrafish. However, it appears that the level of conservation at local genomic regions are high while a high level of chromosomal shuffling and rearrangements exist between catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.

  15. sRNAnalyzer-a flexible and customizable small RNA sequencing data analysis pipeline.

    Wu, Xiaogang; Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J; Wang, Kai

    2017-12-01

    Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline-sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. sRNAnalyzer—a flexible and customizable small RNA sequencing data analysis pipeline

    Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J.

    2017-01-01

    Abstract Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline—sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. PMID:29069500

  17. RNA-Seq analysis and gene discovery of Andrias davidianus using Illumina short read sequencing.

    Fenggang Li

    Full Text Available The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp. Based on the sequence similarities, searching with known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 of transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 transcripts were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR. The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians.

  18. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome

    Santosh K. Tiwari

    2011-01-01

    Full Text Available The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods in time. Several phylogenetic studies involving mitochondrial DNA and Y chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, certain controversy exists, due to the support given to the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate pre-historic colonization of India by anatomically modern humans, using conserved stretches of five amino acid (EPIYA sequences in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs, in 32 H. pylori strains isolated from subjects with several forms of gastric diseases, was also explored. High resolution sequence analysis of the above described genes was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0 software for EPIYA. An MJ-Network was constructed for obtaining TPM haplotypes by using NETWORK (version 4.5 software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. No specific association of haplotypes with the outcome of disease was revealed through additional network analysis of TPMs.

  19. Face recognition based on matching of local features on 3D dynamic range sequences

    Echeagaray-Patrón, B. A.; Kober, Vitaly

    2016-09-01

    3D face recognition has attracted attention in the last decade due to improvement of technology of 3D image acquisition and its wide range of applications such as access control, surveillance, human-computer interaction and biometric identification systems. Most research on 3D face recognition has focused on analysis of 3D still data. In this work, a new method for face recognition using dynamic 3D range sequences is proposed. Experimental results are presented and discussed using 3D sequences in the presence of pose variation. The performance of the proposed method is compared with that of conventional face recognition algorithms based on descriptors.

  20. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Hongliang Liu

    Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology.Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51% unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17% unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes.The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  1. Extended -Regular Sequence for Automated Analysis of Microarray Images

    Jin Hee-Jeong

    2006-01-01

    Full Text Available Microarray study enables us to obtain hundreds of thousands of expressions of genes or genotypes at once, and it is an indispensable technology for genome research. The first step is the analysis of scanned microarray images. This is the most important procedure for obtaining biologically reliable data. Currently most microarray image processing systems require burdensome manual block/spot indexing work. Since the amount of experimental data is increasing very quickly, automated microarray image analysis software becomes important. In this paper, we propose two automated methods for analyzing microarray images. First, we propose the extended -regular sequence to index blocks and spots, which enables a novel automatic gridding procedure. Second, we provide a methodology, hierarchical metagrid alignment, to allow reliable and efficient batch processing for a set of microarray images. Experimental results show that the proposed methods are more reliable and convenient than the commercial tools.

  2. Sequence Quality Analysis Tool for HIV Type 1 Protease and Reverse Transcriptase

    DeLong, Allison K.; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W.; Kantor, Rami

    2012-01-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802...

  3. Chaos game representation of functional protein sequences, and simulation and multifractal analysis of induced measures

    Zu-Guo, Yu; Qian-Jun, Xiao; Long, Shi; Jun-Wu, Yu; Anh, Vo

    2010-01-01

    Investigating the biological function of proteins is a key aspect of protein studies. Bioinformatic methods become important for studying the biological function of proteins. In this paper, we first give the chaos game representation (CGR) of randomly-linked functional protein sequences, then propose the use of the recurrent iterated function systems (RIFS) in fractal theory to simulate the measure based on their chaos game representations. This method helps to extract some features of functional protein sequences, and furthermore the biological functions of these proteins. Then multifractal analysis of the measures based on the CGRs of randomly-linked functional protein sequences are performed. We find that the CGRs have clear fractal patterns. The numerical results show that the RIFS can simulate the measure based on the CGR very well. The relative standard error and the estimated probability matrix in the RIFS do not depend on the order to link the functional protein sequences. The estimated probability matrices in the RIFS with different biological functions are evidently different. Hence the estimated probability matrices in the RIFS can be used to characterise the difference among linked functional protein sequences with different biological functions. From the values of the D q curves, one sees that these functional protein sequences are not completely random. The D q of all linked functional proteins studied are multifractal-like and sufficiently smooth for the C q (analogous to specific heat) curves to be meaningful. Furthermore, the D q curves of the measure μ based on their CGRs for different orders to link the functional protein sequences are almost identical if q ≥ 0. Finally, the C q curves of all linked functional proteins resemble a classical phase transition at a critical point. (cross-disciplinary physics and related areas of science and technology)

  4. PCR-based assays versus direct sequencing for evaluating the effect of KRAS status on anti-EGFR treatment response in colorectal cancer patients: a systematic review and meta-analysis.

    Lianfeng Shan

    Full Text Available The survival rate of colorectal cancer (CRC patients carrying wild-type KRAS is significantly increased by combining anti-EGFR monoclonal antibody (mAb with standard chemotherapy. However, conflicting data exist in both the wild-type KRAS and mutant KRAS groups, which strongly challenge CRC anti-EGFR treatment. Here we conducted a meta-analysis in an effort to provide more reliable information regarding anti-EGFR treatment in CRC patients.We searched full reports of randomized clinical trials using Medline, the American Society of Clinical Oncology (ASCO, and the European Society for Medical Oncology (ESMO. Two investigators independently screened the published literature according to our inclusive and exclusive criteria and the relative data were extracted. We used Review Manager 5.2 software to analyze the data.The addition of anti-EGFR mAb to standard chemotherapy significantly improved both progression-free survival (PFS and median overall survival (mOS in the wild-type KRAS group; hazard ratios (HRs for PFS and mOS were 0.70 [95% confidence interval (CI, 0.58-0.84] and 0.83 [95% CI, 0.75-0.91], respectively. In sub-analyses of the wild-type KRAS group, when PCR-based assays are employed, PFS and mOS notably increase: the HRs were 0.74 [95% CI, 0.62-0.88] and 0.87 [95% CI, 0.78-0.96], respectively. In sub-analyses of the mutant KRAS group, neither PCR-based assays nor direct sequencing enhance PFS or mOS.Our data suggest that PCR-based assays with high sensitivity and specificity allow accurate identification of patients with wild-type KRAS and thus increase PFS and mOS. Furthermore, such assays liberate patients with mutant KRAS from unnecessary drug side effects, and provide them an opportunity to receive appropriate treatment. Thus, establishing a precise standard reference test will substantially optimize CRC-targeted therapies.

  5. Estimation of a Killer Whale (Orcinus orca Population's Diet Using Sequencing Analysis of DNA from Feces.

    Michael J Ford

    Full Text Available Estimating diet composition is important for understanding interactions between predators and prey and thus illuminating ecosystem function. The diet of many species, however, is difficult to observe directly. Genetic analysis of fecal material collected in the field is therefore a useful tool for gaining insight into wild animal diets. In this study, we used high-throughput DNA sequencing to quantitatively estimate the diet composition of an endangered population of wild killer whales (Orcinus orca in their summer range in the Salish Sea. We combined 175 fecal samples collected between May and September from five years between 2006 and 2011 into 13 sample groups. Two known DNA composition control groups were also created. Each group was sequenced at a ~330bp segment of the 16s gene in the mitochondrial genome using an Illumina MiSeq sequencing system. After several quality controls steps, 4,987,107 individual sequences were aligned to a custom sequence database containing 19 potential fish prey species and the most likely species of each fecal-derived sequence was determined. Based on these alignments, salmonids made up >98.6% of the total sequences and thus of the inferred diet. Of the six salmonid species, Chinook salmon made up 79.5% of the sequences, followed by coho salmon (15%. Over all years, a clear pattern emerged with Chinook salmon dominating the estimated diet early in the summer, and coho salmon contributing an average of >40% of the diet in late summer. Sockeye salmon appeared to be occasionally important, at >18% in some sample groups. Non-salmonids were rarely observed. Our results are consistent with earlier results based on surface prey remains, and confirm the importance of Chinook salmon in this population's summer diet.

  6. Studies of base pair sequence effects on DNA solvation based on all-atom molecular dynamics simulations.

    Dixit, Surjit B; Mezei, Mihaly; Beveridge, David L

    2012-07-01

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization were observed in these simulations. The results were compared to essentially all known experimental data on the subject. Proximity analysis was employed to highlight the sequence dependent differences in solvation and ion localization properties in the grooves of DNA. Comparison of the MD-calculated DNA structure with canonical A- and B-forms supports the idea that the G/C-rich sequences are closer to canonical A- than B-form structures, while the reverse is true for the poly A sequences, with the exception of the alternating ATAT sequence. Analysis of hydration density maps reveals that the flexibility of solute molecule has a significant effect on the nature of observed hydration. Energetic analysis of solute-solvent interactions based on proximity analysis of solvent reveals that the GC or CG base pairs interact more strongly with water molecules in the minor groove of DNA that the AT or TA base pairs, while the interactions of the AT or TA pairs in the major groove are stronger than those of the GC or CG pairs. Computation of solvent-accessible surface area of the nucleotide units in the simulated trajectories reveals that the similarity with results derived from analysis of a database of crystallographic structures is excellent. The MD trajectories tend to follow Manning's counterion condensation theory, presenting a region of condensed counterions within a radius of about 17 A from the DNA surface independent of sequence. The GC and CG pairs tend to associate with cations in the major groove of the DNA structure to a greater extent than the AT and TA pairs. Cation association is more frequent in the minor groove of AT than the GC pairs. In general, the

  7. Histoimmunogenetics Markup Language 1.0: Reporting next generation sequencing-based HLA and KIR genotyping.

    Milius, Robert P; Heuer, Michael; Valiga, Daniel; Doroschak, Kathryn J; Kennedy, Caleb J; Bolon, Yung-Tsi; Schneider, Joel; Pollack, Jane; Kim, Hwa Ran; Cereb, Nezih; Hollenbach, Jill A; Mack, Steven J; Maiers, Martin

    2015-12-01

    We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and Sequence Based Typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  8. A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer.

    Álvarez-Martos, Isabel; Ferapontova, Elena E

    2017-08-05

    A unique specificity of the aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding of dopamine is a 57 nucleotides long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained by the replacement of the RNA aptamer bases for their DNA analogues is not able of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence, and, thus, is not an aptamer and cannot be used neither for in vivo nor in situ analysis of dopamine in the presence of structurally related neurotransmitters. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

    Francisco Alexandre P

    2012-05-01

    Full Text Available Abstract Background With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available but this wealth of data remains underused and are frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results PHYLOViZ is platform independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions PHYLOViZ is a user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.

  10. Sequence-based classification using discriminatory motif feature selection.

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated. We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  11. A trace display and editing program for data from fluorescence based sequencing machines.

    Gleeson, T; Hillier, L

    1991-12-11

    'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C.elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly aided in the efficiency and accuracy of sequence editing. It runs in the X windows environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.

  12. CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L. methylation filtered genomic genespace sequences

    Spraggins Thomas A

    2007-04-01

    Full Text Available Abstract Background Cowpea [Vigna unguiculata (L. Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI, funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniprotKB-PIR (Protein Information Resource, and UniProtKB-TrEMBL. Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene predication program and the

  13. Heart rate measurement based on face video sequence

    Xu, Fang; Zhou, Qin-Wu; Wu, Peng; Chen, Xing; Yang, Xiaofeng; Yan, Hong-jian

    2015-03-01

    This paper proposes a new non-contact heart rate measurement method based on photoplethysmography (PPG) theory. With this method we can measure heart rate remotely with a camera and ambient light. We collected video sequences of subjects, and detected remote PPG signals through video sequences. Remote PPG signals were analyzed with two methods, Blind Source Separation Technology (BSST) and Cross Spectral Power Technology (CSPT). BSST is a commonly used method, and CSPT is used for the first time in the study of remote PPG signals in this paper. Both of the methods can acquire heart rate, but compared with BSST, CSPT has clearer physical meaning, and the computational complexity of CSPT is lower than that of BSST. Our work shows that heart rates detected by CSPT method have good consistency with the heart rates measured by a finger clip oximeter. With good accuracy and low computational complexity, the CSPT method has a good prospect for the application in the field of home medical devices and mobile health devices.

  14. System-level hazard analysis using the sequence-tree method

    Huang, H.-W.; Shih Chunkuan; Yih Swu; Chen, M.-H.

    2008-01-01

    A system-level PHA using the sequence-tree method is presented to perform safety-related digital I and C system SSA. The conventional PHA involves brainstorming among experts on various portions of the system to identify hazards through discussions. However, since the conventional PHA is not a systematic technique, the analysis results depend strongly on the experts' subjective opinions. The quality of analysis cannot be appropriately controlled. Therefore, this study presents a system-level sequence tree based PHA, which can clarify the relationship among the major digital I and C systems. This sequence-tree-based technique has two major phases. The first phase adopts a table to analyze each event in SAR Chapter 15 for a specific safety-related I and C system, such as RPS. The second phase adopts a sequence tree to recognize the I and C systems involved in the event, the working of the safety-related systems and how the backup systems can be activated to mitigate the consequence if the primary safety systems fail. The defense-in-depth echelons, namely the Control echelon, Reactor trip echelon, ESFAS echelon and Monitoring and indicator echelon, are arranged to build the sequence-tree structure. All the related I and C systems, including the digital systems and the analog back-up systems, are allocated in their specific echelons. This system-centric sequence-tree analysis not only systematically identifies preliminary hazards, but also vulnerabilities in a nuclear power plant. Hence, an effective simplified D3 evaluation can also be conducted

  15. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Nishtha Manocha

    2018-02-01

    Full Text Available The adaptation tipping point and adaptation pathway approach developed to make decisions under deep uncertainty do not shed light on which among the multiple available pathways should be chosen as the preferred pathway. This creates the need to extend these approaches by means of suitable tools that can help sequence actions and subsequently enable the outlining of relevant policies. This paper presents two sequencing approaches, namely, the “Build to Target” and “Build Up” approach, to aid in sub-selecting a set of preferred pathways. Both approaches differ in the levels of flexibility they offer. They are exemplified by means of two case studies wherein the Net Present Valuation and the Real Options Analysis are employed as selection criterions. The results demonstrate the benefit of these two approaches when used in conjunction with the adaptation pathways and show how the pathways selected by means of a Build to Target approach generally have a value greater than, or at least the same as, the pathways selected by the Build Up approach. Further, this paper also demonstrates the capacity of Real Options to quantify and capture the economic value of flexibility, which cannot be done by traditional valuation approaches such as Net Present Valuation.

  16. Reverse transcriptase sequences from mulberry LTR retrotransposons: characterization analysis

    Ma Bi

    2017-10-01

    Full Text Available Copia and Gypsy play important roles in structural, functional and evolutionary dynamics of plant genomes. In this study, a total of 106 and 101, Copia and Gypsy reverse transcriptase (rt were amplified respectively in the Morus notabilis genome using degenerate primers. All sequences exhibited high levels of heterogeneity, were rich in AT and possessed higher sequence divergence of Copia rt in comparison to Gypsy rt. Two reasons are likely to account for this phenomenon: a these elements often experience deletions or fragmentation by illegitimate or unequal homologous recombination in the transposition process; b strong purifying selective pressure drives the evolution of these elements through “selective silencing” with random mutation and eventual deletion from the host genome. Interestingly, mulberry rt clustered with other rt from distantly related taxa according to the phylogenetic analysis. This phenomenon did not result from horizontal transposable element transfer. Results obtained from fluorescence in situ hybridization revealed that most of the hybridization signals were preferentially concentrated in pericentromeric and distal regions of chromosomes, and these elements may play important roles in the regions in which they are found. Results of this study support the continued pursuit of further functional studies of Copia and Gypsy in the mulberry genome.

  17. Molecular diagnosis of Usher syndrome: application of two different next generation sequencing-based procedures.

    Danilo Licastro

    Full Text Available Usher syndrome (USH is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of which were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II and Roche 454 (GS FLX for the Long-PCR sequencing. Long-PCR targeting was more efficient with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need of a strong coverage and to the correct interpretation of sequence variations with a non obvious, pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH that may be also caused by the presence of additional causative genes yet to be identified.

  18. Molecular Diagnosis of Usher Syndrome: Application of Two Different Next Generation Sequencing-Based Procedures

    Licastro, Danilo; Mutarelli, Margherita; Peluso, Ivana; Neveling, Kornelia; Wieskamp, Nienke; Rispoli, Rossella; Vozzi, Diego; Athanasakis, Emmanouil; D'Eustacchio, Angela; Pizzo, Mariateresa; D'Amico, Francesca; Ziviello, Carmela; Simonelli, Francesca; Fabretto, Antonella; Scheffer, Hans; Gasparini, Paolo; Banfi, Sandro; Nigro, Vincenzo

    2012-01-01

    Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of which were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need of a strong coverage and to the correct interpretation of sequence variations with a non obvious, pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH that may be also caused by the presence of additional causative genes yet to be identified. PMID:22952768

  19. Implementation of an RFID-Based Sequencing-Error-Proofing System for Automotive Manufacturing Logistics

    Yong-Shin Kang

    2018-01-01

    Full Text Available Serialized tracing provides the ability to track and trace the lifecycle of the products and parts. Unlike barcodes, Radio frequency identification (RFID, which is an important building block for internet of things (IoT, does not require a line of sight and has the advantages of recognizing many objects simultaneously and rapidly, and storing more information than barcodes. Therefore, RFID has been used in a variety of application domains such as logistics, distributions, and manufacturing, significantly improving traceability and process efficiency. In this study, we applied RFID to improve the just-in-sequence operation of an automotive inbound logistics process. First, we implemented an RFID-based visibility system for real-time traceability and control of part supply from the production lines of suppliers to the assembly line of a car manufacturer. Second, we developed an RFID-based sequence-error proofing system to avoid accidental line stops due to incorrect part sequencing. The whole system has been successfully installed in a rear-axle inbound logistics process of GM Korea. We achieved a significant amount of cost savings, especially due to the prevention of sequencing errors and part shortages, and the reduction of manual operations. Thorough cost-benefit analysis demonstrates the clear economic feasibility of using RFID technologies for the just-in-sequence inbound logistics in an automobile manufacturing environment.

  20. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To meet this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' was constructed. Firstly, tree analysis module analyzes the structure of tree and calculates the all possible subgroups in each node. Sequence similarity analysis module calculates average sequence similarity for all subgroups in each node. Representative sequence generation module finds a representative sequence using profile analysis and self-scoring for each subgroup. For all nodes, average sequence similarities are calculated and 'Subgrouping Automata' searches a node showing statistically maximum sequence similarity increase using Student's t-value. A node showing the maximum t-value, which gives the most significant differences in average sequence similarity between two adjacent nodes, is determined as an optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.

  1. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene

    Heilbronn, T.; Jahn, G.; Buerkle, A.; Freese, U.K.; Fleckenstein, B.; Zur Hausen, H.

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSF-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at T/sub m/ - 25/degrees/C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Esptein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein

  2. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  3. On the Power and Limits of Sequence Similarity Based Clustering of Proteins Into Families

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally...... important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We...... used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters...

  4. The sequence relay selection strategy based on stochastic dynamic programming

    Zhu, Rui; Chen, Xihao; Huang, Yangchao

    2017-07-01

    Relay-assisted (RA) network with relay node selection is a kind of effective method to improve the channel capacity and convergence performance. However, most of the existing researches about the relay selection did not consider the statically channel state information and the selection cost. This shortage limited the performance and application of RA network in practical scenarios. In order to overcome this drawback, a sequence relay selection strategy (SRSS) was proposed. And the performance upper bound of SRSS was also analyzed in this paper. Furthermore, in order to make SRSS more practical, a novel threshold determination algorithm based on the stochastic dynamic program (SDP) was given to work with SRSS. Numerical results are also presented to exhibit the performance of SRSS with SDP.

  5. An analysis of LOCA sequences in the development of severe accident analysis DB

    Choi, Young; Park, Soo Yong; Ahn, Kwang-Il; Kim, D.H.

    2006-01-01

    Although a Level 2 PSA was performed for the Korean Standard Power Plants (KSNPs), and it considered the necessary sequences for an assessment of the containment integrity and source term analysis. In terms of an accident management, however, more cases causing severe core damage need to be analyzed and arranged systematically for an easy access to the results. At present, KAERI is calculating the severe accident sequences intensively for various initiating events and generating a database for the accident progression including thermal hydraulic and source term behaviours. The developed Database (DB) system includes a graphical display for a plant and equipment status, previous research results by knowledge-base technique, and the expected plant behaviour. The plant model used in this paper is oriented to the case of LOCAs related severe accident phenomena and thus can simulate the plant behaviours for a severe accident. Therefore the developed system may play a central role as an information source for decision-making for a severe accident management, and will be used as a training simulator for a severe accident management. (author)

  6. Nonlinear analysis of sequence repeats of multi-domain proteins

    Huang Yanzhao [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Li Mingfeng [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: lmf_bill@sina.com

    2007-11-15

    Many multi-domain proteins have repetitive three-dimensional structures but nearly-random amino acid sequences. In the present paper, by using a modified recurrence plot proposed by us previously, we show that these amino acid sequences have hidden repetitions in fact. These results indicate that the repetitive domain structures are encoded by the repetitive sequences. This also gives a method to detect the repetitive domain structures directly from amino acid sequences.

  7. Welding distortion analysis of multipass joint combination with different sequences using 3D FEM and experiment

    Manurung, Yupiter H.P.; Lidam, Robert Ngendang; Rahim, M. Ridzwan; Zakaria, M. Yusof; Redza, M. Ridhwan; Sulaiman, M. Shahar; Tham, Ghalib; Abas, Sunhaji K.

    2013-01-01

    This paper presents an investigation of the welding sequence effect on induced angular distortion using FEM and experiments. The specimen of a combined joint geometry was modeled and simulated using Multipass Welding Advisor (MWA) in SYSWELD 2010 based on the thermal-elastic-plastic approach with low manganese carbon steel S3355J2G3 as specimen material and Goldak's double ellipsoid as heat source model. To validate the simulation results, a series of experiments was conducted with two different welding sequences using automated welding process, low carbon steel as parent metal, digital GMAW power source with premixed shielding gas and both-sided clamping technique. Based on the results, it was established that the thermo-elastic-plastic 3D FEM analysis shows good agreement with experimental results and the welding sequence “from outside to inside” induced less angular distortion compared to “from inside to outside”. -- Highlights: • 3D FEM was used to analyze the welding distortion on two different sequences. • Simulation results were validated with experiments using automated welding system. • Simulation results and experiments showed acceptable accuracy. • Welding sequence “outside–inside” showed less distortion than “inside–outside”

  8. Intergeneric Classification of Genus Bulbophyllum from Peninsular Malaysia Based on Combined Morphological and RBCL Sequence Data

    Hosseini, S.; Dadkhah, K.

    2016-01-01

    Bulbophyllum Thou. is largest genus in Orchidaceae family and a well-known plant of tropical area. The present study provides a comparative morphological study of 38 Bulbophyllum spp. as well as molecular sequence analysis of large subunit of rubisco (rbcL), to infer the intergeneric classification for studied taxa of genus Bulbophyllum. Thirty morphological characters were coded in a data matrix, and used in phenetic analysis. Morphological result was strongly consistent with earlier classification, with exception of B. auratum, B. gracillimum, B. mutabile and B. limbatum status. Furthermore Molecular data analysis of rbcL was congruent with morphological data in some aspects. Species interrelationships specified using combination of rbcL sequence data with morphological data. The results revealed close affiliation in 11 sections of Bulbophyllum from Peninsular Malaysia. Consequently, based on this study generic status of sections Cirrhopetalum and Epicrianthes cannot longer be supported, as they are deeply embedded within the genus Bulbophyllum. (author)

  9. Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

    Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

    2018-01-01

    The genetic code is degenerated and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source Homepage:http://www.gcat.bio/

  10. Sequence analysis of cereal sucrose synthase genes and isolation ...

    SERVER

    2007-10-18

    Oct 18, 2007 ... sequencing of sucrose synthase gene fragment from sor- ghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals like rice, maize, and barley were accessed from NCBI Genbank database.

  11. Chimera: construction of chimeric sequences for phylogenetic analysis

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP formats. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output

  12. Analysis on sequence stratigraphy and depositional systems of Mangbang formation, upper tertiary in Longchuanjiang basin

    Sun Zexuan; Yao Yifeng; Chen Yong; Li Guoxin

    2004-01-01

    Longchuanjiang basin is a small Cenozoic intramontane down-faulted basin. This paper, combining the Pliocene structure, the volcanic activities and the sedimentation of the basin, analyses the sequence stratigraphy and the depositional systems of Mangbang formation (the cover of the basin). Based on the analysis of depositional systems of Mangbang formation, the depositional pattern of Pliocene in Longchuanjiang basin is set up. It is suggested that because of the fast accumulation in early down-faulted zone during Pliocene time, the alluvial fan depositional system was dominated at that time. During the middle-late period, the alluvial fan entered the lake forming a combination of fan-fandelta-lacustrine depositional systems. Authors propose a view point that the formation of Mangbang formation sequence was constrained by multistage tectonic movement, and three structural sequences were established, and system tracts were divided. (authors)

  13. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis With emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the ''ASEP HRA Procedure,'' is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  14. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis With emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the ''ASEP HRA Procedure,'' is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs

  15. A Quantitative Accident Sequence Analysis for a VHTR

    Kim, Jintae; Lee, Joeun; Jae, Moosung [Hanyang University, Seoul (Korea, Republic of)

    2016-05-15

    In Korea, the basic design features of VHTR are currently discussed in the various design concepts. Probabilistic risk assessment (PRA) offers a logical and structured method to assess risks of a large and complex engineered system, such as a nuclear power plant. It will be introduced at an early stage in the design, and will be upgraded at various design and licensing stages as the design matures and the design details are defined. Risk insights to be developed from the PRA are viewed as essential to developing a design that is optimized in meeting safety objectives and in interpreting the applicability of the existing demands to the safety design approach of the VHTR. In this study, initiating events which may occur in VHTRs were selected through MLD method. The initiating events were then grouped into four categories for the accident sequence analysis. Initiating events frequency and safety systems failure rate were calculated by using reliability data obtained from the available sources and fault tree analysis. After quantification, uncertainty analysis was conducted. The SR and LR frequency are calculated respectively 7.52E- 10/RY and 7.91E-16/RY, which are relatively less than the core damage frequency of LWRs.

  16. The analysis of novel microRNA mimic sequences in cancer cells reveals lack of specificity in stem-loop RT-qPCR-based microRNA detection.

    Winata, Patrick; Williams, Marissa; McGowan, Eileen; Nassif, Najah; van Zandwijk, Nico; Reid, Glen

    2017-11-17

    MicroRNAs are frequently downregulated in cancer, and restoring expression has tumour suppressive activity in tumour cells. Our recent phase I clinical trial investigated microRNA-based therapy in patients with malignant pleural mesothelioma. Treatment with TargomiRs, microRNA mimics with novel sequence packaged in EGFR antibody-targeted bacterial minicells, revealed clear signs of clinical activity. In order to detect delivery of microRNA mimics to tumour cells in future clinical trials, we tested hydrolysis probe-based assays specific for the sequence of the novel mimics in transfected mesothelioma cell lines using RT-qPCR. The custom assays efficiently and specifically amplified the consensus mimics. However, we found that these assays gave a signal when total RNA from untransfected and control mimic-transfected cells were used as templates. Further investigation revealed that the reverse transcription step using stem-loop primers appeared to introduce substantial non-specific amplification with either total RNA or synthetic RNA templates. This suggests that reverse transcription using stem-loop primers suffers from an intrinsic lack of specificity for the detection of highly similar microRNAs in the same family, especially when analysing total RNA. These results suggest that RT-qPCR is unlikely to be an effective means to detect delivery of microRNA mimic-based drugs to tumour cells in patients.

  17. Internal event analysis for Laguna Verde Unit 1 Nuclear Power Plant. Accident sequence quantification and results

    Huerta B, A.; Aguilar T, O.; Nunez C, A.; Lopez M, R.

    1994-01-01

    The Level 1 results of Laguna Verde Nuclear Power Plant PRA are presented in the I nternal Event Analysis for Laguna Verde Unit 1 Nuclear Power Plant, CNSNS-TR 004, in five volumes. The reports are organized as follows: CNSNS-TR 004 Volume 1: Introduction and Methodology. CNSNS-TR4 Volume 2: Initiating Event and Accident Sequences. CNSNS-TR 004 Volume 3: System Analysis. CNSNS-TR 004 Volume 4: Accident Sequence Quantification and Results. CNSNS-TR 005 Volume 5: Appendices A, B and C. This volume presents the development of the dependent failure analysis, the treatment of the support system dependencies, the identification of the shared-components dependencies, and the treatment of the common cause failure. It is also presented the identification of the main human actions considered along with the possible recovery actions included. The development of the data base and the assumptions and limitations in the data base are also described in this volume. The accident sequences quantification process and the resolution of the core vulnerable sequences are presented. In this volume, the source and treatment of uncertainties associated with failure rates, component unavailabilities, initiating event frequencies, and human error probabilities are also presented. Finally, the main results and conclusions for the Internal Event Analysis for Laguna Verde Nuclear Power Plant are presented. The total core damage frequency calculated is 9.03x 10-5 per year for internal events. The most dominant accident sequences found are the transients involving the loss of offsite power, the station blackout accidents, and the anticipated transients without SCRAM (ATWS). (Author)

  18. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  19. The sequence and analysis of duplication rich human chromosome 16

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  20. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  1. Factoring local sequence composition in motif significance analysis.

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  2. MIToS.jl: mutual information tools for protein sequence analysis in the Julia language

    Zea, Diego J.; Anfossi, Diego; Nielsen, Morten

    2017-01-01

    Motivation: MIToS is an environment for mutual information analysis and a framework for protein multiple sequence alignments (MSAs) and protein structures (PDB) management in Julia language. It integrates sequence and structural information through SIFTS, making Pfam MSAs analysis straightforward....... MIToS streamlines the implementation of any measure calculated from residue contingency tables and its optimization and testing in terms of protein contact prediction. As an example, we implemented and tested a BLOSUM62-based pseudo-count strategy in mutual information analysis. Availability...... and Implementation: The software is totally implemented in Julia and supported for Linux, OS X and Windows. It’s freely available on GitHub under MIT license: http://mitos.leloir.org.ar. Contacts:diegozea@gmail.com or cmb@leloir.org.ar Supplementary information: Supplementary data are available at Bioinformatics...

  3. Characterising the CRISPR immune system in Archaea using genome sequence analysis

    Shah, Shiraz Ali

    Archaea, a group of microorganisms distinct from bacteria and eukaryotes, are equipped with an adaptive immune system called the CRISPR system, which relies on an RNA interference mechanism to combat invading viruses and plasmids. Using a genome sequence analysis approach, the four components...... of archaeal genomic CRISPR loci were analysed, namely, repeats, spacers, leaders and cas genes. Based on analysis of spacer sequences it was predicted that the immune system combats viruses and plasmids by targeting their DNA. Furthermore, analysis of repeats, leaders and cas genes revealed that CRISPR...... systems exist as distinct families which have key differences between themselves. Closely related organisms were seen harbouring different CRISPR systems, while some distantly related species carried similar systems, indicating frequent horizontal exchange. Moreover, it was found that cas genes of Type I...

  4. Comparing methods of classifying life courses: Sequence analysis and latent class analysis

    Elzinga, C.H.; Liefbroer, Aart C.; Han, Sapphire

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  5. Comparing methods of classifying life courses: sequence analysis and latent class analysis

    Han, Y.; Liefbroer, A.C.; Elzinga, C.

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  6. An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.

    Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit

    2016-05-26

    Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, comprehensive assessment of library construction and analysis protocols for Proton sequencing platform remains unexplored. Unlike Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads with various read length. Whether the performance of the commonly used software for handling such kind of data is satisfactory is unknown. By using universal human reference RNA as the initial material, RNaseIII and chemical fragmentation methods in library construction showed similar result in gene and junction discovery number and expression level estimated accuracy. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. Unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With reference annotation guide, the mapping rate of TopHat2 significantly increased from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier attained highly consistent results with that of TaqMan array and highest sensitivity. We provided for the first time, the reference statistics of library preparation methods, gene detection and quantification and junction discovery for RNA-Seq by the Ion Proton platform. Chemical fragmentation performed equally well with the enzyme-based one. The optimal Ion Proton sequencing options and analysis software have been evaluated.

  7. Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis.

    Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

    2004-02-01

    To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.

  8. Differentiation of Xylella fastidiosa Strains via Multilocus Sequence Analysis of Environmentally Mediated Genes (MLSA-E)

    Parker, Jennifer K.; Havird, Justin C.; De La Fuente, Leonardo

    2012-01-01

    Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of enviro...

  9. Human factors review for nuclear power plant severe accident sequence analysis

    Krois, P.A.; Haas, P.M.

    1985-01-01

    The paper discusses work conducted to: (1) support the severe accident sequence analysis of a nuclear power plant transient based on an assessment of operator actions, and (2) develop a descriptive model of operator severe accident management. Operator actions during the transient are assessed using qualitative and quantitative methods. A function-oriented accident management model provides a structure for developing technical operator guidance on mitigating core damage preventing radiological release

  10. DNA cross-linking by dehydromonocrotaline lacks apparent base sequence preference.

    Rieben, W Kurt; Coulombe, Roger A

    2004-12-01

    Pyrrolizidine alkaloids (PAs) are ubiquitous plant toxins, many of which, upon oxidation by hepatic mixed-function oxidases, become reactive bifunctional pyrrolic electrophiles that form DNA-DNA and DNA-protein cross-links. The anti-mitotic, toxic, and carcinogenic action of PAs is thought to be caused, at least in part, by these cross-links. We wished to determine whether the activated PA pyrrole dehydromonocrotaline (DHMO) exhibits base sequence preferences when cross-linked to a set of model duplex poly A-T 14-mer oligonucleotides with varying internal and/or end 5'-d(CG), 5'-d(GC), 5'-d(TA), 5'-d(CGCG), or 5'-d(GCGC) sequences. DHMO-DNA cross-links were assessed by electrophoretic mobility shift assay (EMSA) of 32P endlabeled oligonucleotides and by HPLC analysis of cross-linked DNAs enzymatically digested to their constituent deoxynucleosides. The degree of DNA cross-links depended upon the concentration of the pyrrole, but not on the base sequence of the oligonucleotide target. Likewise, HPLC chromatograms of cross-linked and digested DNAs showed no discernible sequence preference for any nucleotide. Added glutathione, tyrosine, cysteine, and aspartic acid, but not phenylalanine, threonine, serine, lysine, or methionine competed with DNA as alternate nucleophiles for cross-linking by DHMO. From these data it appears that DHMO exhibits no strong base preference when forming cross-links with DNA, and that some cellular nucleophiles can inhibit DNA cross-link formation.

  11. Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx

    Colbourne John K

    2009-05-01

    Full Text Available Abstract Background New methods are needed for genomic-scale analysis of emerging model organisms that exemplify important biological questions but lack fully sequenced genomes. For example, there is an urgent need to understand the potential for corals to adapt to climate change, but few molecular resources are available for studying these processes in reef-building corals. To facilitate genomics studies in corals and other non-model systems, we describe methods for transcriptome sequencing using 454, as well as strategies for assembling a useful catalog of genes from the output. We have applied these methods to sequence the transcriptome of planulae larvae from the coral Acropora millepora. Results More than 600,000 reads produced in a single 454 sequencing run were assembled into ~40,000 contigs with five-fold average sequencing coverage. Based on sequence similarity with known proteins, these analyses identified ~11,000 different genes expressed in a range of conditions including thermal stress and settlement induction. Assembled sequences were annotated with gene names, conserved domains, and Gene Ontology terms. Targeted searches using these annotations identified the majority of genes associated with essential metabolic pathways and conserved signaling pathways, as well as novel candidate genes for stress-related processes. Comparisons with the genome of the anemone Nematostella vectensis revealed ~8,500 pairs of orthologs and ~100 candidate coral-specific genes. More than 30,000 SNPs were detected in the coral sequences, and a subset of these validated by re-sequencing. Conclusion The methods described here for deep sequencing of the transcriptome should be widely applicable to generate catalogs of genes and genetic markers in emerging model organisms. Our data provide the most comprehensive sequence resource currently available for reef-building corals, and include an extensive collection of potential genetic markers for association and

  12. Genetic diversity analysis of Leuconostoc mesenteroides from Korean vegetables and food products by multilocus sequence typing.

    Sharma, Anshul; Kaur, Jasmine; Lee, Sulhee; Park, Young-Seo

    2018-06-01

    In the present study, 35 Leuconostoc mesenteroides strains isolated from vegetables and food products from South Korea were studied by multilocus sequence typing (MLST) of seven housekeeping genes (atpA, groEL, gyrB, pheS, pyrG, rpoA, and uvrC). The fragment sizes of the seven amplified housekeeping genes ranged in length from 366 to 1414 bp. Sequence analysis indicated 27 different sequence types (STs) with 25 of them being represented by a single strain indicating high genetic diversity, whereas the remaining 2 were characterized by five strains each. In total, 220 polymorphic nucleotide sites were detected among seven housekeeping genes. The phylogenetic analysis based on the STs of the seven loci indicated that the 35 strains belonged to two major groups, A (28 strains) and B (7 strains). Split decomposition analysis showed that intraspecies recombination played a role in generating diversity among strains. The minimum spanning tree showed that the evolution of the STs was not correlated with food source. This study signifies that the multilocus sequence typing is a valuable tool to access the genetic diversity among L. mesenteroides strains from South Korea and can be used further to monitor the evolutionary changes.

  13. A novel RNA sequencing data analysis method for cell line authentication.

    Erik Fasterius

    Full Text Available We have developed a novel analysis method that can interrogate the authenticity of biological samples used for generation of transcriptome profiles in public data repositories. The method uses RNA sequencing information to reveal mutations in expressed transcripts and subsequently confirms the identity of analysed cells by comparison with publicly available cell-specific mutational profiles. Cell lines constitute key model systems widely used within cancer research, but their identity needs to be confirmed in order to minimise the influence of cell contaminations and genetic drift on the analysis. Using both public and novel data, we demonstrate the use of RNA-sequencing data analysis for cell line authentication by examining the validity of COLO205, DLD1, HCT15, HCT116, HKE3, HT29 and RKO colorectal cancer cell lines. We successfully authenticate the studied cell lines and validate previous reports indicating that DLD1 and HCT15 are synonymous. We also show that the analysed HKE3 cells harbour an unexpected KRAS-G13D mutation and confirm that this cell line is a genuine KRAS dosage mutant, rather than a true isogenic derivative of HCT116 expressing only the wild type KRAS. This authentication method could be used to revisit the numerous cell line based RNA sequencing experiments available in public data repositories, analyse new experiments where whole genome sequencing is not available, as well as facilitate comparisons of data from different experiments, platforms and laboratories.

  14. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  15. Amelioration of the cooling load based chiller sequencing control

    Huang, Sen; Zuo, Wangda; Sohn, Michael D.

    2016-01-01

    Highlights: • We developed a new approach for the optimal load distribution for chillers. • We proposed a new approach to optimize the number of operating chillers. • We provided a holistic solution to address chiller sequencing control problems. - Abstract: Cooling Load based Control (CLC) for the chiller sequencing is a commonly used control strategy for multiple-chiller plants. To improve the energy efficiency of these chiller plants, researchers proposed various CLC optimization approaches, which can be divided into two groups: studies to optimize the load distribution and studies to identify the optimal number of operating chillers. However, both groups have their own deficiencies and do not consider the impact of each other. This paper aims to improve the CLC by proposing three new approaches. The first optimizes the load distribution by adjusting the critical points for the chiller staging, which is easier to be implemented than existing approaches. In addition, by considering the impact of the load distribution on the cooling tower energy consumption and the pump energy consumption, this approach can achieve a better energy saving. The second optimizes the number of operating chillers by modulating the critical points and the condenser water set point in order to achieve the minimal energy consumption of the entire chiller plant that may not be guaranteed by existing approaches. The third combines the first two approaches to provide a holistic solution. The proposed three approaches were evaluated via a case study. The results show that the total energy consumption saving for the studied chiller plant is 0.5%, 5.3% and 5.6% by the three approaches, respectively. An energy saving of 4.9–11.8% can be achieved for the chillers at the cost of more energy consumption by the cooling towers (increases of 5.8–43.8%). The pumps’ energy saving varies from −8.6% to 2.0%, depending on the approach.

  16. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure...... that the data produced is optimal. Although much of the procedure can be followed directly from the manufacturer's protocols, the key differences lie in the library preparation steps. This chapter presents an optimized protocol for the sequencing of fossil remains and museum specimens, commonly referred...

  17. MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

    Grimes, Susan M; Ji, Hanlee P

    2014-08-27

    Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

  18. A new feedback image encryption scheme based on perturbation with dynamical compound chaotic sequence cipher generator

    Tong, Xiaojun; Cui, Minggen; Wang, Zhu

    2009-07-01

    The design of the new compound two-dimensional chaotic function is presented by exploiting two one-dimensional chaotic functions which switch randomly, and the design is used as a chaotic sequence generator which is proved by Devaney's definition proof of chaos. The properties of compound chaotic functions are also proved rigorously. In order to improve the robustness against difference cryptanalysis and produce avalanche effect, a new feedback image encryption scheme is proposed using the new compound chaos by selecting one of the two one-dimensional chaotic functions randomly and a new image pixels method of permutation and substitution is designed in detail by array row and column random controlling based on the compound chaos. The results from entropy analysis, difference analysis, statistical analysis, sequence randomness analysis, cipher sensitivity analysis depending on key and plaintext have proven that the compound chaotic sequence cipher can resist cryptanalytic, statistical and brute-force attacks, and especially it accelerates encryption speed, and achieves higher level of security. By the dynamical compound chaos and perturbation technology, the paper solves the problem of computer low precision of one-dimensional chaotic function.

  19. The Use of Next Generation Sequencing and Junction Sequence Analysis Bioinformatics to Achieve Molecular Characterization of Crops Improved Through Modern Biotechnology

    David Kovalic

    2012-11-01

    Full Text Available The assessment of genetically modified (GM crops for regulatory approval currently requires a detailed molecular characterization of the DNA sequence and integrity of the transgene locus. In addition, molecular characterization is a critical component of event selection and advancement during product development. Typically, molecular characterization has relied on Southern blot analysis to establish locus and copy number along with targeted sequencing of polymerase chain reaction products spanning any inserted DNA to complete the characterization process. Here we describe the use of next generation (NexGen sequencing and junction sequence analysis bioinformatics in a new method for achieving full molecular characterization of a GM event without the need for Southern blot analysis. In this study, we examine a typical GM soybean [ (L. Merr.] line and demonstrate that this new method provides molecular characterization equivalent to the current Southern blot-based method. We also examine an event containing in vivo DNA rearrangement of multiple transfer DNA inserts to demonstrate that the new method is effective at identifying complex cases. Next generation sequencing and bioinformatics offers certain advantages over current approaches, most notably the simplicity, efficiency, and consistency of the method, and provides a viable alternative for efficiently and robustly achieving molecular characterization of GM crops.

  20. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    2012-01-01

    Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920

  1. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Liu Chang

    2012-12-01

    Full Text Available Abstract Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.

  2. Frame sequences analysis technique of linear objects movement

    Oshchepkova, V. Y.; Berg, I. A.; Shchepkin, D. V.; Kopylova, G. V.

    2017-12-01

    Obtaining data by noninvasive methods are often needed in many fields of science and engineering. This is achieved through video recording in various frame rate and light spectra. In doing so quantitative analysis of movement of the objects being studied becomes an important component of the research. This work discusses analysis of motion of linear objects on the two-dimensional plane. The complexity of this problem increases when the frame contains numerous objects whose images may overlap. This study uses a sequence containing 30 frames at the resolution of 62 × 62 pixels and frame rate of 2 Hz. It was required to determine the average velocity of objects motion. This velocity was found as an average velocity for 8-12 objects with the error of 15%. After processing dependencies of the average velocity vs. control parameters were found. The processing was performed in the software environment GMimPro with the subsequent approximation of the data obtained using the Hill equation.

  3. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG and posterior silk gland (PSG. Three sericin genes (sericin 1, sericin 2, and sericin 3 were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25 were expressed specifically in the PSG. In addition, 32,297 Single-nucleotide polymorphisms (SNPs and 361 insertion-deletions (INDELs were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or to be experiencing positive selection by Ka/Ks analysis. These data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.

  4. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

    Novák, Petr; Neumann, Pavel; Macas, Jiří

    2010-01-01

    Roč. 11, č. 1 (2010), s. 378-389 ISSN 1471-2105 R&D Projects: GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004 Institutional research plan: CEZ:AV0Z50510513 Keywords : repetitive DNA * plant genome * next generation sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.028, year: 2010

  5. An overview of the Phalaenopsis orchid genome through BAC end sequence analysis

    Hsiao Yu-Yun

    2011-01-01

    Full Text Available Abstract Background Phalaenopsis orchids are popular floral crops, and development of new cultivars is economically important to floricultural industries worldwide. Analysis of orchid genes could facilitate orchid improvement. Bacterial artificial chromosome (BAC end sequences (BESs can provide the first glimpses into the sequence composition of a novel genome and can yield molecular markers for use in genetic mapping and breeding. Results We used two BAC libraries (constructed using the BamHI and HindIII restriction enzymes of Phalaenopsis equestris to generate pair-end sequences from 2,920 BAC clones (71.4% and 28.6% from the BamHI and HindIII libraries, respectively, at a success rate of 95.7%. A total of 5,535 BESs were generated, representing 4.5 Mb, or about 0.3% of the Phalaenopsis genome. The trimmed sequences ranged from 123 to 1,397 base pairs (bp in size, with an average edited read length of 821 bp. When these BESs were subjected to sequence homology searches, it was found that 641 (11.6% were predicted to represent protein-encoding regions, whereas 1,272 (23.0% contained repetitive DNA. Most of the repetitive DNA sequences were gypsy- and copia-like retrotransposons (41.9% and 12.8%, respectively, whereas only 10.8% were DNA transposons. Further, 950 potential simple sequence repeats (SSRs were discovered. Dinucleotides were the most abundant repeat motifs; AT/TA dimer repeats were the most frequent SSRs, representing 253 (26.6% of all identified SSRs. Microsynteny analysis revealed that more BESs mapped to the whole-genome sequences of poplar than to those of grape or Arabidopsis, and even fewer mapped to the rice genome. This work will facilitate analysis of the Phalaenopsis genome, and will help clarify similarities and differences in genome composition between orchids and other plant species. Conclusion Using BES analysis, we obtained an overview of the Phalaenopsis genome in terms of gene abundance, the presence of repetitive

  6. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.

    2004-01-01

    The nucleotide sequence of two novel plasmids isolated from the extreme thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), growing optimally at 70degreesC, has been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence re...... with highest similarity to DNA repair protein from Campylobacter jejuni (25% aa). Orf34 showed similarity to sigma factors with highest similarity (28% aa) to the sporulation specific Sigma factor, Sigma 28(K) from Bacillus thuringiensis....

  7. Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

    Juan Ning

    Full Text Available BACKGROUND: Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, which is a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for the copepod C. sinicus. RESULTS: Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained, which were assembled into 56,809 high quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts that were putatively involved with growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found in C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs that provide a resource for gene function studies. CONCLUSION: Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, which facilitated the quantitative analysis of differential gene expression. These data should provide foundation for future genetic and genomic studies of this and related species.

  8. Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome evolution between two wheat cultivars

    Thind, Anupriya Kaur

    2018-02-08

    Background: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the evolutionary dynamics of wheat genomes on a megabase-scale. Results: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line CH Campala Lr22a. There was a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations revealed four large insertions/deletions (InDels) of >100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the evolutionary mechanisms that caused these InDels. Three of the large InDels affected copy number of NLRs, a gene family involved in plant immunity. Analysis of single nucleotide polymorphism (SNP) density revealed three haploblocks of 8 Mb, 9 Mb and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Conclusions: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.

  9. High-resolution analysis of the 5'-end transcriptome using a next generation DNA sequencer.

    Shin-ichi Hashimoto

    Full Text Available Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine (5Aza. More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100-1,000 fold greater than that observed from 5'end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  10. Chimira: analysis of small RNA sequencing data and microRNA modifications.

    Vitsios, Dimitrios M; Enright, Anton J

    2015-10-15

    Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, it is capable of identifying epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3'-modifications (e.g. uridylation, adenylation), 5'-modifications and also internal modifications or variation (ADAR editing or single nucleotide polymorphisms). Besides cleaning and mapping of input sequences to miRNAs, Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary Material). These allow the visual study of the differential expression between two specific samples or sets of samples, the identification of the most highly expressed miRNAs within sample pairs (or sets of samples) and also the projection of the modification profile for specific miRNAs across all samples. Other tools have already been published in the past for various types of small RNA-Seq analysis, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, CPSS for modifications identification. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary Material. Chimira outperforms all of these tools in total execution speed and aims to facilitate simple, fast and reliable analysis of small RNA-Seq data allowing also, for the first time, identification of global microRNA modification profiles in a simple intuitive interface. Chimira has been developed as a web application and it is accessible here: http://www.ebi.ac.uk/research/enright/software/chimira. aje@ebi.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  11. Human papilloma viruses and cervical tumours: mapping of integration sites and analysis of adjacent cellular sequences

    Klimov, Eugene; Vinokourova, Svetlana; Moisjak, Elena; Rakhmanaliev, Elian; Kobseva, Vera; Laimins, Laimonis; Kisseljov, Fjodor; Sulimova, Galina

    2002-01-01

    In cervical tumours the integration of human papilloma viruses (HPV) transcripts often results in the generation of transcripts that consist of hybrids of viral and cellular sequences. Mapping data using a variety of techniques has demonstrated that HPV integration occurred without obvious specificity into human genome. However, these techniques could not demonstrate whether integration resulted in the generation of transcripts encoding viral or viral-cellular sequences. The aim of this work was to map the integration sites of HPV DNA and to analyse the adjacent cellular sequences. Amplification of the INTs was done by the APOT technique. The APOT products were sequenced according to standard protocols. The analysis of the sequences was performed using BLASTN program and public databases. To localise the INTs PCR-based screening of GeneBridge4-RH-panel was used. Twelve cellular sequences adjacent to integrated HPV16 (INT markers) expressed in squamous cell cervical carcinomas were isolated. For 11 INT markers homologous human genomic sequences were readily identified and 9 of these showed significant homologies to known genes/ESTs. Using the known locations of homologous cDNAs and the RH-mapping techniques, mapping studies showed that the INTs are distributed among different human chromosomes for each tumour sample and are located in regions with the high levels of expression. Integration of HPV genomes occurs into the different human chromosomes but into regions that contain highly transcribed genes. One interpretation of these studies is that integration of HPV occurs into decondensed regions, which are more accessible for integration of foreign DNA

  12. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

  13. Graph-based sequence annotation using a data integration approach

    Pesch Robert

    2008-06-01

    Full Text Available The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara- Cyc which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation.

  14. Graph-based sequence annotation using a data integration approach.

    Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

    2008-08-25

    The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.

  15. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    Wang, Bing; Zhang, Jun; Chen, Peng; Ji, Zhiwei; Deng, Shuping; Li, Chi

    2013-01-01

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separates ions on a millisecond time-scale. IMMS becomes a powerful tool to analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics.Results: In this paper, a model is presented based on least square support vectors regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of peptide. Four descriptors were extracted from peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid searching strategy, and a 10-fold cross-validation approach was employed for the model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieve demonstrate the effectiveness and efficiency of the prediction model.Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information in relative high prediction accuracy by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. 2013 Wang et al.; licensee BioMed Central Ltd.

  16. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    Wang, Bing

    2013-05-09

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separates ions on a millisecond time-scale. IMMS becomes a powerful tool to analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics.Results: In this paper, a model is presented based on least square support vectors regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of peptide. Four descriptors were extracted from peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid searching strategy, and a 10-fold cross-validation approach was employed for the model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieve demonstrate the effectiveness and efficiency of the prediction model.Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information in relative high prediction accuracy by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. 2013 Wang et al.; licensee BioMed Central Ltd.

  17. Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis

    Zhao Patrick X

    2011-07-01

    Full Text Available Abstract Background Single nucleotide polymorphisms (SNPs are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting analysis (HRM for high throughput SNP discovery in tetraploid alfalfa (Medicago sativa L., a species with high economic value but limited genomic resources. Results The alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally-propagated plants grown under either well-watered or water-stressed conditions, and then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained including 24,144 tentative consensus (TCs sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs were evaluated and validated using high resolution melting (HRM analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1 chromosomes. Gene Ontology assignments indicate that sequences obtained cover a broad range of GO categories. Conclusions We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.

  18. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially ( P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  19. Nucleotide and amino acid sequences of a coat protein of an Ukrainian isolate of Potato virus Y: comparison with homologous sequences of other isolates and phylogenetic analysis

    Budzanivska I. G.

    2014-03-01

    Full Text Available Aim. Identification of the widespread Ukrainian isolate(s of PVY (Potato virus Y in different potato cultivars and subsequent phylogenetic analysis of detected PVY isolates based on NA and AA sequences of coat protein. Methods. ELISA, RT-PCR, DNA sequencing and phylogenetic analysis. Results. PVY has been identified serologically in potato cultivars of Ukrainian selection. In this work we have optimized a method for total RNA extraction from potato samples and offered a sensitive and specific PCR-based test system of own design for diagnostics of the Ukrainian PVY isolates. Part of the CP gene of the Ukrainian PVY isolate has been sequenced and analyzed phylogenetically. It is demonstrated that the Ukrainian isolate of Potato virus Y (CP gene has a higher percentage of homology with the recombinant isolates (strains of this pathogen (approx. 98.8– 99.8 % of homology for both nucleotide and translated amino acid sequences of the CP gene. The Ukrainian isolate of PVY is positioned in the separate cluster together with the isolates found in Syria, Japan and Iran; these isolates possibly have common origin. The Ukrainian PVY isolate is confirmed to be recombinant. Conclusions. This work underlines the need and provides the means for accurate monitoring of Potato virus Y in the agroecosystems of Ukraine. Most importantly, the phylogenetic analysis demonstrated the recombinant nature of this PVY isolate which has been attributed to the strain group O, subclade N:O.

  20. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis

    Thiele Bernhard

    2011-05-01

    Full Text Available Abstract Background Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4 variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Methods Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Results Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%, and defining a minority cutoff of 5%, the results were concordant in all but one isolate. Conclusions The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.

  1. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis.

    Däumer, Martin; Kaiser, Rolf; Klein, Rolf; Lengauer, Thomas; Thiele, Bernhard; Thielen, Alexander

    2011-05-13

    Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%), and defining a minority cutoff of 5%, the results were concordant in all but one isolate. The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.

  2. [Sequence analysis of LEAFY homologous gene from Dendrobium moniliforme and application for identification of medicinal Dendrobium].

    Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu

    2013-04-01

    The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned by new primers which were designed based on the conservative region of known sequences of orchid LEAFY gene. Partial LFY homologous gene was cloned by common PCR, then we got the complete LFY homologous gene Den LFY by Tail-PCR. The complete sequence of DenLFY gene was 3 575 bp which contained three exons and two introns. Using BLAST method, comparison analysis among the exon of LFY homologous gene indicted that the DenLFY gene had high identity with orchids LFY homologous, including the related fragment of PhalLFY (84%) in Phalaenopsis hybrid cultivar, LFY homologous gene in Oncidium (90%) and in other orchid (over 80%). Using MP analysis, Dendrobium is found to be the sister to Oncidium and Phalaenopsis. Homologous analysis demonstrated that the C-terminal amino acids were highly conserved. When the exons and introns were separately considered, exons and the sequence of amino acid were good markers for the function research of DenLFY gene. The second intron can be used in authentication research of Dendrobium based on the length polymorphism between Dendrobium moniliforme and Dendrobium officinale.

  3. A systematic identification of Kolobok superfamily transposons in Trichomonas vaginalis and sequence analysis on related transposases

    Qingshu Meng; Kaifu Chen; Lina Ma; Songnian Hu; Jun Yu

    2011-01-01

    Transposons are sequence elements widely distributed among genomes of all three kingdoms of life, providing genomic changes and playing significant roles in genome evolution. Trichomonas vaginalis is an excellent model system for transposon study since its genome ( ~ 160 Mb) has been sequenced and is composed of ~65% transposons and other repetitive elements. In this study, we primarily report the identification of Kolobok-type transposons (termed tvBac) in T. vaginalis and the results of transposase sequence analysis. We categorized 24 novel subfamilies of the Kolobok element, including one autonomous subfamily and 23 non-autonomous subfamilies. We also identified a novel H2CH motif in tvBac transposases based on multiple sequence alignment. In addition, we supposed that tvBac and Mutator transposons may have evolved independently from a common ancestor according to our phylogenetic analysis. Our results provide basic information for the understanding of the function and evolution of tvBac transposons in particular and other related transposon families in general.

  4. Genetic analysis of 430 Chinese Cynodon dactylon accessions using sequence-related amplified polymorphism markers.

    Huang, Chunqiong; Liu, Guodao; Bai, Changjun; Wang, Wenqiang

    2014-10-21

    Although Cynodon dactylon (C. dactylon) is widely distributed in China, information on its genetic diversity within the germplasm pool is limited. The objective of this study was to reveal the genetic variation and relationships of 430 C. dactylon accessions collected from 22 Chinese provinces using sequence-related amplified polymorphism (SRAP) markers. Fifteen primer pairs were used to amplify specific C. dactylon genomic sequences. A total of 481 SRAP fragments were generated, with fragment sizes ranging from 260-1800 base pairs (bp). Genetic similarity coefficients (GSC) among the 430 accessions averaged 0.72 and ranged from 0.53-0.96. Cluster analysis conducted by two methods, namely the unweighted pair-group method with arithmetic averages (UPGMA) and principle coordinate analysis (PCoA), separated the accessions into eight distinct groups. Our findings verify that Chinese C. dactylon germplasms have rich genetic diversity, which is an excellent basis for C. dactylon breeding for new cultivars.

  5. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of interested genes which has not been used in viral integration site analysis of gene therapy applications. Here, we combined targeted sequence capture array and next generation sequencing to address the whole genome profiling of viral integration sites. Human 293T and K562 cells were transduced with a HIV-1 derived vector. A custom made DNA probe sets targeted pLVTHM vector used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using GS FLX platform. Seven thousand four hundred and eighty four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% and minimum 50 bp criteria in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were totally distant from the CpG islands and from the transcription start sites and preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  6. Discrimination of the Lactobacillus acidophilus group using sequencing, species-specific PCR and SNaPshot mini-sequencing technology based on the recA gene.

    Huang, Chien-Hsun; Chang, Mu-Tzu; Huang, Mu-Chiou; Wang, Li-Tin; Huang, Lina; Lee, Fwu-Ling

    2012-10-01

    To clearly identify specific species and subspecies of the Lactobacillus acidophilus group using phenotypic and genotypic (16S rDNA sequence analysis) techniques alone is difficult. The aim of this study was to use the recA gene for species discrimination in the L. acidophilus group, as well as to develop a species-specific primer and single nucleotide polymorphism primer based on the recA gene sequence for species and subspecies identification. The average sequence similarity for the recA gene among type strains was 80.0%, and most members of the L. acidophilus group could be clearly distinguished. The species-specific primer was designed according to the recA gene sequencing, which was employed for polymerase chain reaction with the template DNA of Lactobacillus strains. A single 231-bp species-specific band was found only in L. delbrueckii. A SNaPshot mini-sequencing assay using recA as a target gene was also developed. The specificity of the mini-sequencing assay was evaluated using 31 strains of L. delbrueckii species and was able to unambiguously discriminate strains belonging to the subspecies L. delbrueckii subsp. bulgaricus. The phylogenetic relationships of most strains in the L. acidophilus group can be resolved using recA gene sequencing, and a novel method to identify the species and subspecies of the L. delbrueckii and L. delbrueckii subsp. bulgaricus was developed by species-specific polymerase chain reaction combined with SNaPshot mini-sequencing. Copyright © 2012 Society of Chemical Industry.

  7. Sequencing and analysis of an Irish human genome.

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  8. Exome Sequence Analysis of 14 Families With High Myopia

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sang...

  9. Database-driven primary analysis of raw sequencing data

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications...

  10. Accelerating next generation sequencing data analysis with system level optimizations.

    Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid

    2017-08-22

    Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, CPU frequency scaling are some of the hardware features in the modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller which is part of common NGS workflows, that consume more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer (iv) the default 'on-demand' mode of CPU frequency is over-clocked by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.

  11. The sequence and analysis of a Chinese pig genome

    Fang Xiaodong

    2012-11-01

    Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP, as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  12. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera ( Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related with chloroplast and ribosomal protein, GO functional classification showed 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidences displayed that U. prolifera had substance and energy foundation for the intense photosynthesis and the rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  13. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome wide distribution of chromatin associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time resolved manner. Despite such time resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. TC experiments are reduced to intuitive parametric values that facilitate biologically

  14. HIV-1 envelope sequence-based diversity measures for identifying recent infections.

    Alexis Kafando

    Full Text Available Identifying recent HIV-1 infections is crucial for monitoring HIV-1 incidence and optimizing public health prevention efforts. To identify recent HIV-1 infections, we evaluated and compared the performance of 4 sequence-based diversity measures including percent diversity, percent complexity, Shannon entropy and number of haplotypes targeting 13 genetic segments within the env gene of HIV-1. A total of 597 diagnostic samples obtained in 2013 and 2015 from recently and chronically HIV-1 infected individuals were selected. From the selected samples, 249 (134 from recent versus 115 from chronic infections env coding regions, including V1-C5 of gp120 and the gp41 ectodomain of HIV-1, were successfully amplified and sequenced by next generation sequencing (NGS using the Illumina MiSeq platform. The ability of the four sequence-based diversity measures to correctly identify recent HIV infections was evaluated using the frequency distribution curves, median and interquartile range and area under the curve (AUC of the receiver operating characteristic (ROC. Comparing the median and interquartile range and evaluating the frequency distribution curves associated with the 4 sequence-based diversity measures, we observed that the percent diversity, number of haplotypes and Shannon entropy demonstrated significant potential to discriminate recent from chronic infections (p<0.0001. Using the AUC of ROC analysis, only the Shannon entropy measure within three HIV-1 env segments could accurately identify recent infections at a satisfactory level. The env segments were gp120 C2_1 (AUC = 0.806, gp120 C2_3 (AUC = 0.805 and gp120 V3 (AUC = 0.812. Our results clearly indicate that the Shannon entropy measure represents a useful tool for predicting HIV-1 infection recency.

  15. HPV-QUEST: A highly customized system for automated HPV sequence analysis capable of processing Next Generation sequencing data set.

    Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M

    2012-01-01

    Next Generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple type HPV infection. Currently a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, Perl scripting language, and MYSQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genuses, 14 species and 150 mucosal and cutaneous types to genotype blasted query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in html, text and excel formats and display e-value, blast score, and local and coverage identities; provide genus, species, type, infection site and risk for the best matched reference HPV sequence; and produce results ready for additional analyses.

  16. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) a n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, while (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.

  17. Spreadsheet-based program for alignment of overlapping DNA sequences.

    Anbazhagan, R; Gabrielson, E

    1999-06-01

    Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh operating system computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed customized workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.

  18. Hybridization and sequencing of nucleic acids using base pair mismatches

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  19. Complete genome sequence of Klebsiella pneumoniae J1, a protein-based microbial flocculant-producing bacterium.

    Pang, Changlong; Li, Ang; Cui, Di; Yang, Jixian; Ma, Fang; Guo, Haijuan

    2016-02-20

    Klebsiella pneumoniae J1 is a Gram-negative strain, which belongs to a protein-based microbial flocculant-producing bacterium. However, little genetic information is known about this species. Here we carried out a whole-genome sequence analysis of this strain and report the complete genome sequence of this organism and its genetic basis for carbohydrate metabolism, capsule biosynthesis and transport system. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  1. Association analysis of whole genome sequencing data accounting for longitudinal and family designs.

    Hu, Yijuan; Hui, Qin; Sun, Yan V

    2014-01-01

    Using the whole genome sequencing data and the simulated longitudinal phenotypes for 849 pedigree-based individuals from Genetic Analysis Workshop 18, we investigated various approaches to detecting the association of rare and common variants with blood pressure traits. We compared three strategies for longitudinal data: (a) using the baseline measurement only, (b) using the average from multiple visits, and (c) using all individual measurements. We also compared the power of using all of the pedigree-based data and the unrelated subset. The analyses were performed without knowledge of the underlying simulating model.

  2. Event Sequence Analysis of the Air Intelligence Agency Information Operations Center Flight Operations

    Larsen, Glen

    1998-01-01

    This report applies Event Sequence Analysis, methodology adapted from aircraft mishap investigation, to an investigation of the performance of the Air Intelligence Agency's Information Operations Center (IOC...

  3. Citrate synthase gene sequence: a new tool for phylogenetic analysis and identification of Ehrlichia.

    Inokuma, H; Brouqui, P; Drancourt, M; Raoult, D

    2001-09-01

    The sequence of the citrate synthase gene (gltA) of 13 ehrlichial species (Ehrlichia chaffeensis, Ehrlichia canis, Ehrlichia muris, an Ehrlichia species recently detected from Ixodes ovatus, Cowdria ruminantium, Ehrlichia phagocytophila, Ehrlichia equi, the human granulocytic ehrlichiosis [HGE] agent, Anaplasma marginale, Anaplasma centrale, Ehrlichia sennetsu, Ehrlichia risticii, and Neorickettsia helminthoeca) have been determined by degenerate PCR and the Genome Walker method. The ehrlichial gltA genes are 1,197 bp (E. sennetsu and E. risticii) to 1,254 bp (A. marginale and A. centrale) long, and GC contents of the gene vary from 30.5% (Ehrlichia sp. detected from I. ovatus) to 51.0% (A. centrale). The percent identities of the gltA nucleotide sequences among ehrlichial species were 49.7% (E. risticii versus A. centrale) to 99.8% (HGE agent versus E. equi). The percent identities of deduced amino acid sequences were 44.4% (E. sennetsu versus E. muris) to 99.5% (HGE agent versus E. equi), whereas the homology range of 16S rRNA genes was 83.5% (E. risticii versus the Ehrlichia sp. detected from I. ovatus) to 99.9% (HGE agent, E. equi, and E. phagocytophila). The architecture of the phylogenetic trees constructed by gltA nucleotide sequences or amino acid sequences was similar to that derived from the 16S rRNA gene sequences but showed more-significant bootstrap values. Based upon the alignment analysis of the ehrlichial gltA sequences, two sets of primers were designed to amplify tick-borne Ehrlichia and Neorickettsia genogroup Ehrlichia (N. helminthoeca, E. sennetsu, and E. risticii), respectively. Tick-borne Ehrlichia species were specifically identified by restriction fragment length polymorphism (RFLP) patterns of AcsI and XhoI with the exception of E. muris and the very closely related ehrlichia derived from I. ovatus for which sequence analysis of the PCR product is needed. Similarly, Neorickettsia genogroup Ehrlichia species were specifically identified by

  4. Highly accurate fluorogenic DNA sequencing with information theory-based error correction.

    Chen, Zitian; Zhou, Wenxiong; Qiao, Shuo; Kang, Li;