WorldWideScience

Sample records for sequence comparison analysis

  1. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    Phylogenetic analysis suggests that our sequences cluster with sequences reported from Japan. This is the first phylogenetic analysis of the HCV core gene from the Pakistani population. Our sequences and sequences from Japan are grouped into the same cluster in the phylogenetic tree. Sequence comparison and ...

  2. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  3. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    2010-07-19

    Jul 19, 2010 ... and antisense primers, a single band of 573 base pairs .... Amino acid sequence alignment of Cluster I and Cluster II of phylogenetic tree. First ten sequences ... sequence weighting, position-specific gap penalties and weight.

  4. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    Directory of Open Access Journals (Sweden)

    Andrea Vásquez

    2014-01-01

    We conducted an SSR density analysis in different cassava genomic regions. The information obtained was useful to establish comparisons between cassava’s SSR genomic distribution and those of poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and a high proportion of pentanucleotides (24.2 SSRs/Mbp). It was found that coding sequences have 15.5 SSRs/Mbp, introns have 82.3 SSRs/Mbp, 5′ UTRs have 196.1 SSRs/Mbp, and 3′ UTRs have 50.5 SSRs/Mbp. Motif analysis of cassava’s genome SSRs showed that the most abundant motif was AT/AT, while in intron sequences and UTR regions it was AG/CT. In addition, in coding sequences the motif AAG/CTT was found to occur most frequently; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation in Gene Ontology categories. The SSRs identified here may be a valuable addition for genetic mapping and future studies in phylogenetic analyses and genomic evolution.
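
The SSR densities quoted in this abstract are counts of simple sequence repeats normalised per megabase. A minimal Python sketch of that kind of calculation follows. The function names, the unit-length range and the minimum repeat count are illustrative assumptions, not the parameters or software used by the authors, and only perfect (uninterrupted) repeats are detected.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    Thresholds are assumptions chosen for illustration only.
    """
    seq = seq.upper()
    # A unit of min_unit..max_unit bases (shortest tried first) followed by
    # at least (min_repeats - 1) further copies of the same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip units that are themselves repeats of a shorter unit
        # (for example "AA"), so homopolymer runs are not miscounted.
        if unit in (unit + unit)[1:-1]:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


def ssr_density_per_mbp(seq, **kwargs):
    """Number of SSRs per million base pairs of the given sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return len(find_ssrs(seq, **kwargs)) / len(seq) * 1_000_000


example = "CCG" + "AT" * 5 + "GGC" + "AAG" * 4 + "TT"
ssrs = find_ssrs(example)
```

Running the same function separately on coding sequences, introns and UTRs would give per-region densities comparable in form to those reported above.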

  5. Microbial analysis of bite marks by sequence comparison of streptococcal DNA.

    Directory of Open Access Journals (Sweden)

    Darnell M Kennedy

    Bite mark injuries often feature in violent crimes. Conventional morphometric methods for the forensic analysis of bite marks involve elements of subjective interpretation that threaten the credibility of this field. Human DNA recovered from bite marks has the highest evidentiary value; however, recovery can be compromised by salivary components. This study assessed the feasibility of matching bacterial DNA sequences amplified from experimental bite marks to those obtained from the teeth responsible, with the aim of evaluating the capability of three genomic regions of streptococcal DNA to discriminate between participant samples. Bite mark and teeth swabs were collected from 16 participants. Bacterial DNA was extracted to provide the template for PCR primers specific for the streptococcal 16S ribosomal RNA (16S rRNA) gene, 16S-23S intergenic spacer (ITS) and RNA polymerase beta subunit (rpoB). High throughput sequencing (GS FLX 454), followed by stringent quality filtering, generated reads from bite marks for comparison to those generated from teeth samples. For all three regions, the greatest overlaps of identical reads were between bite mark samples and the corresponding teeth samples. The average proportions of reads identical between bite mark and corresponding teeth samples were 0.31, 0.41 and 0.31, and for non-corresponding samples were 0.11, 0.20 and 0.016, for 16S rRNA, ITS and rpoB, respectively. The probabilities of correctly distinguishing matching and non-matching teeth samples were 0.92 for ITS, 0.99 for 16S rRNA and 1.0 for rpoB. These findings strongly support the tenet that bacterial DNA amplified from bite marks and teeth can provide corroborating information in the identification of assailants.
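
The comparison described in this abstract rests on the proportion of reads that are identical between two samples. The Python sketch below shows one plausible way to compute such a proportion and to pick the best-matching teeth sample. The exact definition (share of bite-mark reads whose sequence also occurs in the teeth sample), the function names and the toy reads are all assumptions made for illustration; the study's own pipeline is not given in the abstract.

```python
def shared_read_proportion(query_reads, reference_reads):
    """Fraction of query reads whose exact sequence occurs in the reference."""
    if not query_reads:
        return 0.0
    reference = set(reference_reads)
    shared = sum(1 for read in query_reads if read in reference)
    return shared / len(query_reads)


def best_match(bite_reads, teeth_by_participant):
    """Score every participant and return (best_id, all_scores)."""
    scores = {
        participant: shared_read_proportion(bite_reads, reads)
        for participant, reads in teeth_by_participant.items()
    }
    return max(scores, key=scores.get), scores


# Toy data, invented purely for the example.
bite = ["ACGT", "ACGT", "TTGA", "GGCC"]
teeth = {
    "P1": ["ACGT", "TTGA", "AAAA"],
    "P2": ["GGCC", "CCCC"],
}
winner, scores = best_match(bite, teeth)
```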

  6. Ribosomal DNA sequence analysis of different geographically distributed Aloe Vera plants: Comparison with clonally regenerated plants

    International Nuclear Information System (INIS)

    Yagi, A.; Sato, Y.; Miwa, Y.; Kabbash, A.; Moustafa, S.; Shimomura, K.; El-Bassuony, A.

    2006-01-01

    A comparison of the sequences in the internal transcribed spacer (ITS) 1 region of rDNA between clonally regenerated A. vera and the same species in Japan, USA and Egypt revealed the presence of two types of nucleotide sequences, 252 and 254 bp. Based on the findings in the ITS 1 region, A. vera having 252 and 254 bp clearly showed a stable sequence similarity, suggesting high conservation of the base peak sequence in the ITS 1 region. However, frequent base substitutions were observed around nucleotide positions 66, 99 and 199-201 in the leaves of the 252 bp samples that came from callus tissue and micropropagated plants. The minor deviation in clonally regenerated A. vera may be due to the stage of regeneration and cell specification in the case of the callus tissue. In the present study, the base peak sequence of the ITS 1 region of rDNA was adopted as a molecular marker for differentiating A. vera plants from geographically distributed and clonally regenerated A. vera plants, and it was suggested that the base peak substitutions in the ITS 1 region may arise from the different nutritional and environmental factors in cultivation and plant growth stages. (author)

  7. Analysis and comparison of fragrant gene sequence in some rice cultivars

    Directory of Open Access Journals (Sweden)

    Karami Noushafarin

    2016-01-01

    It is known that the fragrant trait in rice (Oryza sativa L.) is largely controlled by the fgr gene on chromosome 8, and it has been shown that an 8 bp deletion and three single nucleotide polymorphisms (SNPs) in exon 7 affect this trait. In this study, sequence alignment analysis of fgr exon 7 on chromosome 8 for 11 different fragrant and non-fragrant cultivars revealed that 5 aromatic rice cultivars carried 3 SNPs and an 8 bp deletion in exon 7, which terminates prematurely at a TAA stop codon. However, 5 of the non-aromatics showed a sequence identical to that published for Nipponbare, a non-fragrant Japonica variety. An exception among them was Bejar, which had the 8 bp deletion and 3 SNPs but was non-aromatic. Sequencing can determine the nucleotide alignment of a gene and give beneficial information about gene function. In silico prediction showed that the aligned protein sequences of the fgr gene differed between the Khazar and Domsiah genotypes. The complete betaine aldehyde dehydrogenase enzyme belongs to the Khazar non-fragrant genotype and has the full length of 503 amino acids, while the non-functional BADH2 enzyme of the Domsiah fragrant genotype has 251 amino acids, which results in the accumulation of 2-acetyl-1-pyrroline (2AP) and produces aroma in fragrant genotypes.

  8. Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-01-01

    A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713). Genome-to-Genome Distance (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The generated unique Quick Response (QR) codes indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed the number of bases concentrated in the area. Guanine residues were highest in number followed by cytosine. Frequency of Chaos Game Representation (FCGR) indicated that CC and GG blocks have higher frequency in the sequence from the evaluated marine bacterium strains. Maximum GC content for the marine bacterium strains ranged from 53 to 54%. The use of QR codes, CGR, FCGR, and GC dataset helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using the EMBL-EBI MUSCLE program. Thus, the generated genomic data are of great assistance for hierarchical classification in Bacterial Systematics, which combined with phenotypic features represents a basic procedure for a polyphasic approach to unambiguous taxonomic classification of bacterial isolates.
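
Three of the quantities named in this abstract (GC content, Chaos Game Representation, and block frequencies such as CC and GG) are simple to compute. The Python sketch below is illustrative only: the corner layout for the CGR (A bottom-left, C top-left, G top-right, T bottom-right) is a commonly used convention assumed here, and the function names are hypothetical rather than taken from the tools the authors used.

```python
from collections import Counter

# Assumed corner layout for the unit square; other layouts exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def gc_content(seq):
    """Percentage of G and C bases in the sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cgr_points(seq):
    """Chaos Game Representation: start at the centre of the square and
    move halfway towards the corner of each successive base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def dinucleotide_counts(seq):
    """Counts of overlapping 2-mers, the quantity an FCGR at k = 2 bins."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))
```

Plotting the points from `cgr_points` for a whole genome gives the characteristic CGR image, and normalising `dinucleotide_counts` by the total gives block frequencies.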

  9. Definition and Analysis of a System for the Automated Comparison of Curriculum Sequencing Algorithms in Adaptive Distance Learning

    Science.gov (United States)

    Limongelli, Carla; Sciarrone, Filippo; Temperini, Marco; Vaste, Giulia

    2011-01-01

    LS-Lab provides automatic support to comparison/evaluation of the Learning Object Sequences produced by different Curriculum Sequencing Algorithms. Through this framework a teacher can verify the correspondence between the behaviour of different sequencing algorithms and her pedagogical preferences. In fact the teacher can compare algorithms…

  10. Analysis of high-depth sequence data for studying viral diversity: a comparison of next generation sequencing platforms using Segminator II

    Directory of Open Access Journals (Sweden)

    Archer John

    2012-03-01

    Background. Next generation sequencing provides detailed insight into the variation present within viral populations, introducing the possibility of treatment strategies that are both reactive and predictive. Current software tools, however, need to be scaled up to accommodate high-depth viral data sets, which are often temporally or spatially linked. In addition, due to the development of novel sequencing platforms and chemistries, each with implicit strengths and weaknesses, it will be helpful for researchers to be able to routinely compare and combine data sets from different platforms/chemistries. In particular, error associated with a specific sequencing process must be quantified so that true biological variation may be identified. Results. Segminator II was developed to allow for the efficient comparison of data sets derived from different sources. We demonstrate its usage by comparing large data sets from 12 influenza H1N1 samples sequenced on both the 454 Life Sciences and Illumina platforms, permitting quantification of platform error. For mismatches, median error rates of 0.10 and 0.12%, respectively, suggested that both platforms performed similarly. For insertions and deletions, median error rates within the 454 data (at 0.3 and 0.2%, respectively) were significantly higher than those within the Illumina data (0.004 and 0.006%, respectively). In agreement with previous observations, these higher rates were strongly associated with homopolymeric stretches on the 454 platform. Outside of such regions both platforms had similar indel error profiles. Additionally, we apply our software to the identification of low frequency variants. Conclusion. We have demonstrated, using Segminator II, that it is possible to distinguish platform specific error from biological variation using data derived from two different platforms. We have used this approach to quantify the amount of error present within the 454 and Illumina platforms in

  11. Comparison of Boolean analysis and standard phylogenetic methods using artificially evolved and natural mt-tRNA sequences from great apes.

    Science.gov (United States)

    Ari, Eszter; Ittzés, Péter; Podani, János; Thi, Quynh Chi Le; Jakó, Eena

    2012-04-01

    Boolean analysis (or BOOL-AN; Jakó et al., 2009. BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction. Mol. Phylogenet. Evol. 52, 887-97.), a recently developed method for sequence comparison uses the Iterative Canonical Form of Boolean functions. It considers sequence information in a way entirely different from standard phylogenetic methods (i.e. Maximum Parsimony, Maximum-Likelihood, Neighbor-Joining, and Bayesian analysis). The performance and reliability of Boolean analysis were tested and compared with the standard phylogenetic methods, using artificially evolved - simulated - nucleotide sequences and the 22 mitochondrial tRNA genes of the great apes. At the outset, we assumed that the phylogeny of Hominidae is generally well established, and the guide tree of artificial sequence evolution can also be used as a benchmark. These offer a possibility to compare and test the performance of different phylogenetic methods. Trees were reconstructed by each method from 2500 simulated sequences and 22 mitochondrial tRNA sequences. We also introduced a special re-sampling method for Boolean analysis on permuted sequence sites, the P-BOOL-AN procedure. Considering the reliability values (branch support values of consensus trees and Robinson-Foulds distances) we used for simulated sequence trees produced by different phylogenetic methods, BOOL-AN appeared as the most reliable method. Although the mitochondrial tRNA sequences of great apes are relatively short (59-75 bases long) and the ratio of their constant characters is about 75%, BOOL-AN, P-BOOL-AN and the Bayesian approach produced the same tree-topology as the established phylogeny, while the outcomes of Maximum Parsimony, Maximum-Likelihood and Neighbor-Joining methods were equivocal. We conclude that Boolean analysis is a promising alternative to existing methods of sequence comparison for phylogenetic reconstruction and congruence analysis. Copyright © 2012 Elsevier Inc. All
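
The Robinson-Foulds distance used in this abstract as a reliability measure counts the splits (bipartitions of the leaf set) that occur in one tree but not in the other. A minimal Python sketch follows, assuming trees are encoded as nested tuples of leaf names. That encoding and the taxon labels A to E are illustrative assumptions, not the data format or taxa of the study.

```python
def leaf_set(tree):
    """All leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions induced by the internal edges of the tree."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - leaves
        # Ignore trivial splits that separate a single leaf or nothing.
        if len(leaves) > 1 and len(other) > 1:
            found.add(frozenset([leaves, other]))
        return leaves

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaves")
    return len(splits(tree_a) ^ splits(tree_b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
distance = robinson_foulds(t1, t2)
```

Because each split is stored as an unordered pair of leaf sets, the two children of the root contribute the same split only once, which is the behaviour wanted for unrooted comparison.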

  12. In silico Coding Sequence Analysis of Walnut GAI and PIP2 Genes and Comparison with Different Plant Species

    Directory of Open Access Journals (Sweden)

    Mahdi Mohseniazar

    2017-02-01

    done with MEGA from aligned sequences. Motifs in the protein sequences were found using the T-Coffee program (http://www.ebi.ac.uk/Tools/msa/tcoffee/). The Neighbor-Joining (NJ) method was used to construct the phylogenetic tree. Exons and introns in the mRNA sequences were predicted with the GENSCAN website (http://genes.mit.edu/GENSCAN.html). The secondary structure of the proteins was predicted online with PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/). The 3D model of each protein was predicted by 3D alignment of protein structure using BLASTp with the PDB database as source. Protein targeting was predicted online with TargetP (http://www.cbs.dtu.dk/services/TargetP/). Results and discussion: In the phylogenetic investigation of 17 different species, walnut was placed among dicotyledonous and woody plants by clustering of both GAI and PIP2 gene and protein sequences. Multiple alignment and investigation of conserved sequences of these genes in plants revealed that, despite differences in cDNA length, there were strong similarities in conserved regions and in secondary and tertiary structure. Protein analysis of the GAI gene family showed that the domains DELLA, TVHYNP, VHIID, RKVATYFGEALARR, AVNSVFELH, RVER, and SAW were conserved in these proteins. In the secondary structure, β-sheets and α-helices were identified by PSIPRED for both GAI and PIP2 proteins. The GAI protein had 9 β-sheets and 15 α-helices, and the PIP2 protein had 2 β-sheets (at 180-188 and 248-253) and 8 α-helices. In the comparison of 3D structure, walnut PIP2 protein was very similar to chain A of the PIP2 protein of spinach (Spinacia oleracea), and the GAI protein of walnut was similar to the B-subunit of Arabidopsis GAI protein, with 48% similarity. Among species, the length of the GAI protein varied from 636 aa in Malus baccata var. xiaojinensis to 336 aa in Physcomitrella patens. In walnut, the length of GAI and PIP2 protein was 613 aa and

  14. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  15. Comparison of base composition analysis and Sanger sequencing of mitochondrial DNA for four U.S. population groups.

    Science.gov (United States)

    Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M

    2014-01-01

    A set of 711 samples from four U.S. population groups was analyzed using a novel mass spectrometry based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing derived data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays. Published by Elsevier Ireland Ltd.
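
Discrimination capacity, the metric used in this abstract to rank the three assays, is commonly defined as the number of distinct profiles divided by the number of samples typed. The short Python sketch below uses that common definition as an assumption (the abstract does not spell out its formula), with invented profile labels.

```python
def discrimination_capacity(profiles):
    """Distinct profiles divided by total samples (assumed definition)."""
    if not profiles:
        raise ValueError("at least one profile is required")
    return len(set(profiles)) / len(profiles)


# Invented example: the same four samples typed with two assays.
# A shorter region cannot separate samples 1 and 3; a longer one can.
short_region = ["H1", "H2", "H1", "H3"]
long_region = ["H1a", "H2", "H1b", "H3"]
dc_short = discrimination_capacity(short_region)
dc_long = discrimination_capacity(long_region)
```

The toy data mirror the trend reported above: an assay covering more nucleotides tends to resolve more samples into distinct profiles.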

  16. Nucleotide and amino acid sequences of a coat protein of an Ukrainian isolate of Potato virus Y: comparison with homologous sequences of other isolates and phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Budzanivska I. G.

    2014-03-01

    Aim. Identification of the widespread Ukrainian isolate(s) of PVY (Potato virus Y) in different potato cultivars and subsequent phylogenetic analysis of detected PVY isolates based on NA and AA sequences of coat protein. Methods. ELISA, RT-PCR, DNA sequencing and phylogenetic analysis. Results. PVY has been identified serologically in potato cultivars of Ukrainian selection. In this work we have optimized a method for total RNA extraction from potato samples and offered a sensitive and specific PCR-based test system of our own design for diagnostics of the Ukrainian PVY isolates. Part of the CP gene of the Ukrainian PVY isolate has been sequenced and analyzed phylogenetically. It is demonstrated that the Ukrainian isolate of Potato virus Y (CP gene) has a higher percentage of homology with the recombinant isolates (strains) of this pathogen (approx. 98.8–99.8% homology for both nucleotide and translated amino acid sequences of the CP gene). The Ukrainian isolate of PVY is positioned in a separate cluster together with the isolates found in Syria, Japan and Iran; these isolates possibly have a common origin. The Ukrainian PVY isolate is confirmed to be recombinant. Conclusions. This work underlines the need and provides the means for accurate monitoring of Potato virus Y in the agroecosystems of Ukraine. Most importantly, the phylogenetic analysis demonstrated the recombinant nature of this PVY isolate, which has been attributed to the strain group O, subclade N:O.

  17. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  18. De-standardization of family-life trajectories of young adults: a cross-national comparison using sequence analysis

    NARCIS (Netherlands)

    Elzinga, C.; Liefbroer, A.C.

    2007-01-01

    We introduce a number of new methods based on sequence analysis to test hypotheses on the de-standardization of family-life trajectories in early adulthood, using Fertility and Family Survey data on 19 countries. Across cohorts, family-life trajectories of young adults have not become more

  19. Method and apparatus for biological sequence comparison

    Science.gov (United States)

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  20. Sequence Comparison: Close and Open problems

    NARCIS (Netherlands)

    Lenzini, Gabriele; Cerrai, P.; Freguglia, P.

    Comparing sequences is a very important activity both in computer science and in many other areas as well. For example, thanks to text editors, everyone knows the particular instance of a sequence comparison problem known as the ``string matching problem''. It consists in searching a given word
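
The string matching problem mentioned in this abstract asks for every position at which a word occurs in a text. One standard solution is the Knuth-Morris-Pratt algorithm, sketched below in Python. The choice of algorithm and the function name are illustrative assumptions; the abstract does not say which methods the work itself covers.

```python
def find_all(pattern, text):
    """Return the start index of every occurrence of pattern in text
    using the Knuth-Morris-Pratt algorithm (overlaps included)."""
    if not pattern:
        return []
    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


positions = find_all("ana", "bananas")
```

The failure table lets the scan avoid re-reading text characters, so the search runs in time proportional to the combined length of the pattern and the text.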

  1. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  3. Phylogeny and Taxonomy of Archaea: A Comparison of the Whole-Genome-Based CVTree Approach with 16S rRNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Guanghong Zuo

    2015-03-01

    A tripartite comparison of Archaea phylogeny and taxonomy at and above the rank order is reported: (1) the whole-genome-based and alignment-free CVTree using 179 genomes; (2) the 16S rRNA analysis exemplified by the All-Species Living Tree with 366 archaeal sequences; and (3) the Second Edition of Bergey’s Manual of Systematic Bacteriology complemented by some current literature. A high degree of agreement is reached at these ranks. From the newly proposed archaeal phyla, Korarchaeota, Thaumarchaeota, Nanoarchaeota and Aigarchaeota, to the recent suggestion to divide the class Halobacteria into three orders, all gain substantial support from CVTree. In addition, the CVTree helped to determine the taxonomic position of some newly sequenced genomes without proper lineage information. A few discrepancies between the CVTree and the 16S rRNA approaches call for further investigation.

  4. Segmentation of Gait Sequences in Sensor-Based Movement Analysis: A Comparison of Methods in Parkinson’s Disease

    Directory of Open Access Journals (Sweden)

    Nooshin Haji Ghassemi

    2018-01-01

    Robust gait segmentation is the basis for mobile gait analysis. A range of methods have been applied and evaluated for gait segmentation of healthy and pathological gait bouts. However, a unified evaluation of gait segmentation methods in Parkinson’s disease (PD) is missing. In this paper, we compare four prevalent gait segmentation methods in order to reveal their strengths and drawbacks in gait processing. We considered peak detection from event-based methods, two variations of dynamic time warping from template matching methods, and hierarchical hidden Markov models (hHMMs) from machine learning methods. To evaluate the methods, we included two supervised and instrumented gait tests that are widely used in the examination of Parkinsonian gait. In the first experiment, a sequence of strides from instructed straight walks was measured from 10 PD patients. In the second experiment, a more heterogeneous assessment paradigm was used from an additional 34 PD patients, including straight walks and turning strides as well as non-stride movements. The goal of the latter experiment was to evaluate the methods in challenging situations including turning strides and non-stride movements. Results showed no significant difference between the methods for the first scenario, in which all methods achieved an almost 100% accuracy in terms of F-score. Hence, we concluded that in the case of a predefined and homogeneous sequence of strides, all methods can be applied equally. However, in the second experiment the difference between methods became evident, with the hHMM obtaining a 96% F-score and significantly outperforming the other methods. The hHMM also proved promising in distinguishing between strides and non-stride movements, which is critical for clinical gait analysis. Our results indicate that both the instrumented test procedure and the required stride segmentation algorithm have to be selected adequately in order to support and complement classical

  5. Dynamic programming algorithms for biological sequence comparison.

    Science.gov (United States)

    Pearson, W R; Miller, W

    1992-01-01

    Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N^2)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N^2) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q+rk) are readily available. Versions of these programs that run either on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
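    The time and memory bounds quoted in this abstract can be illustrated with a minimal sketch. It is not the authors' program: the function name `similarity_score` and the default match, mismatch and gap values are assumptions chosen for illustration. It computes a global similarity score with the linear gap penalty g = rk while keeping only two rows of the dynamic programming table, so memory grows with one sequence length and time with the product of both.

```python
def similarity_score(a: str, b: str, match: int = 1, mismatch: int = -1, r: int = 2) -> int:
    """Global similarity score with gap penalty g = r*k (hypothetical defaults).

    Time is O(len(a) * len(b)); memory is O(len(b)) because only the
    previous and current rows of the table are kept.
    """
    # Row 0: aligning an empty prefix of `a` with b[:j] costs j gap positions.
    prev = [-r * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with an empty prefix of `b` costs i gap positions.
        curr = [-r * i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] - r,        # gap in b
                curr[j - 1] - r,    # gap in a
            )
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(similarity_score("ACGT", "AGT"))
```

    This sketch returns only the score. Recovering the alignment itself in linear space needs a divide-and-conquer step that is omitted here.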

  6. Comparison of PCR-RFLP pattern with sequencing analysis of the ITS region of Hyrcanian's Tilia

    Directory of Open Access Journals (Sweden)

    Hamed Yousefzadeh

    2014-01-01

    T. hyrcana and T. rubra from Hyrcanian's origin, but it could not separate T. begonifolia from the other Hyrcanian species. In this respect, the results were similar to those from sequencing. In conclusion, because the PCR-RFLP technique is less expensive and less time-consuming, and its results closely match those of sequencing, we recommend it as a simple, economical and relatively efficient method for studying plant phylogeny.

  7. Genome Analysis of Listeria monocytogenes Sequence Type 8 Strains Persisting in Salmon and Poultry Processing Environments and Comparison with Related Strains

    Science.gov (United States)

    Fagerlund, Annette; Langsrud, Solveig; Schirmer, Bjørn C. T.; Møretrø, Trond; Heir, Even

    2016-01-01

    Listeria monocytogenes is an important foodborne pathogen responsible for the disease listeriosis, and can be found throughout the environment, in many foods and in food processing facilities. The main cause of listeriosis is consumption of food contaminated from sources in food processing environments. Persistence in food processing facilities has previously been shown for the L. monocytogenes sequence type (ST) 8 subtype. In the current study, five ST8 strains were subjected to whole-genome sequencing and compared with five additionally available ST8 genomes, allowing comparison of strains from salmon, poultry and cheese industry, in addition to a human clinical isolate. Genome-wide analysis of single-nucleotide polymorphisms (SNPs) confirmed that almost identical strains were detected in a Danish salmon processing plant in 1996 and in a Norwegian salmon processing plant in 2001 and 2011. Furthermore, we show that L. monocytogenes ST8 was likely to have been transferred between two poultry processing plants as a result of relocation of processing equipment. The SNP data were used to infer the phylogeny of the ST8 strains, separating them into two main genetic groups. Within each group, the plasmid and prophage content was almost entirely conserved, but between groups, these sequences showed strong divergence. The accessory genome of the ST8 strains harbored genetic elements which could be involved in rendering the ST8 strains resilient to incoming mobile genetic elements. These included two restriction-modification loci, one of which was predicted to show phase variable recognition sequence specificity through site-specific domain shuffling. Analysis indicated that the ST8 strains harbor all important known L. monocytogenes virulence factors, and ST8 strains are commonly identified as the causative agents of invasive listeriosis. Therefore, the persistence of this L. monocytogenes subtype in food processing facilities poses a significant concern for food safety.

  8. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.

  9. Comparison of 16S ribosomal RNA gene sequence analysis and conventional culture in the environmental survey of a hospital

    OpenAIRE

    Manaka, Akihiro; Tokue, Yutaka; Murakami, Masami

    2017-01-01

    Background: Nosocomial infection is one of the most common complications within health care facilities. Certain studies have reported outbreaks resulting from contaminated hospital environments. Although the identification of bacteria in the environment can readily be achieved using culturing methods, these methods detect live bacteria. Sequencing of the 16S ribosomal RNA (16S rRNA) gene is recognized to be effective for bacterial identification. In this study, we surveyed wards where drug-res...

  10. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This could serve as a good starting point for experimental design and makes comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  11. Protein sequence comparison and protein evolution

    Energy Technology Data Exchange (ETDEWEB)

    Pearson, W.R. [Univ. of Virginia, Charlottesville, VA (United States). Dept. of Biochemistry]

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. This tutorial examines how the information conserved during the evolution of a protein molecule can be used to infer reliably homology, and thus a shared protein fold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial will examine the technical aspects of protein sequence comparison. Both optimal and heuristic algorithms and their associated parameters that are used to characterize protein sequence similarities are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can both be used to improve the selectivity of a search and to evaluate the significance of a match. They then examine distantly related members of three protein families, the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, they discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.

  12. Direct, rapid RNA sequence analysis

    International Nuclear Information System (INIS)

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule followed by identification of the mononucleotides comprising the oligomer. Over the past decade the field of nucleic acid sequencing has changed dramatically, however, and RNA molecules now can be sequenced in a variety of more streamlined fashions. Most of the more recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of 32P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for determining the nucleotide sequences of RNA molecules rapidly: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on fragmatic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T1, T2, and S1) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can be done easily at the same time as sequencing reactions

  13. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    Science.gov (United States)

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  14. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how can we distinguish coding and noncoding sequences, and the problems of classification and evolution relationship of organisms are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  15. Identification of Clinical Coryneform Bacterial Isolates: Comparison of Biochemical Methods and Sequence Analysis of 16S rRNA and rpoB Genes

    Science.gov (United States)

    Adderson, Elisabeth E.; Boudreaux, Jan W.; Cummings, Jessica R.; Pounds, Stanley; Wilson, Deborah A.; Procop, Gary W.; Hayden, Randall T.

    2008-01-01

    We compared the relative levels of effectiveness of three commercial identification kits and three nucleic acid amplification tests for the identification of coryneform bacteria by testing 50 diverse isolates, including 12 well-characterized control strains and 38 organisms obtained from pediatric oncology patients at our institution. Between 33.3 and 75.0% of control strains were correctly identified to the species level by phenotypic systems or nucleic acid amplification assays. The most sensitive tests were the API Coryne system and amplification and sequencing of the 16S rRNA gene using primers optimized for coryneform bacteria, which correctly identified 9 of 12 control isolates to the species level, and all strains with a high-confidence call were correctly identified. Organisms not correctly identified were species not included in the test kit databases or not producing a pattern of reactions included in kit databases or which could not be differentiated among several genospecies based on reaction patterns. Nucleic acid amplification assays had limited abilities to identify some bacteria to the species level, and comparison of sequence homologies was complicated by the inclusion of allele sequences obtained from uncultivated and uncharacterized strains in databases. The utility of rpoB genotyping was limited by the small number of representative gene sequences that are currently available for comparison. The correlation between identifications produced by different classification systems was poor, particularly for clinical isolates. PMID:18160450

  16. Comparison between two magnetic resonance sequences (spin-echo and gradient-echo) in the analysis of lesions of the knee joint meniscus

    International Nuclear Information System (INIS)

    Marti-Bonati, L.; Casillas, C.

    1999-01-01

    To compare the diagnostic reliability, the proportion of common diagnoses and the degree of agreement between the results of two magnetic resonance (MR) sequences in the diagnosis of lesions of the meniscus of the knee. One hundred consecutive patients were studied prospectively by MR (1.5 T). All of them underwent T1-weighted spin-echo and T1 and T2-weighted gradient-echo sequences. The final diagnosis was based on the combined results of four imaging sequences. The sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) in terms of the final diagnosis were calculated for each meniscus and MR technique. The chi-squared test and kappa test were employed for the statistical analysis. There were discrepancies between the final diagnosis and the spin-echo sequence in 4 cases and between the final diagnosis and the gradient-echo sequences in 5. Both spin-echo and gradient-echo sequences showed the same diagnostic reliabilities: sensitivity of 0.98, specificity of 0.99, PPV of 0.98 and NPV of 0.99. The correlation between the two sequences was highly significant (chi-squared, p < 0.001) with a very high rate of agreement (kappa=0.84). The two sequences can be considered equally reliable in the study of meniscal lesions. (Author) 7 refs
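    The reliability and agreement measures named in this abstract follow from simple 2x2 counts. The sketch below shows the arithmetic only; the function names and the counts used in the example are hypothetical and are not data from the study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2x2 table against a reference diagnosis."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all healthy
        "ppv": tp / (tp + fp),          # true positives among all positive calls
        "npv": tn / (tn + fn),          # true negatives among all negative calls
    }


def cohen_kappa(both_pos: int, only_a: int, only_b: int, both_neg: int) -> float:
    """Cohen's kappa for agreement between two binary readings A and B."""
    n = both_pos + only_a + only_b + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + only_a) / n
    b_pos = (both_pos + only_b) / n
    # Agreement expected by chance if A and B were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(diagnostic_metrics(tp=90, fp=10, fn=10, tn=90))
    print(cohen_kappa(both_pos=40, only_a=10, only_b=10, both_neg=40))
```

    Kappa corrects the raw agreement rate for agreement expected by chance, which is why it can be noticeably lower than the proportion of matching diagnoses.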

  17. Comparison of DNA Quantification Methods for Next Generation Sequencing.

    Science.gov (United States)

    Robin, Jérôme D; Ludlow, Andrew T; LaRanger, Ryan; Wright, Woodring E; Shay, Jerry W

    2016-04-06

    Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C; 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChiA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. DdPCR-Tail is comparable to qPCR and fluorometry (QuBit) and allows sensitive quantification by analysis of barcode repartition after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality.

  18. A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy.

    Science.gov (United States)

    Wickland, Daniel P; Battu, Gopal; Hudson, Karen A; Diers, Brian W; Hudson, Matthew E

    2017-12-28

    Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools. We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed. We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain

  19. Comparison of 61 Sequenced Escherichia coli Genomes

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana; Wassenaar, T. M.; Ussery, David

    2010-01-01

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publicly available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics...

  20. Molecular Cloning and Sequence Analysis of the Sta58 Major Antigen Gene of Rickettsia tsutsugamushi: Sequence homology and Antigenic Comparison of Sta58 to the 60-Kilodalton Family of Stress Proteins

    Science.gov (United States)

    1990-05-01

    encoding the animals have shown that both cellular and humoral immune Sta58 protein antigen in E. coli. DNA sequence analysis of a responses occur after...infection, with the cellular immune 2.9-kilobase (kb) HindIII fragment carrying the Sta58 gene response being required for protection (16, 19, 25, 42...The first evidence of a 60-kDa common HtpB antigen) reacted strongly with protein antigens in the antigen family (Hsp60) among procaryotes was based

  1. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  2. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

    2018-03-01

    Cancer is a deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in a laboratory test. This study focused on local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. It seeks the most similar region of a pair of sequences by giving a negative score to unequal base pairs (mismatch) and a positive score to identical base pairs (match). The maximum positive score marks the end of the alignment, and the minimum value marks its start. This study uses sequences of leukemia and 3 sequences of non-leukemia.
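    The scoring scheme described in this abstract can be sketched as a small Smith-Waterman implementation. This is an illustrative sketch, not the study's code: the function name, the match, mismatch and gap values, and the example sequences are all assumptions, and the abstract does not state how gaps were scored.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring values are hypothetical defaults chosen for illustration.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at 0 so an alignment can start anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Made-up sequences sharing the region ACGT.
    print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))
```

    The traceback starts at the maximum cell, which is the end of the local alignment, and stops at the first zero cell, which marks its start.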

  3. Clinical evaluation of further-developed MRCP sequences in comparison with standard MRCP sequences

    International Nuclear Information System (INIS)

    Hundt, W.; Scheidler, J.; Reiser, M.; Petsch, R.

    2002-01-01

    The purpose of this study was the comparison of technically improved single-shot magnetic resonance cholangiopancreatography (MRCP) sequences with standard single-shot rapid acquisition with relaxation enhancement (RARE) and half-Fourier acquired single-shot turbo spin-echo (HASTE) sequences in evaluating the normal and abnormal biliary duct system. The bile duct system of 45 patients was prospectively investigated on a 1.5-T MRI system. The investigation was performed with RARE and HASTE MR cholangiography sequences with standard and high spatial resolutions, and with a delayed-echo half-Fourier RARE (HASTE) sequence. Findings of the improved MRCP sequences were compared with the standard MRCP sequences. The level of confidence in assessing the diagnosis was divided into five groups. The Wilcoxon signed-rank test at a level of p<0.05 was applied. In 15 patients no pathology was found. The MRCP showed stenoses of the bile duct system in 10 patients and choledocholithiasis and cholecystolithiasis in 16 patients. In 12 patients a dilatation of the bile duct system was found. Comparison of the low- and high spatial resolution sequences and the short and long TE times of the half-Fourier RARE (HASTE) sequence revealed no statistically significant differences regarding accuracy of the examination. The diagnostic confidence level in assessing normal or pathological findings for the high-resolution RARE and half-Fourier RARE (HASTE) was significantly better than for the standard sequences. For the delayed-echo half-Fourier RARE (HASTE) sequence no statistically significant difference was seen. The high-resolution RARE and half-Fourier RARE (HASTE) sequences had a higher confidence level, but there was no significant difference in diagnosis in terms of detection and assessment of pathological changes in the biliary duct system compared with standard sequences. (orig.)

  4. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  5. Comparison of two Next Generation sequencing platforms for full genome sequencing of Classical Swine Fever Virus

    DEFF Research Database (Denmark)

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Höper, Dirk

    2013-01-01

    to the consensus sequence. Additionally, we got an average sequence depth for the genome of 4000 for the Iontorrent PGM and 400 for the FLX platform making the mapping suitable for single nucleotide variant (SNV) detection. The analysis revealed a single non-silent SNV A10665G leading to the amino acid change D......Next Generation Sequencing (NGS) is becoming more adopted into viral research and will be the preferred technology in the years to come. We have recently sequenced several strains of Classical Swine Fever Virus (CSFV) by NGS on both Genome Sequencer FLX (GS FLX) and Iontorrent PGM platforms...

  6. Comparison of the accuracy of two conventional phenotypic methods and two MALDI-TOF MS systems with that of DNA sequencing analysis for correctly identifying clinically encountered yeasts.

    Science.gov (United States)

    Chao, Qiao-Ting; Lee, Tai-Fen; Teng, Shih-Hua; Peng, Li-Yun; Chen, Ping-Hung; Teng, Lee-Jene; Hsueh, Po-Ren

    2014-01-01

    We assessed the accuracy of species-level identification of two commercially available matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) systems (Bruker Biotyper and Vitek MS) and two conventional phenotypic methods (Phoenix 100 YBC and Vitek 2 Yeast ID) with that of rDNA gene sequencing analysis among 200 clinical isolates of commonly encountered yeasts. The correct identification rates of the 200 yeast isolates to species or complex (Candida parapsilosis complex, C. guilliermondii complex and C. rugosa complex) levels by the Bruker Biotyper, Vitek MS (using in vitro devices [IVD] database), Phoenix 100 YBC and Vitek 2 Yeast ID (Sabouraud's dextrose agar) systems were 92.5%, 79.5%, 89%, and 74%, respectively. An additional 72 isolates of C. parapsilosis complex and 18 from the above 200 isolates (30 in each of C. parapsilosis, C. metapsilosis, and C. orthopsilosis) were also evaluated separately. Bruker Biotyper system could accurately identify all C. parapsilosis complex to species level. Using Vitek 2 MS (IVD) system, all C. parapsilosis but none of C. metapsilosis, or C. orthopsilosis could be accurately identified. Among the 89 yeasts misidentified by the Vitek 2 MS (IVD) system, 39 (43.8%), including 27 C. orthopsilosis isolates, could be correctly identified using the Vitek MS Plus SARAMIS database for research use only. This resulted in an increase in the rate of correct identification of all yeast isolates (87.5%) by Vitek 2 MS. The two species in C. guilliermondii complex (C. guilliermondii and C. fermentati) isolates were correctly identified by cluster analysis of spectra generated by the Bruker Biotyper system. Based on the results obtained in the current study, MALDI-TOF MS systems present a promising alternative for the routine identification of yeast species, including clinically commonly and rarely encountered yeast species and several species belonging to C. parapsilosis complex, C. guilliermondii complex

  7. Comparison of the accuracy of two conventional phenotypic methods and two MALDI-TOF MS systems with that of DNA sequencing analysis for correctly identifying clinically encountered yeasts.

    Directory of Open Access Journals (Sweden)

    Qiao-Ting Chao

    Full Text Available We assessed the accuracy of species-level identification of two commercially available matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) systems (Bruker Biotyper and Vitek MS) and two conventional phenotypic methods (Phoenix 100 YBC and Vitek 2 Yeast ID) with that of rDNA gene sequencing analysis among 200 clinical isolates of commonly encountered yeasts. The correct identification rates of the 200 yeast isolates to species or complex (Candida parapsilosis complex, C. guilliermondii complex and C. rugosa complex) levels by the Bruker Biotyper, Vitek MS (using in vitro devices [IVD] database), Phoenix 100 YBC and Vitek 2 Yeast ID (Sabouraud's dextrose agar) systems were 92.5%, 79.5%, 89%, and 74%, respectively. An additional 72 isolates of C. parapsilosis complex and 18 from the above 200 isolates (30 in each of C. parapsilosis, C. metapsilosis, and C. orthopsilosis) were also evaluated separately. Bruker Biotyper system could accurately identify all C. parapsilosis complex to species level. Using Vitek 2 MS (IVD) system, all C. parapsilosis but none of C. metapsilosis, or C. orthopsilosis could be accurately identified. Among the 89 yeasts misidentified by the Vitek 2 MS (IVD) system, 39 (43.8%), including 27 C. orthopsilosis isolates, could be correctly identified using the Vitek MS Plus SARAMIS database for research use only. This resulted in an increase in the rate of correct identification of all yeast isolates (87.5%) by Vitek 2 MS. The two species in C. guilliermondii complex (C. guilliermondii and C. fermentati) isolates were correctly identified by cluster analysis of spectra generated by the Bruker Biotyper system. Based on the results obtained in the current study, MALDI-TOF MS systems present a promising alternative for the routine identification of yeast species, including clinically commonly and rarely encountered yeast species and several species belonging to C. parapsilosis complex, C. guilliermondii

  8. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. - Abstract: A multiple-chiller plant is commonly employed in heating, ventilating and air-conditioning systems to increase operational feasibility and energy efficiency under part-load conditions. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency without sacrificing the cooling sufficiency needed for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observation that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on the control performance and may cause the control to fail to achieve the expected control and/or energy performance; and (ii) in the current literature few studies have systematically addressed this issue, this paper presents a study on robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies

  9. Close Sequence Comparisons are Sufficient to Identify Human cis-Regulatory Elements

    Energy Technology Data Exchange (ETDEWEB)

    Prabhakar, Shyam; Poulin, Francis; Shoukry, Malak; Afzal, Veena; Rubin, Edward M.; Couronne, Olivier; Pennacchio, Len A.

    2005-12-01

    Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons, due to the lack of a universal metric for sequence conservation, and also the paucity of empirically defined benchmark sets of cis-regulatory elements. To address this problem, we developed a general-purpose algorithm (Gumby) that detects slowly-evolving regions in primate, mammalian and more distant comparisons without requiring adjustment of parameters, and ranks conserved elements by P-value using Karlin-Altschul statistics. We benchmarked Gumby predictions against previously identified cis-regulatory elements at diverse genomic loci, and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using reporter-gene assays in transgenic mice. Human regulatory elements were identified with acceptable sensitivity and specificity by comparison with 1-5 other eutherian mammals or 6 other simian primates. More distant comparisons (marsupial, avian, amphibian and fish) failed to identify many of the empirically defined functional noncoding elements. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole genome comparative analysis, which explains some of these findings. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for testing at embryonic time points.
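
    The abstract above says conserved elements are ranked by P-value using Karlin-Altschul statistics. A minimal sketch of the generic Karlin-Altschul P-value for a local score follows. It is not Gumby's implementation: the function name, the search-space sizes `m` and `n`, and the parameter values `lam` and `k` are illustrative assumptions, since the abstract does not give them.

```python
import math


def karlin_altschul_pvalue(score, m, n, lam, k):
    """Generic Karlin-Altschul P-value for a local score.

    E = k * m * n * exp(-lam * score) is the expected number of segments
    scoring at least `score` by chance in a search space of size m * n;
    P = 1 - exp(-E) is the chance of seeing at least one such segment.
    lam and k depend on the scoring scheme (hypothetical values here).
    """
    expected = k * m * n * math.exp(-lam * score)
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-expected)


# Rank hypothetical conserved elements: smaller P-value ranks first.
scores = {"element_a": 42.0, "element_b": 55.0, "element_c": 30.0}
ranked = sorted(
    scores,
    key=lambda name: karlin_altschul_pvalue(scores[name], 1000, 1000, 0.3, 0.1),
)
```

    Higher scores give smaller P-values, so sorting by P-value in this sketch orders the elements from most to least significantly conserved.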

  10. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Full Text Available Abstract Background Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. UCSF Chimera is free for non-commercial use and is

  11. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance in order to improve resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
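
    One of the measures named above, the Allan Factor, can be sketched in a few lines. The version below uses the usual textbook definition (mean squared difference of counts in adjacent windows, divided by twice the mean count); the function name, the window handling, and the toy event times are assumptions for illustration and are not taken from the study. Values near 1 are expected for a Poisson process, values well above 1 suggest clustering, and values near 0 suggest regular spacing.

```python
def allan_factor(event_times, window, t_start, t_end):
    """Allan Factor of a point process for one window length.

    AF(T) = mean((N[i+1] - N[i])**2) / (2 * mean(N)), where N[i] is the
    number of events in the i-th consecutive window of length T.
    """
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two windows")
    counts = [0] * n_windows
    for t in event_times:
        idx = int((t - t_start) // window)
        if 0 <= idx < n_windows:  # ignore events outside the covered span
            counts[idx] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the observation period")
    sq_diffs = [(counts[i + 1] - counts[i]) ** 2 for i in range(n_windows - 1)]
    return (sum(sq_diffs) / len(sq_diffs)) / (2 * mean_count)


# Toy data (hypothetical): one event per unit window vs. a single burst.
regular = [0.5, 1.5, 2.5, 3.5]
clustered = [0.1, 0.2, 0.3, 0.4]
af_regular = allan_factor(regular, 1.0, 0.0, 4.0)
af_clustered = allan_factor(clustered, 1.0, 0.0, 4.0)
```

    Repeating the calculation over a range of window lengths, and over simulated Poisson sequences covering the same period, gives the kind of comparison against a random reference that the abstract describes.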

  12. An efficient binomial model-based measure for sequence comparison and its application.

    Science.gov (United States)

    Liu, Xiaoqing; Dai, Qi; Li, Lihua; He, Zerong

    2011-04-01

    Sequence comparison is one of the major tasks in bioinformatics, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remain. This paper presented a binomial model-based measure to analyze biological sequences. With the help of a random indicator, the occurrence of a word at any position of a sequence can be regarded as a random Bernoulli variable, and the distribution of the sum of the word occurrences is well known to be binomial. By using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on the binomial model is more efficient.
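
    The two ingredients mentioned above, a word count and a recursively computed binomial probability, can be illustrated as follows. This is a sketch only: the abstract does not give the authors' recursion or how the per-position probability p is estimated, so the standard recursion P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p) is used, and the function names and the value of p are assumptions.

```python
def word_count(seq, word):
    """Count overlapping occurrences of `word` in `seq`."""
    w = len(word)
    return sum(1 for i in range(len(seq) - w + 1) if seq[i:i + w] == word)


def binomial_pmf_recursive(n, p):
    """Return [P(X=0), ..., P(X=n)] for X ~ Binomial(n, p).

    Starts from P(0) = (1-p)**n and applies
    P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p).
    For very large n the starting value may underflow to 0.0.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    probs = [(1.0 - p) ** n]
    ratio = p / (1.0 - p)
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * ratio)
    return probs


seq = "ATATAT"
word = "AT"
positions = len(seq) - len(word) + 1       # number of Bernoulli trials
count = word_count(seq, word)
pmf = binomial_pmf_recursive(positions, 0.0625)  # p = 1/16 is hypothetical
prob_of_observed_count = pmf[count]
```

    A relative-entropy measure would then compare such probabilities between two sequences; that step is left out here because the abstract does not specify its form.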

  13. Probabilistic accident sequence recovery analysis

    International Nuclear Information System (INIS)

    Stutzke, Martin A.; Cooper, Susan E.

    2004-01-01

    Recovery analysis is a method that considers alternative strategies for preventing accidents in nuclear power plants during probabilistic risk assessment (PRA). Consideration of possible recovery actions in PRAs has been controversial, and there seems to be a widely held belief among PRA practitioners, utility staff, plant operators, and regulators that the results of recovery analysis should be skeptically viewed. This paper provides a framework for discussing recovery strategies, thus lending credibility to the process and enhancing regulatory acceptance of PRA results and conclusions. (author)

  14. Nucleotide sequence analysis of HTLV-I isolated from cerebrospinal fluid of a patient with TSP/HAM: comparison to other HTLV-I isolates.

    Science.gov (United States)

    Mukhopadhyaya, R; Sadaie, M R

    1993-02-01

    Human T-cell leukemia virus type I (HTLV-I) has been associated with adult T-cell leukemia/lymphoma and the chronic neurologic disorder tropical spastic paraparesis/HTLV-I-associated myelopathy (TSP/HAM). To study the genetic structure of the virus associated with TSP/HAM, we have obtained and sequenced a partial genomic clone from an HTLV-I-positive cell line established from cerebrospinal fluid (CSF) of a Jamaican patient with TSP/HAM. This clone consisted of a 4.3-kb viral sequence containing the 5' long terminal repeat (LTR), gag, and N-terminal portion of the pol gene, with an overall 1.3% sequence variation resulting from mostly nucleotide substitutions, as compared to the prototype HTLV-I ATK-1. The gag and pol regions showed only 1.4% and 1.2% nucleotide variations, respectively. However, the U3 region of the LTR showed the highest sequence variation (3.6%), where several changes appear to be common among certain TSP/HAM isolates. Several of these changes reside within the 21-bp boundaries and the Tax-responsive element. It would be important to determine if the observed changes are sufficient to cause neurologic disorders similar to the murine leukemia virus system or simply reflect the divergent pool of HTLV-I from different geographic locations. At this time, we cannot rule out the possibility that the observed changes have either direct or indirect significance for the HTLV-I pathogenesis in TSP/HAM.

  15. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
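
    The Chaos Game Representation named above is the simplest iterated map for DNA: start at the centre of the unit square and, for each nucleotide, move halfway toward the corner assigned to it. The sketch below assumes one common corner layout (A bottom-left, C top-left, G top-right, T bottom-right); the layout and the function name are illustrative choices, not details stated in the review.

```python
# Assumed corner assignment; other layouts are possible.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq):
    """Return the Chaos Game Representation coordinates of a DNA sequence.

    Each point is the midpoint between the previous point and the corner
    of the current nucleotide, so the i-th point encodes the whole prefix
    seq[:i+1], with the most recent bases fixing the coarsest position.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError on non-ACGT symbols
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


points = cgr_points("AG")
```

    Because each step halves the distance to a corner, sequences sharing a suffix of length L land in the same sub-square of side 2^-L, which is the scale-free property the review refers to.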

  16. Detection of Weakly Conserved Ancestral Mammalian Regulatory Sequences by Primate Comparisons

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Qian-fei; Prabhakar, Shyam; Chanan, Sumita; Cheng, Jan-Fang; Rubin, Edward M.; Boffelli, Dario

    2006-06-01

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect cryptic functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  17. Preliminary hazard analysis using sequence tree method

    International Nuclear Information System (INIS)

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system level PHA using the sequence tree method was developed to perform Safety Related digital I and C system SSA. The conventional PHA is a brainstorming session among experts on various portions of the system to identify hazards through discussions. However, this conventional PHA is not a systematic technique: the analysis results strongly depend on the experts' subjective opinions, and the analysis quality cannot be appropriately controlled. Therefore, this research developed a system level sequence tree based PHA, which can clarify the relationship among the major digital I and C systems. Two major phases are included in this sequence tree based technique. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety related I and C system, such as RPS. The second phase uses a sequence tree to recognize what I and C systems are involved in the event, how the safety related systems work, and how the backup systems can be activated to mitigate the consequence if the primary safety systems fail. In the sequence tree, the defense-in-depth echelons, including Control echelon, Reactor trip echelon, ESFAS echelon, and Indication and display echelon, are arranged to construct the sequence tree structure. All the related I and C systems, including the digital systems and the analog back-up systems, are allocated in their specific echelon. By this system centric sequence tree based analysis, not only can preliminary hazards be identified systematically, but the vulnerability of the nuclear power plant can also be recognized. Therefore, an effective simplified D3 evaluation can be performed as well. (author)

  18. Characterization of Pasteurella multocida associated with ovine pneumonia using multi-locus sequence typing (MLST) and virulence-associated gene profile analysis and comparison with porcine isolates.

    Science.gov (United States)

    García-Alvarez, Andrés; Vela, Ana Isabel; San Martín, Elvira; Chaves, Fernando; Fernández-Garayzábal, José Francisco; Lucas, Domínguez; Cid, Dolores

    2017-05-01

    Pasteurella multocida is a pathogen causing disease in a wide range of hosts including sheep and pigs. Isolates from ovine pneumonia were characterized by MLST (Multi-host and RIRDC databases) and virulence-associated gene (VAG) typing and compared with porcine isolates. Ovine and porcine isolates did not share any STs as determined by both schemes and exhibited different VAG profiles. With the Multi-host database, sixteen STs were identified among 43 sheep isolates with two STs (ST50 and ST19) comprising 53.5% of the isolates, and seven MLST genotypes (ST3, ST11 and ST62 included 75% of the isolates) among the 48 pig isolates. The most frequent VAG profile among sheep isolates was tbpA+/toxA+ (69.8% of isolates) and pfhA+ (62.5%) and hgbB+ (33.3%) among pig isolates. Representative ovine and porcine isolates of those STs identified by the Multi-host scheme were further typed using the RIRDC scheme. Seven STs were identified among the ovine isolates (ST95 RIRDC, ST131 RIRDC, ST203 RIRDC, ST320 RIRDC, ST324 RIRDC, ST321 RIRDC, and ST323 RIRDC), with the latter four sequence types being new STs identified in this study, and six STs (ST9 RIRDC, ST13 RIRDC, ST27 RIRDC, ST50 RIRDC, and ST74 RIRDC and a new sequence type ST322 RIRDC) among the porcine isolates. STs identified among ovine isolates have been detected exclusively in small ruminants, suggesting an adaptation to these hosts, while the genotypes identified among pig isolates have been previously identified in multiple hosts and therefore they are not restricted to pigs. The differences in genotypes and VAG profiles between ovine and pig isolates suggest they could represent different subpopulations of P. multocida. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Directory of Open Access Journals (Sweden)

    Rebecca M. Davidson

    2011-11-01

    Full Text Available Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize (Zea mays L.) reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq) for expression determination and differentiation of paralogous genes (∼85% of maize genes). Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs) to 48% (pollen vs. ovule) of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.

  20. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction: Todd R. Reed. CONTENT-BASED IMAGE SEQUENCE REPRESENTATION: Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej. THE COMPUTATION OF MOTION: Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang. MOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAIN: Luca Lucchese and Guido Maria Cortelazzo. QUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONS: Gaetano Giunta. ERROR CONCEALMENT IN DIGITAL VIDEO: Francesco G.B. De Natale. IMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVE: Anil Kokaram. VIDEO SUMMARIZATION: Cuneyt M. Taskiran and Edward

  1. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
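
    The assembly statistics quoted above (N50, genome coverage) follow from simple arithmetic on scaffold lengths. A minimal sketch under the usual definition of N50 is shown below; the function name and the toy length list are illustrative, while the two genome-size figures are the ones reported in the abstract.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths): total is 20, and 6 + 5 = 11 >= 10.
toy_n50 = n50([2, 3, 4, 5, 6])

# Coverage of the estimated genome, using the figures from the abstract.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000
coverage_percent = 100.0 * assembled_bp / estimated_genome_bp
```

    With the reported numbers, `coverage_percent` comes to about 91.5, consistent with the 91% stated in the abstract.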

  2. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting the PCR reaction conditions, and combining these with clone construction for sequencing, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with GC content 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  3. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    KAUST Repository

    Brown, Judith K.

    2015-04-18

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses are proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91 % and 94 % are proposed as the demarcation threshold for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.

  4. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    KAUST Repository

    Brown, Judith K.; Zerbini, F. Murilo; Navas-Castillo, Jesús; Moriones, Enrique; Ramos-Sobrinho, Roberto; Silva, José C. F.; Fiallo-Olivé, Elvira; Briddon, Rob W.; Hernández-Zepeda, Cecilia; Idris, Ali; Malathi, V. G.; Martin, Darren P.; Rivera-Bustamante, Rafael; Ueda, Shigenori; Varsani, Arvind

    2015-01-01

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses are proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91 % and 94 % are proposed as the demarcation threshold for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.
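
    The 91% and 94% thresholds above translate directly into a classification rule on pairwise identity. The sketch below is not the Sequence Demarcation Tool: it assumes the two sequences are already aligned, computes identity over columns where neither sequence has a gap, and treats a value exactly on a threshold as meeting it. All three choices, and the function names, are assumptions, and the abstract notes that borderline values need separate conflict-resolution procedures.

```python
SPECIES_THRESHOLD = 91.0  # percent identity, from the abstract
STRAIN_THRESHOLD = 94.0   # percent identity, from the abstract


def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences, skipping gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # column ignored when either sequence has a gap
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def classify_pair(identity_percent):
    """Apply the proposed demarcation thresholds to one pairwise identity."""
    if identity_percent < SPECIES_THRESHOLD:
        return "different species"
    if identity_percent < STRAIN_THRESHOLD:
        return "same species, different strain"
    return "same strain"


label = classify_pair(pairwise_identity("ACGTACGTAC", "ACGTACGTAA"))
```

    In this toy pair nine of ten columns match, giving 90% identity and therefore the label for different species.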

  5. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage

  6. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Full Text Available Many organizations apply information technologies to support their business processes. Using these technologies, actual events are recorded and checked for conformance with a predefined model. Conformance checking is an approach to measure the fitness and appropriateness between a process model and actual events. However, when there are multiple events with the same timestamp, the traditional approach is unfit to produce such measures. This study attempts to develop a sequence matching analysis. Taking conformance checking as its basis, the proposed approach utilizes the current control flow technique in the process mining domain. A case study in the field of educational processes has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it yields some measurements for curriculum development. Finally, the result of the proposed approach has been verified by relevant instructors for further development.

  7. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Science.gov (United States)

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  8. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
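The following is a minimal sketch of what a k-mer based alignment-free comparison looks like in its simplest form. It is not CAFE's implementation and does not reproduce CVTree, $d_2^*$ or $d_2^S$, which additionally require a background sequence model. The function names are hypothetical, and both inputs are assumed to be plain DNA strings at least k letters long.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(a, b, k):
    """Inner product of the two k-mer count vectors (the uncentred D2 statistic)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())


def frequency_distance(a, b, k):
    """Euclidean distance between the normalized k-mer frequency vectors.

    Assumes both sequences contain at least one k-mer.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    words = ca.keys() | cb.keys()
    return sqrt(sum((ca[w] / na - cb[w] / nb) ** 2 for w in words))
```

Measures of this kind depend only on k-mer frequencies, which is why they are cheaper to compute than the model-based measures the abstract mentions.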

  9. FAST: FAST Analysis of Sequences Toolbox

    Directory of Open Access Journals (Sweden)

    Travis J. Lawrence

    2015-05-01

    Full Text Available FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU’s Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics makes FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.

  10. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    Full Text Available Abstract This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
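As an illustration of one of the techniques named above, here is a minimal sketch of the textbook dynamic-programming solution to the longest common subsequence problem. It is not taken from BATS, whose library is written in C/C++ and Perl; the function name `lcs` is hypothetical, and the quadratic-space table is chosen for readability rather than efficiency.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

When several subsequences share the maximal length, this traceback returns only one of them.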

  11. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  12. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  13. Gene discovery and transcript analyses in the corn smut pathogen Ustilago maydis: expressed sequence tag and genome sequence comparison

    Directory of Open Access Journals (Sweden)

    Saville Barry J

    2007-09-01

    Full Text Available Abstract Background Ustilago maydis is the basidiomycete fungus responsible for common smut of corn and is a model organism for the study of fungal phytopathogenesis. To aid in the annotation of the genome sequence of this organism, several expressed sequence tag (EST) libraries were generated from a variety of U. maydis cell types. In addition to utility in the context of gene identification and structure annotation, the ESTs were analyzed to identify differentially abundant transcripts and to detect evidence of alternative splicing and anti-sense transcription. Results Four cDNA libraries were constructed using RNA isolated from U. maydis diploid teliospores (U. maydis strains 518 × 521) and haploid cells of strain 521 grown under nutrient rich, carbon starved, and nitrogen starved conditions. Using the genome sequence as a scaffold, the 15,901 ESTs were assembled into 6,101 contiguous expressed sequences (contigs); among these, 5,482 corresponded to predicted genes in the MUMDB (MIPS Ustilago maydis database), while 619 aligned to regions of the genome not yet designated as genes in MUMDB. A comparison of EST abundance identified numerous genes that may be regulated in a cell type or starvation-specific manner. The transcriptional response to nitrogen starvation was assessed using RT-qPCR. The results of this suggest that there may be cross-talk between the nitrogen and carbon signalling pathways in U. maydis. Bioinformatic analysis identified numerous examples of alternative splicing and anti-sense transcription. While intron retention was the predominant form of alternative splicing in U. maydis, other varieties were also evident (e.g. exon skipping). Selected instances of both alternative splicing and anti-sense transcription were independently confirmed using RT-PCR. Conclusion Through this work: 1) substantial sequence information has been provided for U. maydis genome annotation; 2) new genes were identified through the discovery of 619

  14. A Comparison of the First Two Sequenced Chloroplast Genomes in Asteraceae: Lettuce and Sunflower

    Energy Technology Data Exchange (ETDEWEB)

    Timme, Ruth E.; Kuehl, Jennifer V.; Boore, Jeffrey L.; Jansen, Robert K.

    2006-01-20

    Asteraceae is the second largest family of plants, with over 20,000 species. For the past few decades, numerous phylogenetic studies have contributed to our understanding of the evolutionary relationships within this family, including comparisons of the fast evolving chloroplast gene, ndhF, rbcL, as well as non-coding DNA from the trnL intron plus the trnL-trnF intergenic spacer, matK, and, with lesser resolution, psbA-trnH. This culminated in a study by Panero and Funk in 2002 that used over 13,000 bp per taxon for the largest taxonomic revision of Asteraceae in over a hundred years. Still, some uncertainties remain, and it would be very useful to have more information on the relative rates of sequence evolution among various genes and on genome structure as a potential set of phylogenetic characters to help guide future phylogenetic structures. By way of contributing to this, we report the first two complete chloroplast genome sequences from members of the Asteraceae, those of Helianthus annuus and Lactuca sativa. These plants belong to two distantly related subfamilies, Asteroideae and Cichorioideae, respectively. In addition to these, there is only one other published chloroplast genome sequence for any plant within the larger group called Euasterids II, that of Panax ginseng (Araliaceae, 156,318 bps, AY582139). Early chloroplast genome mapping studies demonstrated that H. annuus and L. sativa share a 22 kb inversion relative to members of the subfamily Barnadesioideae. By comparison to outgroups, this inversion was shown to be derived, indicating that the Asteroideae and Cichorioideae are more closely related than either is to the Barnadesioideae. A later sequencing study found that taxa that share this 22 kb inversion also contain within this region a second, smaller, 3.3 kb inversion. These sequences also enable an analysis of patterns of shared repeats in the genomes at fine level and of RNA editing by comparison to available EST sequences.
In addition, since

  15. Comparison of next generation sequencing technologies for transcriptome characterization

    Directory of Open Access Journals (Sweden)

    Soltis Douglas E

    2009-08-01

    Full Text Available Abstract Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance

  16. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  17. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  18. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima's D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.

  19. Statistical analysis of next generation sequencing data

    CERN Document Server

    Nettleton, Dan

    2014-01-01

    Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...

  20. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC), a type of calculus that represents qualitative data on moving point objects (MPOs), and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI) is used, which makes it possible to map QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  1. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    Science.gov (United States)

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those of alignment-based and alignment-free measures. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measures perform in phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into the Markov model, are more efficient.

  2. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) project and the Epigenomic Roadmap Project have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  3. Image sequence analysis workstation for multipoint motion analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze motion of objects from video sequences. The system combines the software and hardware environment of a modern graphic-oriented workstation with digital image acquisition, processing and display techniques. In addition to automation and increase in throughput of data reduction tasks, the objective of the system is to provide less invasive methods of measurement by offering the ability to track objects that are more complex than reflective markers. Grey level image processing and spatial/temporal adaptation of the processing parameters are used for location and tracking of more complex features of objects under uncontrolled lighting and background conditions. The applications of such an automated and noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots, aircraft in flight, etc. The system's key features are: 1) acquisition and storage of image sequences by digitizing and storing real-time video; 2) computer-controlled movie loop playback, freeze frame display, and digital image enhancement; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from either live input video or a stored image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body; 5) field-of-view and spatial calibration; 6) image sequence and measurement data base management; and 7) offline analysis software for trajectory plotting and statistical analysis.

  4. Comparison exercise of probabilistic precursor analysis

    International Nuclear Information System (INIS)

    Fauchille, V.; Babst, S.

    2004-01-01

    From 2000 up to 2003, a comparison exercise concerning accident precursor programs was performed by IRSN, GRS, and NUPEC (Japan). The objective of this exercise was to compare the methodologies used to quantify conditional core damage probability related to incidents which can be considered as accident precursors. This exercise provided interesting results concerning the interpretation of such events. Generally, the participants identified similar scenarios of potential degradation. However, for several dominant sequences, differences in the results were noticed. The differences can be attributed to variations in the plant design, the strategy of management and in the methodological approach. For many reasons, comparison of human reliability analysis was difficult and perhaps another exercise in the future could provide more information about this subject. On the other hand, interesting outcomes have been obtained from the quantification of both common cause failures and potential common cause failures. (orig.)

  5. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Abstract Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  6. A symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences

    International Nuclear Information System (INIS)

    Xiao Fanghong

    2004-01-01

    By considering a chaotic pseudo-random sequence as a symbolic sequence, the authors present a symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences. The method is applied to the cases of the Logistic map and a one-way coupled map lattice to demonstrate how it works, and it is compared with the approximate entropy method. The results show that this method can distinguish the complexities of different chaotic pseudo-random sequences, and that it is superior to the approximate entropy method.
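To make the idea of treating a chaotic sequence as a symbolic sequence concrete, here is a minimal sketch for the Logistic map. The abstract does not state which partition or complexity measure the authors use, so everything here is an assumption: the binary partition at 0.5, the use of the number of distinct words as a crude complexity indicator, and all function names.

```python
def logistic_orbit(r, x0, n):
    """Return the first n points of the orbit x -> r * x * (1 - x)."""
    xs = []
    x = x0
    for _ in range(n):
        xs.append(x)
        x = r * x * (1.0 - x)
    return xs


def symbolize(xs, threshold=0.5):
    """Map each orbit point to a binary symbol (assumed partition at 0.5)."""
    return [1 if x >= threshold else 0 for x in xs]


def distinct_words(symbols, length):
    """Count the distinct words of the given length in the symbol sequence."""
    last_start = len(symbols) - length + 1
    return len({tuple(symbols[i:i + length]) for i in range(last_start)})


# Chaotic regime (r = 4) versus a period-2 regime (r = 3.2).
chaotic = symbolize(logistic_orbit(4.0, 0.3, 200))
periodic = symbolize(logistic_orbit(3.2, 0.3, 200))
```

Under these assumptions the chaotic orbit visits many more distinct words than the periodic one, which is the kind of contrast a symbolic complexity measure is meant to capture.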

  7. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  8. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  9. Improving pairwise comparison of protein sequences with domain co-occurrence

    Science.gov (United States)

    Gascuel, Olivier

    2018-01-01

    Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed an increase of 14% in the number of significant BLAST hits and an increase of 25% in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence PMID:29293498

  10. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Primers specific for CSRP3 were designed using known cDNA sequences of Bos taurus published in database with different accession numbers. Polymerase chain reaction (PCR) was performed and products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).

  11. Quantitative comparison between a multiecho sequence and a single-echo sequence for susceptibility-weighted phase imaging.

    Science.gov (United States)

    Gilbert, Guillaume; Savard, Geneviève; Bard, Céline; Beaudoin, Gilles

    2012-06-01

    The aim of this study was to investigate the benefits arising from the use of a multiecho sequence for susceptibility-weighted phase imaging using a quantitative comparison with a standard single-echo acquisition. Four healthy adult volunteers were imaged on a clinical 3-T system using a protocol comprising two different three-dimensional susceptibility-weighted gradient-echo sequences: a standard single-echo sequence and a multiecho sequence. Both sequences were repeated twice in order to evaluate the local noise contribution by a subtraction of the two acquisitions. For the multiecho sequence, the phase information from each echo was independently unwrapped, and the background field contribution was removed using either homodyne filtering or the projection onto dipole fields method. The phase information from all echoes was then combined using a weighted linear regression. R2 maps were also calculated from the multiecho acquisitions. The noise standard deviation in the reconstructed phase images was evaluated for six manually segmented regions of interest (frontal white matter, posterior white matter, globus pallidus, putamen, caudate nucleus and lateral ventricle). The use of the multiecho sequence for susceptibility-weighted phase imaging led to a reduction of the noise standard deviation for all subjects and all regions of interest investigated in comparison to the reference single-echo acquisition. On average, the noise reduction ranged from 18.4% for the globus pallidus to 47.9% for the lateral ventricle. In addition, the amount of noise reduction was found to be strongly inversely correlated to the estimated R2 value (R=-0.92). In conclusion, the use of a multiecho sequence is an effective way to decrease the noise contribution in susceptibility-weighted phase images, while preserving both contrast and acquisition time. The proposed approach additionally permits the calculation of R2 maps. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. Incident sequence analysis; event trees, methods and graphical symbols

    International Nuclear Information System (INIS)

    1980-11-01

    When analyzing incident sequences, unwanted events resulting from a certain cause are looked for. Graphical symbols and explanations of graphical representations are presented. The method applies to the analysis of incident sequences in all types of facilities. By means of the incident sequence diagram, incident sequences, i.e. the logical and chronological course of repercussions initiated by the failure of a component or by an operating error, can be presented and analyzed simply and clearly.

  13. Computer-aided visualization and analysis system for sequence evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  14. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not previously been used in viral integration site analysis for gene therapy applications. Here, we combined a targeted sequence capture array and next-generation sequencing to address whole-genome profiling of viral integration sites. Human 293T and K562 cells were transduced with an HIV-1-derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. Seven thousand four hundred and eighty-four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% and a minimum length of 50 bp in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were totally distant from the CpG islands and from the transcription start sites and preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  15. Triangular fibrocartilage lesions: comparison STIR sequence versus arthroscopy findings

    International Nuclear Information System (INIS)

    Wang Zhi; Meng Xianghong; Wang Linsen; Suo Yongmei

    2013-01-01

    Objective: To explore the diagnostic value of short TI inversion recovery (STIR) sequence in evaluating triangular fibrocartilage (TFC) lesions, and to compare the findings with the arthroscopy findings. Materials and Methods: Wrist joint MR examination using STIR sequence and arthroscopy were performed in 56 patients with TFC lesions. The parameters of STIR sequence were: TR: 1164 ms, TE: 16 ms, and TI: 90 ms. The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy in the diagnosis of TFC lesions with STIR sequence were calculated, using arthroscopy as the standard. Results: (1) STIR manifested 10 patients with normal TFC; 6 with small edema or mucous degeneration in the body portion but not involving joint surface edge; 6 with horizontal avulsion in the body portion, but not involving joint surface edge; 6 with avulsion involving joint surface edge; 11 with perforation in central portion; 6 with avulsion in radial attached end; 5 with avulsion in ulnar attached end; 3 with avulsion in both radial and ulnar attached ends; 3 with irregular shape and thin on the whole TFC. (2) Arthroscopy manifested 21 patients with normal TFC; 8 with avulsion involving joint surface edge; 10 with perforation in central portion; 7 with avulsion in radial attached end; 5 with avulsion in ulnar attached end; 2 with avulsion in both radial and ulnar attached ends; 3 with irregular shape on the whole TFC. Using STIR sequence, the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were 85.7%, 23.8%, 65.2%, 50%, and 62.5%, respectively, in detection of TFC lesions, with arthroscopy as the standard. Conclusion: STIR sequence has high diagnostic value in detection of TFC lesions. (authors)
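
    The five reported percentages follow from a 2x2 table of STIR results against arthroscopy. The sketch below shows the arithmetic. The four cell counts are not given in the abstract; they are an assumption, chosen because they are consistent with the reported totals (56 patients, 21 normal on arthroscopy, 10 normal on STIR) and with the reported percentages. The function name `diagnostic_metrics` is illustrative only.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard diagnostic test metrics as percentages.

    tp/fp/fn/tn are the cells of a 2x2 table comparing the index test
    (here STIR) against the reference standard (here arthroscopy).
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": 100.0 * tp / (tp + fn),  # lesions correctly detected
        "specificity": 100.0 * tn / (tn + fp),  # normals correctly cleared
        "ppv": 100.0 * tp / (tp + fp),          # positive predictive value
        "npv": 100.0 * tn / (tn + fn),          # negative predictive value
        "accuracy": 100.0 * (tp + tn) / total,
    }


# ASSUMED cell counts (not stated in the abstract), chosen to match the
# reported marginal totals: 35 lesions and 21 normals on arthroscopy,
# 46 abnormal and 10 normal on STIR.
metrics = diagnostic_metrics(tp=30, fp=16, fn=5, tn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

    Under these assumed counts the specificity is 5 of 21, which is worth keeping in mind when reading the stated conclusion.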

  16. Comparison of ompP5 sequence-based typing and pulsed-field gel ...

    African Journals Online (AJOL)

    This study compared outer membrane protein P5 gene (ompP5) sequence-based typing with pulsed-field gel electrophoresis (PFGE) for the genotyping of Haemophilus parasuis, using the 15 serovar reference strains and 43 isolates. When comparing the two methods, 31 ompP5 sequence types ...

  17. Reverse transcriptase sequences from mulberry LTR retrotransposons: characterization analysis

    Directory of Open Access Journals (Sweden)

    Ma Bi

    2017-10-01

    Full Text Available Copia and Gypsy play important roles in structural, functional and evolutionary dynamics of plant genomes. In this study, a total of 106 Copia and 101 Gypsy reverse transcriptase (rt) sequences were amplified, respectively, in the Morus notabilis genome using degenerate primers. All sequences exhibited high levels of heterogeneity and were rich in AT, and Copia rt showed higher sequence divergence than Gypsy rt. Two reasons are likely to account for this phenomenon: (a) these elements often experience deletions or fragmentation by illegitimate or unequal homologous recombination in the transposition process; (b) strong purifying selective pressure drives the evolution of these elements through “selective silencing” with random mutation and eventual deletion from the host genome. Interestingly, mulberry rt clustered with other rt from distantly related taxa according to the phylogenetic analysis. This phenomenon did not result from horizontal transposable element transfer. Results obtained from fluorescence in situ hybridization revealed that most of the hybridization signals were preferentially concentrated in pericentromeric and distal regions of chromosomes, and these elements may play important roles in the regions in which they are found. Results of this study support the continued pursuit of further functional studies of Copia and Gypsy in the mulberry genome.

  18. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  19. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment.

    Science.gov (United States)

    Baichoo, Shakuntala; Ouzounis, Christos A

    A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.
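
    As a small illustration of the kind of time and space trade-off such a review covers, the sketch below computes the classic Levenshtein edit distance between two sequences by dynamic programming. It is a generic textbook example, not taken from the review: it runs in O(m·n) time and, by keeping only two rows of the table, O(min(m, n)) space. The function name `edit_distance` is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b.

    Time:  O(len(a) * len(b)), one table cell per pair of positions.
    Space: O(min(len(a), len(b))), only the previous row is retained.
    """
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so rows stay short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("ACGT", "AGT"))
```

    Doubling both sequence lengths roughly quadruples the running time of this routine, which is why quadratic methods become impractical at genome scale and heuristic or index-based approaches are used instead.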

  20. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  1. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    Science.gov (United States)

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  2. Recurrence plot analysis of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    A recurrence plot technique for DNA sequences is established on a metric representation and employed to analyze the correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, whereas the correlation distance of a yeast chromosome increases steadily.

  3. Substrate-driven mapping of the degradome by comparison of sequence logos.

    Directory of Open Access Journals (Sweden)

    Julian E Fuchs

    Full Text Available Sequence logos are frequently used to illustrate substrate preferences and specificity of proteases. Here, we employed the compiled substrates of the MEROPS database to introduce a novel metric for comparison of protease substrate preferences. The constructed similarity matrix of 62 proteases can be used to intuitively visualize similarities in protease substrate readout via principal component analysis and construction of protease specificity trees. Since our new metric is solely based on substrate data, we can engraft the protease tree including proteolytic enzymes of different evolutionary origin. Thereby, our analyses confirm pronounced overlaps in substrate recognition not only between proteases closely related on sequence basis but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches even though these grouped targets do not necessarily share similarity at the protein sequence level. This highlights the value and applicability of knowledge acquired from peptide substrates in drug design of small molecules, e.g., for the prediction of off-target effects or drug repurposing. Consequently, our similarity metric allows mapping of the degradome and its associated drug target network via comparison of known substrate peptides. The substrate-driven view of protein-protein interfaces is not limited to the field of proteases but can be applied to any target class where a sufficient amount of known substrate data is available.

  4. CompariMotif: quick and easy comparisons of sequence motifs.

    Science.gov (United States)

    Edwards, Richard J; Davey, Norman E; Shields, Denis C

    2008-05-15

    CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/

  5. 3D reconstruction software comparison for short sequences

    Science.gov (United States)

    Strupczewski, Adam; Czupryński, Błażej

    2014-11-01

    Large-scale multiview reconstruction has recently become a very popular area of research. There are many open source tools that can be downloaded and run on a personal computer. However, there are few, if any, comparisons between all the available software in terms of accuracy on small datasets that a single user can create. The typical datasets for testing of the software are archeological sites or cities, comprising thousands of images. This paper presents a comparison of currently available open source multiview reconstruction software for small datasets. It also compares the open source solutions with a simple structure from motion pipeline developed by the authors from scratch with the use of OpenCV and Eigen libraries.

  6. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike). Evidence for the participation of the hippocampus in the formation of ... hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful in regards to reducing the frequency and ... very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order

  7. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.
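
    A minimal sketch of how a Google matrix of word-to-word transitions can be built from a DNA string and how its PageRank vector is obtained by power iteration. Several choices here are assumptions made for illustration and are not stated in the abstract: the sequence is cut into consecutive non-overlapping words of `word_len` letters, the damping factor `alpha` is 0.85, columns with no observed transitions are replaced by a uniform column, and the toy input sequence is invented. The function names are hypothetical.

```python
from itertools import product


def google_matrix(sequence, word_len=2, alpha=0.85):
    """Build a column-stochastic Google matrix over all DNA words of word_len."""
    words = ["".join(p) for p in product("ACGT", repeat=word_len)]
    index = {word: i for i, word in enumerate(words)}
    size = len(words)

    # counts[i][j] = number of observed transitions from word j to word i.
    counts = [[0.0] * size for _ in range(size)]
    chunks = [
        sequence[k:k + word_len]
        for k in range(0, len(sequence) - word_len + 1, word_len)
    ]
    for source, target in zip(chunks, chunks[1:]):
        if source in index and target in index:  # skip words with N etc.
            counts[index[target]][index[source]] += 1.0

    matrix = [[0.0] * size for _ in range(size)]
    for j in range(size):
        column_sum = sum(counts[i][j] for i in range(size))
        for i in range(size):
            # Dangling columns (no outgoing transitions) become uniform.
            s_ij = counts[i][j] / column_sum if column_sum > 0 else 1.0 / size
            matrix[i][j] = alpha * s_ij + (1.0 - alpha) / size
    return words, matrix


def pagerank(matrix, iterations=100):
    """Power iteration: repeatedly apply the matrix to a probability vector."""
    size = len(matrix)
    vector = [1.0 / size] * size
    for _ in range(iterations):
        vector = [
            sum(matrix[i][j] * vector[j] for j in range(size))
            for i in range(size)
        ]
    return vector


# Invented toy sequence, for demonstration only.
words, G = google_matrix("ACGTACGTTTGACCA" * 20)
ranks = pagerank(G)
top = sorted(zip(ranks, words), reverse=True)[:3]
print(top)
```

    Because every column of the matrix sums to one, the iteration preserves total probability, and the resulting vector can be compared between species in the spirit of the proximity correlator described above.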

  8. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  9. Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

    Directory of Open Access Journals (Sweden)

    Sadreyev Ruslan I

    2004-08-01

    Full Text Available Background: Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) an MSA position and a set of predicted residue frequencies, and (2) two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. Results: For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. Conclusion: The proposed computational method is of significant potential value for the analysis of protein families.

  10. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Directory of Open Access Journals (Sweden)

    Hideyuki Tamaki

    Full Text Available BACKGROUND: The 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1) after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  11. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Full Text Available Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
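
    The vector and operator picture can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the library has five sequences with invented copy numbers, the sampling operator Sa keeps each copy independently with a fixed probability, and the censorship matrix CEN is a hand-picked diagonal that removes one sequence. Because both operators are diagonal, applying them reduces to element-wise multiplication. All function names are hypothetical.

```python
import random


def sampling_operator(copy_numbers, fraction, rng):
    """Diagonal of a stochastic sampling matrix Sa.

    Each copy of each sequence is kept independently with probability
    `fraction`; the diagonal entry is the fraction of copies that survived.
    """
    diagonal = []
    for copies in copy_numbers:
        kept = sum(1 for _ in range(copies) if rng.random() < fraction)
        diagonal.append(kept / copies if copies else 0.0)
    return diagonal


def apply_diagonal(diagonal, vector):
    """Multiply a diagonal matrix (stored as its diagonal) by a vector."""
    return [d * x for d, x in zip(diagonal, vector)]


rng = random.Random(0)            # fixed seed so the run is repeatable
n = [1000, 200, 50, 5, 0]         # invented copy numbers n_i
sa = sampling_operator(n, fraction=0.1, rng=rng)

# Unbiased sequencing: Seq = Sa * I_N, so only sampling acts on n.
unbiased_reads = apply_diagonal(sa, n)

# Biased sequencing: I_N is replaced by a censorship matrix CEN.
# Here the third sequence is censored completely (an assumption).
cen = [1.0, 1.0, 0.0, 1.0, 1.0]
censored_reads = apply_diagonal(sa, apply_diagonal(cen, n))

print(unbiased_reads)
print(censored_reads)
```

    Comparing the two read vectors shows the point of the framework: a censored sequence is absent from the reads even when it is present in the library, which sampling noise alone would not guarantee.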

  12. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

    Science.gov (United States)

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-11-23

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
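
    The exponential growth mentioned above is easy to quantify. For an alphabet of four nucleotides, a fixed-order Markov chain of order k has 4^k contexts, each with three free transition probabilities (the fourth is fixed because they sum to one). A variable-length model instead has three free parameters per retained context. The sketch below counts both; the example context set is hypothetical and chosen only to show the bookkeeping.

```python
def fomc_free_parameters(order, alphabet_size=4):
    """Free parameters of a fixed-order Markov chain of the given order."""
    return (alphabet_size ** order) * (alphabet_size - 1)


def vlmc_free_parameters(contexts, alphabet_size=4):
    """Free parameters of a variable-length chain with the given contexts."""
    return len(contexts) * (alphabet_size - 1)


for order in range(7):
    print(order, fomc_free_parameters(order))

# Hypothetical context set: histories ending in A, C or G use one letter
# of memory; histories ending in T look one letter further back.
example_contexts = ["A", "C", "G", "AT", "CT", "GT", "TT"]
print(vlmc_free_parameters(example_contexts))
```

    With this hypothetical context set the variable-length model needs 21 free parameters, compared with 48 for a full second-order chain, which is the kind of saving that matters when sequencing depth limits how many parameters can be estimated reliably.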

  13. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. These data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.
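
    The N50 length quoted for the assembly is a standard summary statistic: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembled bases. A minimal sketch of that definition follows; the contig lengths are invented toy values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative length reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Invented toy contig lengths, for demonstration only.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

    N50 is weighted towards long contigs, which is why it can be much larger than the mean contig length, as in the figures reported above.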

  14. Sequence determination and analysis of the NSs genes of two tospoviruses.

    Science.gov (United States)

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequence of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequence of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
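
    Pairwise identity values such as those above are computed from an alignment of the two sequences. The sketch below shows one common convention, which is an assumption here since the abstract does not state the one used: columns containing a gap in either sequence are ignored, and identity is the share of remaining columns with identical residues. The aligned strings are invented and the function name is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Invented toy alignment: one gap column, one substitution.
print(percent_identity("MKT-LV", "MRTALV"))
```

    Other conventions divide by the full alignment length or by the shorter sequence length, so identity figures from different tools are not always directly comparable.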

  15. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers

    Directory of Open Access Journals (Sweden)

    Quail Michael A

    2012-07-01

    Full Text Available Background: Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent’s PGM, Pacific Biosciences’ RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Results: Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. Conclusions: All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences between the quality of that data and the applications it will support.

  16. Cloning and sequence analysis of benzo-a-pyrene-inducible ...

    African Journals Online (AJOL)

    The phylogenetic tree based on the amino acid sequences clearly shows tilapia CYP1A and killifish CYP1A to be more closely related to each other than to the other CYP1A subfamilies. Sequence analysis of 3727 bp of genomic DNA showed that the clone obtained was the structural gene of CYP1A which consists of ...

  17. Biological sequence analysis: probabilistic models of proteins and nucleic acids

    National Research Council Canada - National Science Library

    Durbin, Richard

    1998-01-01

    ... analysis methods are now based on principles of probabilistic modelling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference...

  18. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  19. HIV drug resistance testing among patients failing second line antiretroviral therapy. Comparison of in-house and commercial sequencing.

    Science.gov (United States)

    Chimukangara, Benjamin; Varyani, Bhavini; Shamu, Tinei; Mutsvangwa, Junior; Manasa, Justen; White, Elizabeth; Chimbetete, Cleophas; Luethy, Ruedi; Katzenstein, David

    2017-05-01

    HIV genotyping is often unavailable in low and middle-income countries due to infrastructure requirements and cost. We compared genotype resistance testing in patients with virologic failure, by amplification of HIV pol gene, followed by "in-house" sequencing and commercial sequencing. Remnant plasma samples from adults and children failing second-line ART were amplified and sequenced using in-house and commercial di-deoxysequencing, and analyzed in Harare, Zimbabwe and at Stanford, U.S.A, respectively. HIV drug resistance mutations were determined using the Stanford HIV drug resistance database. Twenty-six of 28 samples were amplified and 25 were successfully genotyped. Comparison of average percent nucleotide and amino acid identities between 23 pairs sequenced in both laboratories were 99.51 (±0.56) and 99.11 (±0.95), respectively. All pairs clustered together in phylogenetic analysis. Sequencing analysis identified 6/23 pairs with mutation discordances resulting in differences in phenotype, but these did not impact future regimens. The results demonstrate our ability to produce good quality drug resistance data in-house. Despite discordant mutations in some sequence pairs, the phenotypic predictions were not clinically significant. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Parametric inference for biological sequence analysis.

    Science.gov (United States)

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.

  1. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
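    The idea of a recurrence time can be sketched in a few lines. This is only an illustration of the general notion (the distance between successive occurrences of the same word), not the authors' method or codon index; the word length k and the toy sequence are assumptions.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of distances between successive occurrences of each k-mer.

    For every position i, if the k-mer starting at i was seen before,
    record the distance to its most recent previous occurrence.
    """
    last_seen = {}          # k-mer -> start index of its latest occurrence
    histogram = Counter()   # recurrence time -> number of observations
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            histogram[i - last_seen[word]] += 1
        last_seen[word] = i
    return histogram


# A toy sequence with period 3: every recurrence time equals 3.
print(recurrence_times("ACGACGACG", 3))
```

    A strong peak at a particular recurrence time (or at its multiples) in such a histogram is one simple way a periodicity or tandem repeat would show up.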

  2. Comparison of sequence-based and structure-based phylogenetic ...

    Indian Academy of Sciences (India)

    Prakash

    phylogenetic tree construction methods, has been considered as an equivalent of .... Further detailed analysis described is restricted to the first two groups only. ..... Aspartate-ammonia ligase. Plant virus ..... enzymatic activities?; Trends ...

  3. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Indian Academy of Sciences (India)

    Navya

    2017-02-22

    Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 by analyzing differences in allele distribution.

  4. Editorial: Special Issue on Algorithms for Sequence Analysis and Storage

    Directory of Open Access Journals (Sweden)

    Veli Mäkinen

    2014-03-01

    Full Text Available This special issue of Algorithms is dedicated to approaches to biological sequence analysis that have algorithmic novelty and potential for fundamental impact in methods used for genome research.

  5. Comparison of the phenomenology of SBO sequences with and without seal LOCA in Westinghouse PWRs

    International Nuclear Information System (INIS)

    Mena Rosell, L.; Queral, C.; Jimenez Varas, G.

    2013-01-01

    SBO sequences have gained notoriety after the accident at Fukushima. Within this type of sequence, whether or not an RCP seal LOCA occurs determines the evolution of the accident. In this work, the integrated safety analysis (ISA) methodology, developed by the CSN, has been applied to SBO sequences. The objective is to compare the evolution of SBO sequences over a wide spectrum of conditions and of recovery times after the loss of AC and DC power. The simulations have been performed with the SCAIS tool coupled to MAAP. The set of simulations carried out, of the order of 2,000 sequences, clearly shows the differences in the evolution of sequences with and without seal LOCA. This type of analysis makes it possible to verify which management of the sequence would be most appropriate depending on whether or not a seal LOCA occurs.

  6. Isolation and sequence analysis of a cDNA clone encoding the fifth complement component

    DEFF Research Database (Denmark)

    Lundwall, Åke B; Wetsel, Rick A; Kristensen, Torsten

    1985-01-01

    DNA clone of 1.85 kilobase pairs was isolated. Hybridization of the mixed-sequence probe to the complementary strand of the plasmid insert and sequence analysis by the dideoxy method predicted the expected protein sequence of C5a (positions 1-12), amino-terminal to the anticipated priming site. The sequence......, subcloned into M13 mp8, and sequenced at random by the dideoxy technique, thereby generating a contiguous sequence of 1703 base pairs. This clone contained coding sequence for the C-terminal 262 amino acid residues of the beta-chain, the entire C5a fragment, and the N-terminal 98 residues of the alpha......'-chain. The 3' end of the clone had a polyadenylated tail preceded by a polyadenylation recognition site, a 3'-untranslated region, and base pairs homologous to the human Alu consensus sequence. Comparison of the derived partial human C5 protein sequence with that previously determined for murine C3 and human...

  7. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith,Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo,Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  8. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece; Hidayah, Lailatul; Preston, Mark D.; Clark, Taane G.; Pain, Arnab

    2014-01-01

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis

  9. Comparison of sequencing based CNV discovery methods using monozygotic twin quartets.

    Directory of Open Access Journals (Sweden)

    Marc-André Legault

    Full Text Available The advent of high throughput sequencing methods raises a considerable number of technical challenges. Among those is the one raised by the discovery of copy-number variations (CNVs) using whole-genome sequencing data. CNVs are genomic structural variations defined as a variation in the number of copies of a large genomic fragment, usually more than one kilobase. Here, we aim to compare different CNV calling methods in order to assess their ability to consistently identify CNVs by comparison of the calls in 9 quartets of identical twin pairs. The use of monozygotic twins provides a means of estimating the error rate of each algorithm by observing CNVs that are inconsistently called when considering the rules of Mendelian inheritance and the assumption of an identical genome between twins. The similarity between the calls from the different tools and the advantage of combining call sets were also considered. ERDS and CNVnator obtained the best performance when considering the inherited CNV rate with a mean of 0.74 and 0.70, respectively. Venn diagrams were generated to show the agreement between the different algorithms, before and after filtering out familial inconsistencies. This filtering revealed a high number of false positives for CNVer and Breakdancer. A low overall agreement between the methods suggested a high complementarity of the different tools when calling CNVs. The breakpoint sensitivity analysis indicated that CNVnator and ERDS achieved better resolution of CNV borders than the other tools. The highest inherited CNV rate was achieved through the intersection of these two tools (81%). This study showed that ERDS and CNVnator provide good performance on whole genome sequencing data with respect to CNV consistency across families, CNV breakpoint resolution and CNV call specificity. The intersection of the calls from the two tools would be valuable for CNV genotyping pipelines.

  10. DSAP: deep-sequencing small RNA analysis pipeline.

    Science.gov (United States)

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
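    The first two steps described above (cleanup and clustering of a tab-delimited tag file) can be sketched as follows. This is not DSAP's code: the adaptor sequence, the homopolymer-tail rule (six or more identical bases), the minimum length, and all function names are hypothetical choices made only to keep the example concrete.

```python
import re
from collections import Counter

# Hypothetical 3' adaptor prefix; a real run would use the library's adaptor.
ADAPTOR = "TCGTATGCCG"

# Assumed rule: a trailing run of 6 or more identical A/C/G/T/N bases is trimmed.
HOMOPOLYMER_TAIL = re.compile(r"(A{6,}|C{6,}|G{6,}|T{6,}|N{6,})$")


def clean_tag(tag):
    """Step (i): cut at the first adaptor occurrence, then trim a poly-N tail."""
    cut = tag.find(ADAPTOR)
    if cut != -1:
        tag = tag[:cut]
    return HOMOPOLYMER_TAIL.sub("", tag)


def cluster_tags(lines, min_length=15):
    """Step (ii): merge cleaned tags that are identical, summing copy numbers.

    Each input line is expected to be '<sequence>\\t<copies>'.
    Cleaned tags shorter than min_length are discarded.
    """
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tag, copies = line.split("\t")
        cleaned = clean_tag(tag.upper())
        if len(cleaned) >= min_length:
            clusters[cleaned] += int(copies)
    return clusters


# Invented input: two raw tags collapse to one cluster, one is too short.
toy_input = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t10",
    "TGAGGTAGTAGGTTGTATAGTTAAAAAAAA\t5",
    "ACGT\t3",
]
for sequence, copies in cluster_tags(toy_input).items():
    print(sequence, copies)
```

    The later steps (matching clusters against Rfam and miRBase) depend on external databases and are left out of this sketch.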

  11. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Science.gov (United States)

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
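    Two of the measures mentioned above, the n-gram distribution and a Zipf's law coefficient, can be illustrated without the package. The sketch below deliberately does not use the Quantiprot API; the function names are hypothetical, and the coefficient is assumed here to be the least-squares slope of log(frequency) against log(rank), which may differ from the package's exact definition.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count all overlapping n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) versus log(rank).

    n-grams are ranked from most to least frequent; a slope near -1
    corresponds to the classic Zipf distribution.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented toy peptide, for illustration only.
counts = ngram_counts("MKKLLKKLLKKM", 2)
print(counts.most_common(3))
print(zipf_slope(counts))
```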

  12. Comparison of Enzymes / Non-Enzymes Proteins Classification Models Based on 3D, Composition, Sequences and Topological Indices

    OpenAIRE

    Munteanu, Cristian Robert

    2014-01-01

    Comparison of Enzymes / Non-Enzymes Proteins Classification Models Based on 3D, Composition, Sequences and Topological Indices, German Conference on Bioinformatics (GCB), Potsdam, Germany (September, 2007)

  13. A base composition analysis of natural patterns for the preprocessing of metagenome sequences.

    Science.gov (United States)

    Bonham-Carter, Oliver; Ali, Hesham; Bastola, Dhundy

    2013-01-01

    On the pretext that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often have much conserved code.

  14. Image ranking in video sequences using pairwise image comparisons and temporal smoothing

    CSIR Research Space (South Africa)

    Burke, Michael

    2016-12-01

    Full Text Available The ability to predict the importance of an image is highly desirable in computer vision. This work introduces an image ranking scheme suitable for use in video or image sequences. Pairwise image comparisons are used to determine image ‘interest...

  15. Automatic analysis of the 2015 Gorkha earthquake aftershock sequence.

    Science.gov (United States)

    Baillard, C.; Lyon-Caen, H.; Bollinger, L.; Rietbrock, A.; Letort, J.; Adhikari, L. B.

    2016-12-01

    The Mw 7.8 Gorkha earthquake, which partially ruptured the Main Himalayan Thrust north of Kathmandu on 25 April 2015, was the largest and most catastrophic earthquake to strike Nepal since the great M8.4 1934 earthquake. This mainshock was followed by multiple aftershocks, among them two notable events that occurred on 12 May with magnitudes of 7.3 Mw and 6.3 Mw. Due to these recent events it became essential for the authorities and for the scientific community to better evaluate the seismic risk in the region through a detailed analysis of the earthquake catalog, including the spatio-temporal distribution of the Gorkha aftershock sequence. Here we complement this first study with a microseismic study using seismic data from the eastern part of the Nepalese Seismological Center network combined with one broadband station at Everest. Our primary goal is to deliver an accurate catalog of the aftershock sequence. Due to the exceptional number of events detected we performed an automatic picking/locating procedure which can be split into 4 steps: 1) coarse picking of the onsets using a classical STA/LTA picker, 2) phase association of picked onsets to detect and declare seismic events, 3) kurtosis pick refinement around theoretical arrival times to increase picking and location accuracy and 4) local magnitude calculation based on the amplitude of waveforms. This procedure is time efficient (~1 sec/event), considerably reduces the location uncertainties (~2 to 5 km errors) and increases the number of events detected compared to manual processing. Indeed, the automatic detection rate is 10 times higher than the manual detection rate. By comparing to the USGS catalog we were able to give a new attenuation law to compute local magnitudes in the region. A detailed analysis of the seismicity shows a clear migration toward the east of the region and a sudden decrease of seismicity 100 km east of Kathmandu which may reveal the presence of a tectonic
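    Step 1 of the procedure, a classical STA/LTA picker, can be sketched as follows. This is a generic textbook form, not the authors' implementation: it assumes trailing windows over the squared signal, and the window lengths, the threshold, and the synthetic trace are all invented for illustration.

```python
def sta_lta(signal, n_sta, n_lta):
    """Ratio of short-term to long-term average of signal energy.

    Both windows are trailing windows ending at sample i. Samples before
    the first full long-term window are assigned a ratio of 0.0.
    """
    if not 0 < n_sta < n_lta <= len(signal):
        raise ValueError("require 0 < n_sta < n_lta <= len(signal)")
    # Prefix sums of the squared signal give each window sum in O(1).
    prefix = [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x * x)
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = (prefix[i + 1] - prefix[i + 1 - n_sta]) / n_sta
        lta = (prefix[i + 1] - prefix[i + 1 - n_lta]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def first_trigger(ratios, threshold):
    """Index of the first sample whose ratio reaches the threshold, or None."""
    for i, ratio in enumerate(ratios):
        if ratio >= threshold:
            return i
    return None


# Synthetic trace: 50 samples of low-level noise, then a stronger arrival.
trace = [0.1] * 50 + [1.0] * 10
ratios = sta_lta(trace, n_sta=3, n_lta=20)
print(first_trigger(ratios, threshold=3.0))
```

    In a pipeline like the one described, such coarse picks would only seed the later association and refinement steps, which are not sketched here.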

  16. Beyond Linear Sequence Comparisons: The use of genome-level characters for phylogenetic reconstruction

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.

    2004-11-27

    Although the phylogenetic relationships of many organisms have been convincingly resolved by the comparisons of nucleotide or amino acid sequences, others have remained equivocal despite great effort. Now that large-scale genome sequencing projects are sampling many lineages, it is becoming feasible to compare large data sets of genome-level features and to develop this as a tool for phylogenetic reconstruction that has advantages over conventional sequence comparisons. Although it is unlikely that these will address a large number of evolutionary branch points across the broad tree of life due to the infeasibility of such sampling, they have great potential for convincingly resolving many critical, contested relationships for which no other data seems promising. However, it is important that we recognize potential pitfalls, establish reasonable standards for acceptance, and employ rigorous methodology to guard against a return to earlier days of scenario-driven evolutionary reconstructions.

  17. Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

    Science.gov (United States)

    Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

    2013-01-01

    Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121

  18. Nonlinear analysis of river flow time sequences

    Science.gov (United States)

    Porporato, Amilcare; Ridolfi, Luca

    1997-06-01

    Within the field of chaos theory several methods for the analysis of complex dynamical systems have recently been proposed. In light of these ideas we study the dynamics which control the behavior over time of river flow, investigating the existence of a low-dimension deterministic component. The present article follows the research undertaken in the work of Porporato and Ridolfi [1996a] in which some clues as to the existence of chaos were collected. Particular emphasis is given here to the problem of noise and to nonlinear prediction. With regard to the latter, the benefits obtainable by means of the interpolation of the available time series are reported and the remarkable predictive results attained with this nonlinear method are shown.

  19. Comparison of single-shot fast spin-echo sequence and T2-weighted fast spin-echo sequence in MR imaging of the brain

    International Nuclear Information System (INIS)

    Cha, Sung Ho; Seo, Jeong Jin; Jeong, Gwang Woo; Kim, Jae Kyu; Kim, Yun Hyeon; Jeong, Yong Yeon; Kang, Heoung Keun; Oh, Hee Yeon; Yoon, Jong Hoon

    1998-01-01

    The purpose of this study was to evaluate the usefulness of the single-shot fast spin-echo (SS-FSE) sequence in comparison with the T2-weighted fast spin-echo (T2-FSE) sequence in brain MR imaging. In 41 patients aged 15-75 years with intracranial lesions, both SS-FSE and T2-FSE images were obtained using a 1.5-T MR system. Lesions included cerebral ischemia or infarcts (n=23), tumors (n=10), hemorrhages (n=3), inflammatory diseases (n=2), arachnoid cysts (n=2), and vascular disease (n=1), and the MR images were retrospectively evaluated. For quantitative comparison, the mean signal intensities of lesions, normal brain tissue, and noise outside the patient were measured and used to calculate contrast-to-noise ratio (CNR), percentage contrast, and signal-to-noise ratio (SNR). For qualitative comparison, the visibility, margin, and extent of the lesions were rated using a five-grade system, and the degree of MR artifacts was also evaluated. Wilcoxon's signed ranks test was used for statistical analysis. The mean CNR of lesions was significantly higher on SS-FSE (31.3) than on T2-FSE images (27.5) (p=0.0131). Mean percentage contrast was also higher on SS-FSE (159.0) than on T2-FSE images (108.5) (p=0.0222), but mean SNR was higher on T2-FSE (80.3) than on SS-FSE images (53.5) (p=0.0000). No significant differences in lesion visibility were observed between the two imaging sequences, though margin and extent of the lesion were worse on SS-FSE images. For MR artifacts, no significant differences were demonstrated. For the evaluation of most intracranial lesions, MR imaging using the SS-FSE sequence appears to be slightly inferior to the T2-FSE sequence, but may be useful where patients are ill or uncooperative, or where children require sedation.

  20. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    Science.gov (United States)

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
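    The distance computation described above can be sketched in plain Python. Note the substitution: this sketch counts L-words with a simple sliding window instead of the generalized suffix tree used by the authors, so it illustrates the frequency profile and Euclidean distance only, not their linear-time counting. Function names, the word length, and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def lword_profile(seq, L):
    """Relative frequency of every overlapping word of length L in seq."""
    total = len(seq) - L + 1
    if total <= 0:
        raise ValueError("sequence is shorter than the word length")
    counts = Counter(seq[i:i + L] for i in range(total))
    return {word: count / total for word, count in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two profiles; absent words count as 0."""
    words = sorted(set(p) | set(q))  # fixed order keeps the matrix symmetric
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise profile distances."""
    profiles = [lword_profile(s, L) for s in seqs]
    n = len(profiles)
    return [[euclidean(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Invented toy sequences; real input would be whole mitochondrial genomes.
for row in distance_matrix(["ACGT", "ACGT", "TTTT"], L=2):
    print(row)
```

    A matrix of this kind is the input that a neighbor joining or multidimensional scaling routine would then consume.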

  1. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    Full Text Available The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.

  2. Accident sequence analysis of human-computer interface design

    International Nuclear Information System (INIS)

    Fan, C.-F.; Chen, W.-H.

    2000-01-01

    It is important to predict potential accident sequences of human-computer interaction in a safety-critical computing system so that vulnerable points can be disclosed and removed. We address this issue by proposing a Multi-Context human-computer interaction Model along with its analysis techniques, an Augmented Fault Tree Analysis, and a Concurrent Event Tree Analysis. The proposed augmented fault tree can identify the potential weak points in software design that may induce unintended software functions or erroneous human procedures. The concurrent event tree can enumerate possible accident sequences due to these weak points.

  3. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  4. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. 
    Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering
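
    The divergence values quoted in this abstract are percentages of differing positions between aligned sequences. The abstract does not say which distance measure was used, so the sketch below assumes the simplest one, an uncorrected p-distance that skips gapped columns; the function name and the gap symbol are hypothetical.

```python
def percent_divergence(a, b, gap="-"):
    """Uncorrected percent divergence between two aligned sequences.

    Assumption: columns where either sequence has a gap are ignored,
    and no substitution model correction is applied.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip alignment gaps
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared
```

    Applied to every pair of strains from two clusters, the mean and range of these values would give summary figures of the same form as those reported, although the study's own software and settings may differ.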

  5. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the

  6. An optimum analysis sequence for environmental gamma-ray spectrometry

    Energy Technology Data Exchange (ETDEWEB)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L., E-mail: fta777@hotmail.co [Universidad Autonoma de Zacatecas, Centro Regional de Estudios Nucleares, Calle Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)

    2010-10-15

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma-ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. Particularly, the combination of second derivative peak locate with simple peak area integration algorithms provides the greatest accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)
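
    The abstract names the z-score and the reduced χ² as selection criteria without giving formulas. The sketch below uses common textbook definitions as an assumption: each measured peak area is compared with its reference value, the two uncertainties are combined in quadrature, and the reduced χ² is the mean squared z-score with no fitted parameters subtracted. Function and parameter names are hypothetical.

```python
import math


def z_score(value, ref, u_value, u_ref):
    """Deviation from the reference in units of the combined uncertainty."""
    return (value - ref) / math.sqrt(u_value ** 2 + u_ref ** 2)


def reduced_chi_square(values, refs, u_values, u_refs):
    """Mean of squared z-scores over all compared peaks.

    Assumption: the number of degrees of freedom equals the number of peaks.
    """
    zs = [z_score(v, r, uv, ur)
          for v, r, uv, ur in zip(values, refs, u_values, u_refs)]
    if not zs:
        raise ValueError("at least one peak is required")
    return sum(z * z for z in zs) / len(zs)
```

    Under these definitions, an analysis sequence whose reported uncertainties are realistic would give a reduced χ² close to 1, and candidate sequences could be ranked by how far they depart from that value.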

  7. An optimum analysis sequence for environmental gamma-ray spectrometry

    International Nuclear Information System (INIS)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma-ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. Particularly, the combination of second derivative peak locate with simple peak area integration algorithms provides the greatest accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  8. The complete genome sequences of poxviruses isolated from a penguin and a pigeon in South Africa and comparison to other sequenced avipoxviruses.

    Science.gov (United States)

    Offerman, Kristy; Carulei, Olivia; van der Walt, Anelda Philine; Douglass, Nicola; Williamson, Anna-Lise

    2014-06-12

    Two novel avipoxviruses from South Africa have been sequenced, one from a Feral Pigeon (Columba livia) (FeP2) and the other from an African penguin (Spheniscus demersus) (PEPV). We present a purpose-designed bioinformatics pipeline for analysis of next generation sequence data of avian poxviruses and compare the different avipoxviruses sequenced to date with specific emphasis on their evolution and gene content. The FeP2 (282 kbp) and PEPV (306 kbp) genomes encode 271 and 284 open reading frames respectively and are more closely related to one another (94.4%) than to either fowlpox virus (FWPV) (85.3% and 84.0% respectively) or Canarypox virus (CNPV) (62.0% and 63.4% respectively). Overall, FeP2, PEPV and FWPV have syntenic gene arrangements; however, major differences exist throughout their genomes. The most striking difference between FeP2 and the FWPV-like avipoxviruses is a large deletion of ~16 kbp from the central region of the genome of FeP2 deleting a cc-chemokine-like gene, two Variola virus B22R orthologues, an N1R/p28-like gene and a V-type Ig domain family gene. FeP2 and PEPV both encode orthologues of vaccinia virus C7L and Interleukin 10. PEPV contains a 77 amino acid long orthologue of Ubiquitin sharing 97% amino acid identity to human ubiquitin. The genome sequences of FeP2 and PEPV have greatly added to the limited repository of genomic information available for the Avipoxvirus genus. In the comparison of FeP2 and PEPV to existing sequences, FWPV and CNPV, we have established insights into African avipoxvirus evolution. Our data supports the independent evolution of these South African avipoxviruses from a common ancestral virus to FWPV and CNPV.

  9. Genomic 3' terminal sequence comparison of three isolates of rabbit haemorrhagic disease virus.

    Science.gov (United States)

    Milton, I D; Vlasak, R; Nowotny, N; Rodak, L; Carter, M J

    1992-05-15

    Comparison of sequence data is necessary in order to investigate virus origins, identify features common to virulent strains, and characterize genomic organization within virus families. A virulent caliciviral disease of rabbits recently emerged in China. We have sequenced 1100 bases from the 3' ends of two independent European isolates of this virus, and compared these with previously determined calicivirus sequences. Rabbit caliciviruses were closely related, despite the different countries in which isolation was made. This supports the rapid spread of a new virus across Europe. The capsid protein sequences of these rabbit viruses differ markedly from those determined for feline calicivirus, but a hypothetical 3' open reading frame is relatively well conserved between the caliciviruses of these two different hosts and argues for a functional role.

  10. Comparison of the effects of the CHESS sequence and the SPAIR sequence for fat saturation

    Science.gov (United States)

    Dong, Kyung-Rae; Goo, Eun-Hoe; Kweon, Dae-Cheol; Chung, Woon-Kwan; Lee, Jong-Woong

    2013-06-01

    This study compared the abilities of the chemical-shift selective saturation (CHESS) and the spectrally-adiabatic inversion recovery (SPAIR) fat-saturation techniques to resolve the recent problems in fat saturation caused by areas of changing volume such as the head and the neck and by metal artifacts when T1 fat-saturation techniques representing the anatomical images and T2 fat-saturation techniques representing pathological images are used. To compare the abilities of CHESS and SPAIR, we acquired images of the head and the neck and of the pelvis, and we compared the contrast-to-noise ratios (CNRs) and the signal-to-noise ratios (SNRs) of the signals from the flexed body parts. Images were taken of the abdomens, heads and necks, and pelvises of 15 men and 15 women (30 in total). In all scanning techniques, the SNRs and the CNRs were calculated based on a quantitative analysis method with a view to obtaining uniform data. According to the study results, the CNRs of the SPAIR and the CHESS techniques for the pelvis in the T1-weighted image were 55.10 and 67.23, respectively. The SNRs of the SPAIR technique were 70.61 for muscle and 15.50 for fat, whereas the SNRs of the CHESS technique were 79.23 for muscle and 12.00 for fat. For the pelvis in the T2-weighted image, the CNRs of the SPAIR and the CHESS techniques were 12.50 and 16.66, respectively. The SNRs of the SPAIR technique were 16.98 for muscle and 5.14 for fat. In contrast, the SNRs of the CHESS technique were 27.90 for muscle and 11.23 for fat. Consequently, the signal intensity was higher in the CHESS than in the SPAIR technique. Nevertheless, with regard to the clinical usefulness, the image quality was higher in the SPAIR technique than in the CHESS technique.
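
    The abstract does not state how the CNR was derived from the SNRs. A common convention, assumed here, is that both tissues share one noise estimate, so the muscle-fat CNR equals the difference of the two SNRs. The sketch below checks the reported pelvis values against that assumption; the names are hypothetical and the numbers are copied from the abstract.

```python
def cnr_from_snrs(snr_a, snr_b):
    """CNR between two tissues, assuming both SNRs share the same noise term."""
    return abs(snr_a - snr_b)


# (weighting, technique): (SNR muscle, SNR fat, reported CNR), from the abstract
REPORTED = {
    ("T1", "SPAIR"): (70.61, 15.50, 55.10),
    ("T1", "CHESS"): (79.23, 12.00, 67.23),
    ("T2", "SPAIR"): (16.98, 5.14, 12.50),
    ("T2", "CHESS"): (27.90, 11.23, 16.66),
}


def cnr_discrepancies(reported=REPORTED):
    """Computed CNR minus reported CNR for each case, rounded to 2 decimals."""
    return {key: round(cnr_from_snrs(muscle, fat) - cnr, 2)
            for key, (muscle, fat, cnr) in reported.items()}
```

    Under this assumption three of the four reported CNRs agree with the SNR difference to within 0.01; the T2-weighted SPAIR value differs by 0.66, which may reflect rounding, a different noise estimate, or a different CNR definition in the study.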

  11. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Science.gov (United States)

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  12. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the 'tsunami' of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  13. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    Directory of Open Access Journals (Sweden)

    Jonathan Z Li

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R² = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  14. A novel RNA sequencing data analysis method for cell line authentication.

    Directory of Open Access Journals (Sweden)

    Erik Fasterius

    We have developed a novel analysis method that can interrogate the authenticity of biological samples used for generation of transcriptome profiles in public data repositories. The method uses RNA sequencing information to reveal mutations in expressed transcripts and subsequently confirms the identity of analysed cells by comparison with publicly available cell-specific mutational profiles. Cell lines constitute key model systems widely used within cancer research, but their identity needs to be confirmed in order to minimise the influence of cell contaminations and genetic drift on the analysis. Using both public and novel data, we demonstrate the use of RNA-sequencing data analysis for cell line authentication by examining the validity of COLO205, DLD1, HCT15, HCT116, HKE3, HT29 and RKO colorectal cancer cell lines. We successfully authenticate the studied cell lines and validate previous reports indicating that DLD1 and HCT15 are synonymous. We also show that the analysed HKE3 cells harbour an unexpected KRAS-G13D mutation and confirm that this cell line is a genuine KRAS dosage mutant, rather than a true isogenic derivative of HCT116 expressing only the wild type KRAS. This authentication method could be used to revisit the numerous cell line based RNA sequencing experiments available in public data repositories, analyse new experiments where whole genome sequencing is not available, as well as facilitate comparisons of data from different experiments, platforms and laboratories.

  15. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library.

    Science.gov (United States)

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and using third-generation PacBio sequencing. After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found not only that the PLE exon sequences of the four genes showed a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720-bp reverse-complement sequences that may have an important function in the regulation of PLE gene expression. This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes provides the necessary foundation for

  16. Generation and analysis of expressed sequence tags from Botrytis cinerea

    Directory of Open Access Journals (Sweden)

    EVELYN SILVA

    2006-01-01

    Botrytis cinerea is a filamentous plant pathogen of a wide range of plant species, and its infection may cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and have sequenced 11,482 expressed sequence tags that were assembled into 1,003 contig sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences code for genes involved in cellular metabolism, 12% for transport of metabolites, and approximately 10% for cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pair-wise comparisons with fungal databases to determine the B. cinerea unisequence set with relevant similarity to genes in other fungal pathogenic counterparts. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23%) have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively). The lower percentages of homology were with Magnaporthe grisea and Neurospora crassa (13% and 19%, respectively). Several genes involved in putative and known fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen.

  17. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    Energy Technology Data Exchange (ETDEWEB)

    Ovacik, Meric A. [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Androulakis, Ioannis P., E-mail: yannis@rci.rutgers.edu [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States)

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  18. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    International Nuclear Information System (INIS)

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-01-01

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy

  19. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background: Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods: To provide a more comprehensive and complete transcriptome of An. sinensis, RNA from eggs, larvae, pupae, male adults and female adults was pooled for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed for genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results: Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content of 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified certain numbers of An. sinensis unigenes that showed homology with, or were putative 1:1 orthologues in, the genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions: Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic and genomic studies and further vector control of An. sinensis. PMID:25000941
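
    The assembly statistics quoted in this abstract (average length, N50, GC content) can be computed from a set of assembled sequences as sketched below. This is an illustration under the usual definitions, not the study's pipeline; the function names are hypothetical, and the GC content here is pooled over all bases rather than averaged per unigene, which is an assumption.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("at least one length is required")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def gc_content(seqs):
    """Percentage of G and C bases pooled over all sequences."""
    total = sum(len(s) for s in seqs)
    if total == 0:
        raise ValueError("no bases to count")
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return 100.0 * gc / total
```

    An N50 larger than the mean length, as reported here (711 bp versus 571 bp), is typical when an assembly contains many short sequences and a smaller number of long ones.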

  20. Comparison of Pre-Analytical FFPE Sample Preparation Methods and Their Impact on Massively Parallel Sequencing in Routine Diagnostics

    Science.gov (United States)

    Heydt, Carina; Fassunke, Jana; Künstlinger, Helen; Ihle, Michaela Angelika; König, Katharina; Heukamp, Lukas Carl; Schildhaus, Hans-Ulrich; Odenthal, Margarete; Büttner, Reinhard; Merkelbach-Bruse, Sabine

    2014-01-01

    Over recent years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially in laboratories that deal with formalin-fixed paraffin-embedded (FFPE) material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany) seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3–24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can be used for

  1. Comparison of pre-analytical FFPE sample preparation methods and their impact on massively parallel sequencing in routine diagnostics.

    Directory of Open Access Journals (Sweden)

    Carina Heydt

    Over recent years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially in laboratories that deal with formalin-fixed paraffin-embedded (FFPE) material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany) seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3-24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can

  2. 3D representations of amino acids—applications to protein sequence comparison and classification

    Directory of Open Access Journals (Sweden)

    Jie Li

    2014-08-01

    The amino acid sequence of a protein is the key to understanding its structure and ultimately its function in the cell. This paper addresses the fundamental issue of encoding amino acids in ways that the representation of such a protein sequence facilitates the decoding of its information content. We show that a feature-based representation in a three-dimensional (3D) space derived from amino acid substitution matrices provides an adequate representation that can be used for direct comparison of protein sequences based on geometry. We measure the performance of such a representation in the context of the protein structural fold prediction problem. We compare the results of classifying different sets of proteins belonging to distinct structural folds against classifications of the same proteins obtained from sequence alone or directly from structural information. We find that sequence alone performs poorly as a structure classifier. We show in contrast that the use of the three-dimensional representation of the sequences significantly improves the classification accuracy. We conclude with a discussion of the current limitations of such a representation and with a description of potential improvements.

  3. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. | Vol. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid- ...... Vega Lopez F, Brooks L A, Dockrell H M, De Smet K A,. Thompson ...

  4. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    CCAAT/enhancer-binding protein beta as an essential transcriptional factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  5. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    Science.gov (United States)

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  6. Sequence symmetry analysis in pharmacovigilance and pharmacoepidemiologic studies

    DEFF Research Database (Denmark)

    Lai, Edward Chia Cheng; Pratt, Nicole; Hsieh, Cheng Yang

    2017-01-01

    Sequence symmetry analysis (SSA) is a method for detecting adverse drug events by utilizing computerized claims data. The method has been increasingly used to investigate safety concerns of medications and as a pharmacovigilance tool to identify unsuspected side effects. Validation studies have i...

  7. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    Science.gov (United States)

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially (P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  8. DNAApp: a mobile application for sequencing data analysis.

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php. The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.

  9. DNAApp: a mobile application for sequencing data analysis

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php. The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882

  10. Long-read sequencing data analysis for yeasts.

    Science.gov (United States)

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  11. Testing statistical significance scores of sequence comparison methods with structure similarity

    Directory of Open Access Journals (Sweden)

    Leunissen Jack AM

    2006-10-01

    Background: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. Results: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
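
    A Monte-Carlo Z-score of the kind this record compares against the e-value can be sketched as follows: score the real pair, rescore the query against shuffled copies of the subject, and express the real score in standard deviations above the shuffled mean. The Python below is a minimal sketch under loud assumptions: a toy match/mismatch scheme with linear gap penalties stands in for a substitution matrix with affine gaps, and the number of shuffles, the seed and all function names are hypothetical choices, not those of CluSTr, Protein World or SSEARCH.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (toy scoring, linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def z_score(query, subject, n_shuffles=100, seed=0):
    """Z-score of the real alignment score against shuffled subjects."""
    rng = random.Random(seed)             # fixed seed keeps the sketch repeatable
    observed = smith_waterman_score(query, subject)
    letters = list(subject)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)              # preserves composition, destroys order
        shuffled_scores.append(smith_waterman_score(query, "".join(letters)))
    spread = statistics.pstdev(shuffled_scores)
    if spread == 0:
        return 0.0                        # undefined without spread; report 0
    return (observed - statistics.mean(shuffled_scores)) / spread


seq = "ACGTACGGTCATGCAATCGG"
print(z_score(seq, seq))
```

    Every shuffle requires a full realignment, which illustrates why the record describes the Z-score as compute intensive compared with an analytically derived e-value.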

  12. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  13. Nucleotide sequence analysis of the Legionella micdadei mip gene, encoding a 30-kilodalton analog of the Legionella pneumophila Mip protein

    DEFF Research Database (Denmark)

    Bangsborg, Jette Marie; Cianciotto, N P; Hindersson, P

    1991-01-01

    After the demonstration of analogs of the Legionella pneumophila macrophage infectivity potentiator (Mip) protein in other Legionella species, the Legionella micdadei mip gene was cloned and expressed in Escherichia coli. DNA sequence analysis of the L. micdadei mip gene contained in the plasmid p...... homology with the mip-like genes of several Legionella species. Furthermore, amino acid sequence comparisons revealed significant homology to two eukaryotic proteins with isomerase activity (FK506-binding proteins)....

  14. Analysis of Sequence Diagram Layout in Advanced UML Modelling Tools

    Directory of Open Access Journals (Sweden)

    Ņikiforova Oksana

    2016-05-01

    System modelling using the Unified Modelling Language (UML) is a task that must be solved in software development. The more complex software becomes, the higher the requirements for demonstrating the system to be developed, especially its dynamic aspect, which in UML is represented by a sequence diagram. To solve this task, the main attention is devoted to the graphical presentation of the system, where diagram layout plays the central role in information perception. The UML sequence diagram, due to its specific structure, is selected for a deeper analysis of the layout of its elements. The authors’ research presents the abilities of modern UML modelling tools to offer automatic layout of the UML sequence diagram and analyses them according to criteria required for diagram perception.

  15. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism's genome through a triplet network. In this network, triplets in the DNA sequence are vertices, and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main biases: the guanine-cytosine (GC) content and the 3-bp (base pair) periodicity of the DNA sequence. We perform the test by constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained either in the 3-bp periodicity or in the variable GC content.
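
    The triplet network described in this record can be illustrated with a short sketch. The abstract does not say whether triplets are read with or without overlap, so the Python below assumes non-overlapping triplets in reading frame 1, an undirected graph without self-loops, and the usual average local clustering coefficient (vertices with fewer than two neighbours contribute zero). The function names are hypothetical.

```python
from itertools import combinations


def triplet_network(sequence):
    """Map each triplet to the set of triplets found adjacent to it."""
    seq = sequence.upper()
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    neighbours = {triplet: set() for triplet in triplets}
    for left, right in zip(triplets, triplets[1:]):
        if left != right:                  # ignore self-loops
            neighbours[left].add(right)
            neighbours[right].add(left)
    return neighbours


def average_clustering(neighbours):
    """Average local clustering coefficient over all vertices."""
    if not neighbours:
        return 0.0
    total = 0.0
    for adjacent in neighbours.values():
        k = len(adjacent)
        if k < 2:
            continue                       # coefficient taken as 0
        links = sum(1 for u, v in combinations(adjacent, 2) if v in neighbours[u])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbours)


# AAA-CCC, CCC-GGG and GGG-AAA form a triangle, so every vertex scores 1.
print(average_clustering(triplet_network("AAACCCGGGAAA")))   # 1.0
```

    Comparing this value with the same statistic on randomized sequences of matched GC content would mirror the control the record describes.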

  16. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Science.gov (United States)

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759

  17. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. Copyright © 2017 John Wiley & Sons, Inc.
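
    The idea of loading similarity search results into a relational database and then summarizing them with SQL can be sketched with Python's built-in `sqlite3` module. The table name `hit`, its columns and the sample rows below are hypothetical; they are not the schema of seqdb_demo or search_demo, which this record does not show.

```python
import sqlite3


def build_hit_db(hits):
    """Load (query_id, subject_id, subject_taxon, evalue) rows into memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hit ("
        "query_id TEXT, subject_id TEXT, subject_taxon TEXT, evalue REAL)"
    )
    conn.executemany("INSERT INTO hit VALUES (?, ?, ?, ?)", hits)
    return conn


def queries_with_homologs(conn, taxon, max_evalue=1e-6):
    """Query proteins with at least one significant hit in the given taxon."""
    rows = conn.execute(
        "SELECT DISTINCT query_id FROM hit "
        "WHERE subject_taxon = ? AND evalue <= ? ORDER BY query_id",
        (taxon, max_evalue),
    )
    return [row[0] for row in rows]


# Made-up search results for three query proteins.
demo_hits = [
    ("b0001", "P1", "Homo sapiens", 1e-30),
    ("b0001", "P2", "Bacillus subtilis", 1e-3),
    ("b0002", "P3", "Homo sapiens", 0.5),
    ("b0003", "P4", "Homo sapiens", 1e-10),
]
conn = build_hit_db(demo_hits)
print(queries_with_homologs(conn, "Homo sapiens"))   # ['b0001', 'b0003']
```

    Once the hits sit in a table, questions such as "which proteins have a significant homolog in taxon X" reduce to one query instead of a pass over flat search output.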

  18. Now And Next Generation Sequencing Techniques: Future of Sequence Analysis using Cloud Computing

    Directory of Open Access Journals (Sweden)

    Radhe Shyam Thakur

    2012-12-01

    Advancements in sequencing techniques have resulted in huge amounts of sequence data being produced at an ever faster rate. It is becoming cumbersome for datacenters to maintain the databases. Data mining and sequence analysis approaches need to analyze the databases several times to reach any effective conclusion. To cope with this burden on computer resources and to reach efficient and effective conclusions quickly, the virtualization of resources and computation on a pay-as-you-go basis was introduced and termed cloud computing. The datacenter’s hardware and software are collectively known as a cloud, which when publicly available is termed a public cloud. The datacenter’s resources are provided in a virtual mode to clients via a service provider such as Amazon, Google or Joyent, which charges on a pay-as-you-go basis. The workload is shifted to the provider, which maintains the required hardware and software upgrades. The service provider manages this by upgrading the resources in the virtual mode. Basically, a virtual environment is created according to the needs of the user by obtaining permission from the datacenter via the internet; the task is performed and the environment is deleted after the task is over. In this discussion, we focus on the basics of cloud computing and on the prerequisites and overall working of clouds. Furthermore, the applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection, are briefly discussed with reference to traditional workflows.

  19. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    Science.gov (United States)

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  20. SEQUENCING AND SEQUENCE ANALYSIS OF MYOSTATIN GENE IN THE EXON 1 OF THE CAMEL (CAMELUS DROMEDARIUS)

    Directory of Open Access Journals (Sweden)

    M. G. SHAH, A. S. QURESHI, M. REISSMANN AND H. J. SCHWARTZ

    2006-10-01

    Myostatin, also called growth differentiation factor-8 (GDF-8), is a member of the mammalian transforming growth factor family (TGF-beta superfamily), which is expressed specifically in developing and adult skeletal muscle. The muscular hypertrophy (mh) allele in double-muscled breeds involves a mutation within the myostatin gene. Genomic DNA was isolated from camel hair using the NucleoSpin Tissue kit. Two animals from each of six breeds, namely Marecha, Dhatti, Larri, Kohi, Sakrai and Cambelpuri, were used for sequencing. For PCR amplification of the gene, a primer pair was designed from homologous regions of already published sequences of farm animals from GenBank. Results showed that camel myostatin possessed more than 90% homology with that of cattle, sheep and pig. Camel formed a separate cluster from the pig in spite of having high homology (98%) and showed 94% homology with cattle and sheep, as reported in the literature. The sequence of the PCR-amplified part of exon 1 (256 bp) of camel myostatin was identical among the six camel breeds.

  1. RNA2 of grapevine fanleaf virus: sequence analysis and coat protein cistron location.

    Science.gov (United States)

    Serghini, M A; Fuchs, M; Pinck, M; Reinbolt, J; Walter, B; Pinck, L

    1990-07-01

    The nucleotide sequence of the genomic RNA2 (3774 nucleotides) of grapevine fanleaf virus strain F13 was determined from overlapping cDNA clones and its genetic organization was deduced. Two rapid and efficient methods were used for cDNA cloning of the 5' region of RNA2. The complete sequence contained only one long open reading frame of 3555 nucleotides (1184 codons, 131K product). The analysis of the N-terminal sequence of purified coat protein (CP) and identification of its C-terminal residue have allowed the CP cistron to be precisely positioned within the polyprotein. The CP produced by proteolytic cleavage at the Arg/Gly site between residues 680 and 681 contains 504 amino acids (Mr 56019) and has hydrophobic properties. The Arg/Gly cleavage site deduced by N-terminal amino acid sequence analysis is the first for a nepovirus coat protein and for plant viruses expressing their genomic RNAs by polyprotein synthesis. Comparison of GFLV RNA2 with M RNA of cowpea mosaic comovirus and with RNA2 of two closely related nepoviruses, tomato black ring virus and Hungarian grapevine chrome mosaic virus, showed strong similarities among the 3' non-coding regions but less similarity among the 5' end non-coding sequences than reported among other nepovirus RNAs.

  2. SeqVISTA: a graphical tool for sequence feature visualization and comparison

    Directory of Open Access Journals (Sweden)

    Niu Tianhua

    2003-01-01

    Background: Many readers will sympathize with the following story. You are viewing a gene sequence in Entrez, and you want to find whether it contains a particular sequence motif. You reach for the browser's "find in page" button, but those darn spaces every 10 bp get in the way. And what if the motif is on the opposite strand? Subsequently, your favorite sequence analysis software informs you that there is an interesting feature at position 13982–14013. By painstakingly counting the 10 bp blocks, you are able to examine the sequence at this location. But now you want to see what other features have been annotated close by, and this information is buried several screenfuls higher up the web page. Results: SeqVISTA presents a holistic, graphical view of features annotated on nucleotide or protein sequences. This interactive tool highlights the residues in the sequence that correspond to features chosen by the user, and allows easy searching for sequence motifs or extraction of particular subsequences. SeqVISTA is able to display results from diverse sequence analysis tools in an integrated fashion, and aims to provide much-needed unity to the bioinformatics resources scattered around the Internet. Our viewer may be launched on a GenBank record by a single click of a button installed in the web browser. Conclusion: SeqVISTA allows insights to be gained by viewing the totality of sequence annotations and predictions, which may be more revealing than the sum of their parts. SeqVISTA runs on any operating system with a Java 1.4 virtual machine. It is freely available to academic users at http://zlab.bu.edu/SeqVISTA.
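
    The two obstacles in the opening story, spaces every 10 bp and motifs on the opposite strand, can be handled in a few lines. The Python below is a minimal sketch, not SeqVISTA's implementation: it assumes the record text contains only sequence letters, digits and whitespace, handles unambiguous A/C/G/T motifs, and uses hypothetical function names.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean_sequence(raw):
    """Drop position numbers, spaces and line breaks; upper-case the rest."""
    return "".join(ch for ch in raw.upper() if ch.isalpha())


def find_motif(raw, motif):
    """Return sorted (position, strand) hits, 1-based on the forward strand.

    A '-' hit means the reverse complement of the motif starts at that
    forward-strand position, i.e. the motif lies on the opposite strand.
    """
    seq = clean_sequence(raw)
    forward = motif.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    hits = []
    for pattern, strand in ((forward, "+"), (reverse, "-")):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(pattern, start + 1)   # allow overlapping hits
    return sorted(hits)


record = "1 atgaaaccgg gtttcat"          # toy GenBank-style sequence line
print(find_motif(record, "CCGGGT"))      # [(7, '+')]
print(find_motif(record, "ATGAAA"))      # [(1, '+'), (12, '-')]
```

    The first hit spans the gap between two 10 bp blocks, which is the case a browser's "find in page" would miss; a palindromic motif would be reported once per strand.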

  3. An Imaging And Graphics Workstation For Image Sequence Analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of the modern graphic-oriented workstations with the digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missile, stores and other flying objects in various flight regimes including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) Acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence data base generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interface to a variety of video players and film transport sub-systems.

  4. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Directory of Open Access Journals (Sweden)

    Mo Sisu

    2013-02-01

    Full Text Available Abstract Background The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA) of 20 of the most highly studied reference strains and clinical isolates of T. denticola, which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results The sequences of the 16S ribosomal RNA gene, and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC) were successfully determined for each strain. Sequence data was analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA) to 8.9% (dnaN). Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain) using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic, and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established; but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic

  5. Sirius PSB: a generic system for analysis of biological sequences.

    Science.gov (United States)

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  6. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    Science.gov (United States)

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Directory of Open Access Journals (Sweden)

    Charalambos Chrysostomou

    2015-01-01

    Full Text Available Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. Recent studies show that using only the absolute spectrum in informational spectrum analysis of protein sequences is insufficient. CISAPS was therefore developed to provide results in three forms: the absolute, real, and imaginary spectra. In the case study presented here, an analysis of influenza A subtypes, biologically relevant features can also appear individually in either the real or the imaginary spectrum. The results show that protein classes can present similarities or differences according to the features extracted from the CISAPS web server. These associations are probably related to the protein feature that the specific amino acid index represents. In addition, technical issues that may affect the analysis, such as zero-padding and windowing, are addressed. CISAPS performs the analysis using an expanded list of 611 unique amino acid indices, each representing a different property. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.
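
    The distinction between the absolute, real, and imaginary spectra can be made concrete with a small sketch. It is not the CISAPS code: it assumes the spectrum is a plain discrete Fourier transform of the index-encoded sequence, and `TOY_INDEX` is a hypothetical three-residue index, not one of the 611 amino acid indices mentioned in the abstract.

```python
import cmath

# Hypothetical toy index for illustration only (not a published index).
TOY_INDEX = {"A": 1.0, "G": 0.0, "L": -1.0}


def complex_spectrum(protein: str, index: dict) -> list:
    """DFT coefficients 0..N/2 of the index-encoded protein sequence."""
    values = [index[residue] for residue in protein]
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n // 2 + 1)
    ]


def three_forms(protein: str, index: dict) -> dict:
    """Absolute, real and imaginary spectra of one protein sequence."""
    spectrum = complex_spectrum(protein, index)
    return {
        "absolute": [abs(c) for c in spectrum],
        "real": [c.real for c in spectrum],
        "imaginary": [c.imag for c in spectrum],
    }


first = three_forms("AGLG", TOY_INDEX)
second = three_forms("GAGL", TOY_INDEX)
# Both sequences have the same absolute value at k = 1 (about 2), but the
# peak sits in the real part for "AGLG" and in the imaginary part for "GAGL".
print(first["absolute"][1], first["real"][1], first["imaginary"][1])
print(second["absolute"][1], second["real"][1], second["imaginary"][1])
```

    In this toy case the two sequences are indistinguishable in the absolute spectrum and separate only in the real and imaginary parts, which is the kind of information the abstract says the absolute spectrum alone can miss.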

  8. Retrospective comparison of three-dimensional imaging sequences in the visualization of posterior fossa cranial nerves.

    Science.gov (United States)

    Ors, Suna; Inci, Ercan; Turkay, Rustu; Kokurcan, Atilla; Hocaoglu, Elif

    2017-12-01

    To compare the efficacy of three-dimensional SPACE (sampling perfection with application-optimized contrasts using different flip-angle evolutions) and CISS (constructive interference in steady state) sequences in the imaging of the cisternal segments of cranial nerves V-XII. Temporal MRI scans from 50 patients (F:M ratio, 27:23; mean age, 44.5±15.9 years) admitted to our hospital with vertigo, tinnitus, and hearing loss were retrospectively analyzed. All patients had both CISS and SPACE sequences. Quantitative analysis of SPACE and CISS sequences was performed by measuring the ventricle-to-parenchyma contrast-to-noise ratio (CNR). Qualitative analysis of differences in visualization capability, image quality, and severity of artifacts was also conducted. Artifacts were scored on a scale ranging from 'no artefact' to 'severe artefacts and unreadable', and image quality on a scale ranging from 'not visualized' to 'completely visualized'. The distribution of variables was checked with the Kolmogorov-Smirnov test. Samples t-test and McNemar's test were used to determine statistical significance. Rates of visualization of posterior fossa cranial nerves in cases of complete visualization were as follows: nerve V (100% for both sequences), nerve VI (94% in SPACE, 86% in CISS sequences), nerves VII-VIII (100% for both sequences), IX-XI nerve complex (96%, 88%); nerve XII (58%, 46%) (p<0.05). SPACE sequences showed fewer artifacts than CISS sequences (p<0.002). Copyright © 2017 Elsevier B.V. All rights reserved.

  9. A comparison of 454 sequencing and clonal sequencing for the characterization of hepatitis C virus NS3 variants

    NARCIS (Netherlands)

    Ho, Cynthia K. Y.; Welkers, Matthijs R. A.; Thomas, Xiomara V.; Sullivan, James C.; Kieffer, Tara L.; Reesink, Henk W.; Rebers, Sjoerd P. H.; de Jong, Menno D.; Schinkel, Janke; Molenkamp, Richard

    2015-01-01

    We compared 454 amplicon sequencing with clonal sequencing for the characterization of intra-host hepatitis C virus (HCV) NS3 variants. Clonal and 454 sequences were obtained from 12 patients enrolled in a clinical phase I study for telaprevir, an NS3-4a protease inhibitor. Thirty-nine datasets were

  10. HBVRegDB: Annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences

    Directory of Open Access Journals (Sweden)

    Firth Andrew E

    2007-12-01

    Full Text Available Abstract Background The many Hepadnaviridae sequences available have widely varied functional annotation. The genomes are very compact (~3.2 kb) but contain multiple layers of functional regulatory elements in addition to coding regions. Key regions are subject to purifying selection, as mutations in these regions will produce non-functional viruses. Results These genomic sequences have been organized into a structured database to facilitate research at the molecular level. HBVRegDB is a comparative genomic analysis tool with an integrated underlying sequence database. The database contains genomic sequence data from representative viruses. In addition to INSDC and RefSeq annotation, HBVRegDB also contains expert and systematically calculated annotations (e.g. promoters) and comparative genome analysis results (e.g. blastn, tblastx). It also contains analyses based on curated HBV alignments. Information about conserved regions, including primary conservation (e.g. CDS-Plotcon) and RNA secondary structure predictions (e.g. Alidot), is integrated into the database. A large amount of data is graphically presented using the GBrowse (Generic Genome Browser) adapted for analysis of viral genomes. Flexible query access is provided based on any annotated genomic feature. Novel regulatory motifs can be found by analysing the annotated sequences. Conclusion HBVRegDB serves as a knowledge database and as a comparative genomic analysis tool for molecular biologists investigating HBV. It is publicly available at http://hbvregdb.otago.ac.nz and is complementary to other viral and HBV focused datasets and tools. The availability of multiple and highly annotated sequences of viral genomes in one database combined with comparative analysis tools facilitates detection of novel genomic elements.

  11. Comparison of modern 3D and 2D MR imaging sequences of the wrist at 3 Tesla

    International Nuclear Information System (INIS)

    Rehnitz, C.; Klaan, B.; Amarteifio, E.; Kauczor, H.U.; Weber, M.A.; Stillfried, F. von; Burkholder, I.

    2016-01-01

    To compare the image quality of modern 3D and 2D sequences for dedicated wrist imaging at 3 Tesla (T) MRI. At 3 T MRI, 18 patients (mean age: 36.2 years) with wrist pain and 16 healthy volunteers (mean age: 26.4 years) were examined using 2D proton density-weighted fat-saturated (PDfs), isotropic 3D TrueFISP, 3D MEDIC, and 3D PDfs SPACE sequences. Image quality was rated on a five-point scale (0-4) including overall image quality (OIQ), visibility of important structures (cartilage, ligaments, TFCC) and degree of artifacts. Signal-to-noise ratios (SNR) and contrast-to-noise ratios (CNR) of cartilage/bone/muscle/fluid as well as the mean overall SNR/CNR were calculated using region-of-interest analysis. ANOVA, paired t-, and Wilcoxon-signed-rank tests were applied. The image quality of all tested sequences was superior to 3D PDfs SPACE (p < 0.01). 3D TrueFISP had the highest combined cartilage score (mean: 3.4) and performed better in cartilage comparisons against 3D PDfs SPACE in both groups and 2D PDfs in volunteers (p < 0.05). 3D MEDIC performed better in 7 of 8 comparisons (p < 0.05) regarding ligaments and TFCC. 2D PDfs provided consistently high scores. The mean overall SNR/CNR for 2D PDfs, 3D PDfs SPACE, 3D TrueFISP, and 3D MEDIC were 68/65, 32/27, 45/47, and 57/45, respectively. 2D PDfs performed best in most SNR/CNR comparisons (p < 0.05) and 3D MEDIC performed best within the 3D sequences (p < 0.05). Except for 3D PDfs SPACE, all tested 3D and 2D sequences provided high image quality. 3D TrueFISP was best for cartilage imaging, 3D MEDIC for ligaments and TFCC, and 2D PDfs for general wrist imaging.
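
    The region-of-interest calculation of SNR and CNR mentioned above can be sketched as follows. The abstract does not state which definitions were used, so this assumes one common convention: SNR as mean tissue signal divided by the standard deviation of a background noise region, and CNR as the absolute difference of two tissue means divided by the same noise standard deviation. The pixel values are invented for illustration and are not study data.

```python
from statistics import mean, stdev


def snr(signal_roi, noise_roi):
    """Signal-to-noise ratio: mean ROI signal over noise standard deviation."""
    return mean(signal_roi) / stdev(noise_roi)


def cnr(roi_a, roi_b, noise_roi):
    """Contrast-to-noise ratio between two tissues (assumed definition)."""
    return abs(mean(roi_a) - mean(roi_b)) / stdev(noise_roi)


# Hypothetical pixel intensities from three regions of interest.
cartilage = [100, 104, 96]
fluid = [160, 158, 162]
background = [3, 5, 7]

print(snr(cartilage, background))         # 50.0
print(cnr(fluid, cartilage, background))  # 30.0
```

    Other conventions exist, for example correcting the background standard deviation for the noise distribution in magnitude images, so values computed this way are comparable only within one consistent definition.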

  12. Environmental impact analysis for the main accidental sequences of ignitor

    International Nuclear Information System (INIS)

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-01-01

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalent (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs

  13. Comparison of static model and dynamic model for the evaluation of station blackout sequences

    International Nuclear Information System (INIS)

    Lee, Kwang-Nam; Kang, Sun-Koo; Hong, Sung-Yull.

    1992-01-01

    Station blackout is one of the major contributors to the core damage frequency (CDF) in many PSA studies. Since the station blackout sequence exhibits dynamic features, accurate calculation of the CDF for the station blackout sequence is not possible with the event tree/fault tree (ET/FT) method. Although the integral method can determine the CDF accurately, it is time consuming and makes it difficult to evaluate various alternative AC source configurations and sensitivities. In this study, a comparison is made between a static model and a dynamic model, and a new methodology which combines the two is provided for the accurate quantification of the CDF and the evaluation of improvement alternatives. Results of several case studies show that accurate calculation of the CDF is possible by introducing an equivalent mission time. (author)

  14. Cloning, characterization and sequence comparison of the gene coding for IMP dehydrogenase from Pyrococcus furiosus.

    Science.gov (United States)

    Collart, F R; Osipiuk, J; Trent, J; Olsen, G J; Huberman, E

    1996-10-03

    We have cloned and characterized the gene encoding inosine monophosphate dehydrogenase (IMPDH) from Pyrococcus furiosus (Pf), a hyperthermophilic archaeon. Sequence analysis of the Pf gene indicated an open reading frame specifying a protein of 485 amino acids (aa) with a calculated M(r) of 52900. Canonical Archaea promoter elements, Box A and Box B, are located -49 and -17 nucleotides (nt), respectively, upstream of the putative start codon. The sequence of the putative active-site region conforms to the IMPDH signature motif and contains a putative active-site cysteine. Phylogenetic relationships derived by using all available IMPDH sequences are consistent with trees developed for other molecules; they do not precisely resolve the history of Pf IMPDH but indicate a close similarity to bacterial IMPDH proteins. The phylogenetic analysis indicates that a gene duplication occurred prior to the division between rodents and humans, accounting for the Type I and II isoforms identified in mice and humans.

  15. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    Science.gov (United States)

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  16. Whole genome sequencing and bioinformatics analysis of two Egyptian genomes.

    Science.gov (United States)

    ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong

    2018-05-15

    We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~30× sequencing depth. EGP1 had 4.7 million variants, of which 198,877 were novel, while EGP2 had 209,109 novel variants out of 4.8 million. The mitochondrial haplogroups of the two individuals were identified to be H7b1 and L2a1c, respectively. We also identified the Y haplogroup of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the NADH gene of the mitochondrial genome ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with an increased level of cholesterol and triglycerides, probably related to obesity in Egyptians. Comparison of these genomes with African and Western-Asian genomes can provide insights on Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and functional classification of variants as well as human migration and evolution across Africa and Western-Asia. Copyright © 2017. Published by Elsevier B.V.

  17. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    Science.gov (United States)

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  18. Using Behavior Sequence Analysis to Map Serial Killers' Life Histories.

    Science.gov (United States)

    Keatley, David A; Golightly, Hayley; Shephard, Rebecca; Yaksic, Enzo; Reid, Sasha

    2018-03-01

    The aim of the current research was to provide a novel method for mapping the developmental sequences of serial killers' life histories. An in-depth biographical account of serial killers' lives, from birth through to conviction, was gained and analyzed using Behavior Sequence Analysis. The analyses highlight similarities in behavioral events across the serial killers' lives, indicating not only which risk factors occur, but the temporal order of these factors. Results focused on early childhood environment, indicating the role of parental abuse; behaviors and events surrounding criminal histories of serial killers, showing that many had previous convictions and were known to police for other crimes; behaviors surrounding their murders, highlighting differences in victim choice and modus operandi; and, finally, trial pleas and convictions. The present research, therefore, provides a novel approach to synthesizing large volumes of data on criminals and presenting results in accessible, understandable outcomes.

  19. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    Science.gov (United States)

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  20. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    Directory of Open Access Journals (Sweden)

    Nathan D. Olson

    2015-03-01

    Full Text Available This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies.

  1. Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx

    Directory of Open Access Journals (Sweden)

    Colbourne John K

    2009-05-01

    Full Text Available Abstract Background New methods are needed for genomic-scale analysis of emerging model organisms that exemplify important biological questions but lack fully sequenced genomes. For example, there is an urgent need to understand the potential for corals to adapt to climate change, but few molecular resources are available for studying these processes in reef-building corals. To facilitate genomics studies in corals and other non-model systems, we describe methods for transcriptome sequencing using 454, as well as strategies for assembling a useful catalog of genes from the output. We have applied these methods to sequence the transcriptome of planulae larvae from the coral Acropora millepora. Results More than 600,000 reads produced in a single 454 sequencing run were assembled into ~40,000 contigs with five-fold average sequencing coverage. Based on sequence similarity with known proteins, these analyses identified ~11,000 different genes expressed in a range of conditions including thermal stress and settlement induction. Assembled sequences were annotated with gene names, conserved domains, and Gene Ontology terms. Targeted searches using these annotations identified the majority of genes associated with essential metabolic pathways and conserved signaling pathways, as well as novel candidate genes for stress-related processes. Comparisons with the genome of the anemone Nematostella vectensis revealed ~8,500 pairs of orthologs and ~100 candidate coral-specific genes. More than 30,000 SNPs were detected in the coral sequences, and a subset of these validated by re-sequencing. Conclusion The methods described here for deep sequencing of the transcriptome should be widely applicable to generate catalogs of genes and genetic markers in emerging model organisms. Our data provide the most comprehensive sequence resource currently available for reef-building corals, and include an extensive collection of potential genetic markers for association and

  2. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  3. Molecular cloning and sequence analysis of growth hormone cDNA of Neotropical freshwater fish Pacu (Piaractus mesopotamicus)

    Directory of Open Access Journals (Sweden)

    Janeth Silva Pinheiro

    2008-01-01

    Full Text Available RT-PCR was used for amplifying Piaractus mesopotamicus growth hormone (GH) cDNA obtained from mRNA extracted from pituitary cells. The amplified fragment was cloned and the complete cDNA sequence was determined. The cloned cDNA encompassed a sequence of 543 nucleotides that encoded a polypeptide of 178 amino acids corresponding to mature P. mesopotamicus GH. Comparison with other GH sequences showed a gap of 10 amino acids localized in the N terminus of the putative polypeptide of P. mesopotamicus. This same gap was also observed in other members of the family. Neighbor-joining tree analysis with GH sequences from fishes belonging to different taxonomic groups placed the P. mesopotamicus GH within the Otophysi group. To our knowledge, this is the first GH sequence of a Neotropical characiform fish deposited in GenBank.

  4. Draft Genome Sequencing and Comparative Analysis of Aspergillus sojae NBRC4239

    Science.gov (United States)

    Sato, Atsushi; Oshima, Kenshiro; Noguchi, Hideki; Ogawa, Masahiro; Takahashi, Tadashi; Oguma, Tetsuya; Koyama, Yasuji; Itoh, Takehiko; Hattori, Masahira; Hanya, Yoshiki

    2011-01-01

    We conducted genome sequencing of the filamentous fungus Aspergillus sojae NBRC4239 isolated from the koji used to prepare Japanese soy sauce. We used the 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other Aspergilli sequenced. Assembly of 454 reads generated a non-redundant sequence of 39.5 Mb possessing 13 033 putative genes and 65 scaffolds composed of 557 contigs. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with the genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. While A. oryzae possessed three copies of the α-amylase gene, A. sojae NBRC4239 possessed only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 clusters differed between them, including a deletion of 18 508 bp containing mfs1, mao1, dmaT, and pks-nrps for cyclopiazonic acid (CPA) biosynthesis, which explains the lack of CPA production in A. sojae. The A. sojae NBRC4239 genome data will be useful to characterize functional features of the koji moulds used in Japanese industries. PMID:21659486

  5. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit; Chaudhuri, Probal; Ghosh, Anil

    2014-01-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.
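
    The statement that a Markov-model Bayes classifier has boundaries that are linear in word occurrences can be illustrated with a minimal sketch. It assumes two first-order Markov chains over a two-letter alphabet; the transition tables `TRANS_1` and `TRANS_2` and the initial distribution `INIT` are made-up values for illustration, not taken from the paper. The log-likelihood ratio computed step by step equals a weighted sum of two-letter word counts.

```python
import math
from collections import Counter

# Hypothetical first-order Markov models over the alphabet {"A", "B"}.
INIT = {"A": 0.5, "B": 0.5}
TRANS_1 = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
TRANS_2 = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}


def word_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def log_likelihood(seq, init, trans):
    """Log-likelihood of seq under a first-order Markov chain."""
    ll = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        ll += math.log(trans[a][b])
    return ll


def linear_score(seq, init1, trans1, init2, trans2):
    """Log-likelihood ratio written as a linear function of 2-letter word counts."""
    score = math.log(init1[seq[0]]) - math.log(init2[seq[0]])
    for (a, b), n in word_counts(seq, 2).items():
        # Each word "ab" contributes its count times a fixed weight.
        score += n * (math.log(trans1[a][b]) - math.log(trans2[a][b]))
    return score


seq = "AAAABBBB"
ratio = log_likelihood(seq, INIT, TRANS_1) - log_likelihood(seq, INIT, TRANS_2)
score = linear_score(seq, INIT, TRANS_1, INIT, TRANS_2)
label = 1 if score > 0 else 2  # equal priors assumed
```

    With equal priors, the sign of `score` gives the predicted class, and the weights on the word counts play the role of the linear discriminant coefficients.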

  6. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Science.gov (United States)

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we have isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to date. Similarity with most homeobox sequences is low (30-50%), except with Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. The deduced amino acid sequences from the cDNAs contain, in addition to the homeodomain, other domains also present in various homeobox-containing genes. The expression of both genes, detected by Northern blot analysis, appears slightly higher in cephalic regions than in the rest of the intact organism, while a slight increase is detected in the central period (5 days) of regeneration. PMID:1714599

  7. Analysis of correlations between sites in models of protein sequences

    International Nuclear Information System (INIS)

    Giraud, B.G.; Lapedes, A.; Liu, L.C.

    1998-01-01

    A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from 2, illustrating the relation to familiar classical spin systems, to 20 states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces nonindependence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: The more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor. copyright 1998 The American Physical Society

  8. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit

    2014-02-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  9. Computational Software to Fit Seismic Data Using Epidemic-Type Aftershock Sequence Models and Modeling Performance Comparisons

    Science.gov (United States)

    Chu, A.

    2016-12-01

    Modern earthquake catalogs are often analyzed using spatial-temporal point process models such as the epidemic-type aftershock sequence (ETAS) models of Ogata (1998). My work implements three of the homogeneous ETAS models described in Ogata (1998). With a model's log-likelihood function, my software finds the maximum-likelihood estimates (MLEs) of the model's parameters to estimate the homogeneous background rate and the temporal and spatial parameters that govern triggering effects. The EM algorithm is employed for its stability and robustness (Veen and Schoenberg, 2008). My work also presents comparisons among the three models in robustness, convergence speed, and implementation from theory to computing practice. Up-to-date regional seismic data from seismically active areas such as Southern California and Japan are used to demonstrate the comparisons. Data analysis has been done using the computer languages Java and R. Java has the advantages of strong typing and ease of controlling memory resources, while R has the advantage of numerous available functions for statistical computing. Comparisons are also made between the two programming languages in convergence and stability, computational speed, and ease of implementation. Issues that may affect convergence, such as spatial shapes, are discussed.
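
    To make the role of the log-likelihood concrete, here is a minimal sketch of a temporal-only ETAS model, a simplification of the space-time models the abstract refers to and not the author's software. It assumes the commonly used conditional intensity lambda(t) = mu + sum over earlier events of K * exp(alpha * (m_i - m0)) * (t - t_i + c)^(-p), with p != 1; all parameter values below are illustrative.

```python
import math


def etas_intensity(t, times, mags, mu, K, alpha, c, p, m0):
    """Conditional intensity at time t given events strictly before t."""
    rate = mu
    for ti, mi in zip(times, mags):
        if ti < t:
            rate += K * math.exp(alpha * (mi - m0)) * (t - ti + c) ** (-p)
    return rate


def etas_log_likelihood(times, mags, T, mu, K, alpha, c, p, m0):
    """Log-likelihood on [0, T]: sum of log-intensities minus integrated intensity.

    Uses the closed-form integral of the power-law kernel, valid for p != 1.
    """
    log_sum = sum(
        math.log(etas_intensity(ti, times, mags, mu, K, alpha, c, p, m0))
        for ti in times
    )
    integral = mu * T
    for ti, mi in zip(times, mags):
        kernel_mass = (c ** (1 - p) - (T - ti + c) ** (1 - p)) / (p - 1)
        integral += K * math.exp(alpha * (mi - m0)) * kernel_mass
    return log_sum - integral


# Illustrative catalog: one event at time 0 with magnitude equal to m0.
params = dict(mu=0.1, K=1.0, alpha=1.0, c=1.0, p=2.0, m0=3.0)
rate_at_1 = etas_intensity(1.0, [0.0], [3.0], **params)
loglik = etas_log_likelihood([0.0], [3.0], T=1.0, **params)
```

    A maximum-likelihood or EM fit would search over `mu`, `K`, `alpha`, `c` and `p` to maximize this function; the optimizer itself is omitted here.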

  10. Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

    Science.gov (United States)

    Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

    2018-04-01

    The N-end rule pathway is a protein degradation pathway that relates a protein's half-life to the identity of its N-terminal residue. A destabilizing N-terminal residue is created by enzymatic reactions or chemical modifications. The destabilized substrate is recognized by the PROTEOLYSIS 6 (PRT6) protein, an E3 ligase enzyme, which results in substrate degradation by the proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but not yet in fleshy fruit plants. Hence, this study was carried out in tomato, which is known as the model for fleshy fruit plants. BLASTX analysis identified Solyc09g010830 as the gene encoding PRT6 in tomato, based on its sequence similarity with PRT6 in A. thaliana. In silico gene expression analysis shows that the PRT6 gene was highly expressed in tomato fruits at the breaker +5 stage. Co-expression analysis shows that PRT6 may be involved not only in abiotic stresses but also in biotic stresses. The objective is to analyze the sequence and characterize the PRT6 gene in tomato.

  11. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information-theory-based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4-regulated genes.

  12. A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Wenyu Zhang

    Full Text Available The advent of next-generation sequencing technologies has been accompanied by the development of many whole-genome sequence assembly methods and software, especially for de novo fragment assembly. Because little is known about the applicability and performance of these software tools, choosing a suitable assembler is a difficult task. Here, we provide information on the adaptability of each program and, above all, compare the performance of eight distinct tools against eight groups of simulated datasets from the Solexa sequencing platform. Considering the computational time, maximum random access memory (RAM) occupancy, assembly accuracy and integrity, our study indicates that string-based assemblers and overlap-layout-consensus (OLC) assemblers are well suited to very short reads and to longer reads of small genomes, respectively. For large datasets of more than a hundred million short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones, among which SOAPdenovo requires a complex configuration file. Our comparison study will assist researchers in selecting a well-suited assembler and offer essential information for the improvement of existing assemblers or the development of novel assemblers.
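
    For readers unfamiliar with the De Bruijn graph approach mentioned above, the following is a minimal sketch of the graph construction step only, assuming error-free reads and ignoring reverse complements. It is not the implementation of any of the compared assemblers; the function name and the toy reads are illustrative.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two overlapping toy reads sharing the k-mer "CGT".
graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
```

    An assembler would then look for paths through this graph to spell out contigs; memory use grows with the number of distinct k-mers rather than the number of reads, which is one reason the approach suits very large short-read datasets.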

  13. Comparison of two approaches for the classification of 16S rRNA gene sequences.

    Science.gov (United States)

    Chatellier, Sonia; Mugnier, Nathalie; Allard, Françoise; Bonnaud, Bertrand; Collin, Valérie; van Belkum, Alex; Veyrieras, Jean-Baptiste; Emler, Stefan

    2014-10-01

    The use of 16S rRNA gene sequences for microbial identification in clinical microbiology is accepted widely, and requires databases and algorithms. We compared a new research database containing curated 16S rRNA gene sequences in combination with the lca (lowest common ancestor) algorithm (RDB-LCA) to a commercially available 16S rDNA Centroid approach. We used 1025 bacterial isolates characterized by biochemistry, matrix-assisted laser desorption/ionization time-of-flight MS and 16S rDNA sequencing. Nearly 80 % of isolates were identified unambiguously at the species level by both classification platforms used. The remaining isolates were mostly identified correctly at the genus level due to the limited resolution of 16S rDNA sequencing. Discrepancies between both 16S rDNA platforms were due to differences in database content and the algorithm used, and could amount to up to 10.5 %. Up to 1.4 % of the analyses were found to be inconclusive. It is important to realize that despite the overall good performance of the pipelines for analysis, some inconclusive results remain that require additional in-depth analysis performed using supplementary methods. © 2014 The Authors.
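
    The lowest common ancestor idea can be sketched briefly: when a query sequence matches several database entries equally well, it is assigned to the deepest taxon shared by all of them. The sketch below assumes each hit is given as a root-to-leaf lineage list; it does not reproduce the RDB-LCA pipeline, whose details are not given in the abstract, and the example lineages are illustrative.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of root-to-leaf lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Two equally good hits that differ only at the species level.
hits = [
    ["Bacteria", "Bacillota", "Bacilli", "Lactobacillales",
     "Streptococcaceae", "Streptococcus", "Streptococcus mitis"],
    ["Bacteria", "Bacillota", "Bacilli", "Lactobacillales",
     "Streptococcaceae", "Streptococcus", "Streptococcus oralis"],
]
assignment = lowest_common_ancestor(hits)
```

    Here the assignment stops at the genus, which mirrors the situation described above where isolates can only be identified at the genus level because 16S rDNA does not resolve the species.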

  14. [Sequence analysis of LEAFY homologous gene from Dendrobium moniliforme and application for identification of medicinal Dendrobium].

    Science.gov (United States)

    Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu

    2013-04-01

    The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned using new primers designed from the conserved region of known orchid LEAFY gene sequences. A partial LFY homologous gene was cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by TAIL-PCR. The complete DenLFY sequence was 3 575 bp long and contained three exons and two introns. BLAST comparison of the exons of LFY homologous genes indicated that DenLFY had high identity with orchid LFY homologs, including the related fragment of PhalLFY (84%) in a Phalaenopsis hybrid cultivar and LFY homologous genes in Oncidium (90%) and in other orchids (over 80%). In MP analysis, Dendrobium was found to be sister to Oncidium and Phalaenopsis. Homology analysis demonstrated that the C-terminal amino acids were highly conserved. When the exons and introns were considered separately, the exons and the amino acid sequence were good markers for functional research on the DenLFY gene. The second intron can be used for authentication of Dendrobium based on the length polymorphism between Dendrobium moniliforme and Dendrobium officinale.

  15. Chimira: analysis of small RNA sequencing data and microRNA modifications.

    Science.gov (United States)

    Vitsios, Dimitrios M; Enright, Anton J

    2015-10-15

    Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, it is capable of identifying epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3'-modifications (e.g. uridylation, adenylation), 5'-modifications and also internal modifications or variation (ADAR editing or single nucleotide polymorphisms). Besides cleaning and mapping of input sequences to miRNAs, Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary Material). These allow the visual study of the differential expression between two specific samples or sets of samples, the identification of the most highly expressed miRNAs within sample pairs (or sets of samples) and also the projection of the modification profile for specific miRNAs across all samples. Other tools have already been published in the past for various types of small RNA-Seq analysis, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, CPSS for modifications identification. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary Material. Chimira outperforms all of these tools in total execution speed and aims to facilitate simple, fast and reliable analysis of small RNA-Seq data allowing also, for the first time, identification of global microRNA modification profiles in a simple intuitive interface. Chimira has been developed as a web application and it is accessible here: http://www.ebi.ac.uk/research/enright/software/chimira. Contact: aje@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  16. Zucchini yellow mosaic virus: biological properties, detection procedures and comparison of coat protein gene sequences.

    Science.gov (United States)

    Coutts, B A; Kehoe, M A; Webster, C G; Wylie, S J; Jones, R A C

    2011-12-01

    Between 2006 and 2010, 5324 samples from at least 34 weed, two cultivated legume and 11 native species were collected from three cucurbit-growing areas in tropical or subtropical Western Australia. Two new alternative hosts of zucchini yellow mosaic virus (ZYMV) were identified, the Australian native cucurbit Cucumis maderaspatanus, and the naturalised legume species Rhyncosia minima. Low-level (0.7%) seed transmission of ZYMV was found in seedlings grown from seed collected from zucchini (Cucurbita pepo) fruit infected with isolate Cvn-1. Seed transmission was absent in >9500 pumpkin (C. maxima and C. moschata) seedlings from fruit infected with isolate Knx-1. Leaf samples from symptomatic cucurbit plants collected from fields in five cucurbit-growing areas in four Australian states were tested for the presence of ZYMV. When 42 complete coat protein (CP) nucleotide (nt) sequences from the new ZYMV isolates obtained were compared to those of 101 complete CP nt sequences from five other continents, phylogenetic analysis of the 143 ZYMV sequences revealed three distinct groups (A, B and C), with four subgroups in A (I-IV) and two in B (I-II). The new Australian sequences grouped according to collection location, fitting within A-I, A-II and B-II. The 16 new sequences from one isolated location in tropical northern Western Australia all grouped into subgroup B-II, which contained no other isolates. In contrast, the three sequences from the Northern Territory fitted into A-II with 94.6-99.0% nt identities with isolates from the United States, Iran, China and Japan. The 23 new sequences from the central west coast and two east coast locations all fitted into A-I, with 95.9-98.9% nt identities to sequences from Europe and Japan. These findings suggest that (i) there have been at least three separate ZYMV introductions into Australia and (ii) there are few changes to local isolate CP sequences following their establishment in remote growing areas. 
Isolates from A-I and B

  17. Magnetic resonance imaging of anterior cruciate ligament of the knee: a comparison of four sequences

    International Nuclear Information System (INIS)

    Casillas, C.; Marti-Bonmati, L.; Molla, E.; Ferrer, P.; Dosda, R.

    1999-01-01

    To compare the diagnostic efficacy of the four magnetic resonance imaging (MRI) sequences that compose the standard protocol for the study of the knee in our center when employed in the examination of the anterior cruciate ligament (ACL). A prospective study was carried out based on MRI findings in the knees of 326 consecutive patients. Sagittal [proton density (PD)-weighted turbo-spin-echo and T2*-weighted gradient echo], coronal (PD-weighted turbo-spin-echo with fat suppression) and transverse (T2*-weighted gradient echo with magnetization transfer) images were evaluated. Each sequence was analyzed independently by two radiologists, while another two assessed all the sequences together with the clinical findings. Four categories were established: normal ACL, partially torn, completely torn and synovialized. The sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) with respect to the definitive diagnosis were calculated for each sequence. The statistical analysis of the findings for each category was done using the chi-squared test, and the Kappa test was employed to assess the degree of agreement. According to the final diagnosis, 263 ACL were normal, 29 were partially torn, 33 were completely torn and there was 1 case of synovialization associated with a completely torn ACL. The relationship between the analysis of the ACL according to each sequence and the definitive diagnosis was very significant (p<0.001) and the agreement was excellent. All the sequences presented similar levels of diagnostic precision. The coronal sequence had the lowest number of diagnostic errors (2.1%). The combinations of imaging techniques that resulted in the lowest error rate with respect to the definitive diagnosis were coronal PD-weighted turbo-spin-echo with fat suppression and sagittal PD-weighted turbo-spin-echo. Coronal images are highly precise in the evaluation of ACL. Sagittal sequences are the most valid for diagnosis of torn ACL. 
Transverse
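
    The four diagnostic measures named in this abstract follow directly from a 2x2 table of test results against the definitive diagnosis. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # torn ligaments correctly flagged
        "specificity": tn / (tn + fp),  # normal ligaments correctly cleared
        "ppv": tp / (tp + fp),          # positive calls that are correct
        "npv": tn / (tn + fn),          # negative calls that are correct
    }


# Hypothetical counts for one MRI sequence (illustration only).
metrics = diagnostic_metrics(tp=18, fp=2, fn=2, tn=78)
```

    Sensitivity and specificity describe the test itself, whereas PPV and NPV also depend on how common tears are in the examined population.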

  18. Streaming support for data intensive cloud-based sequence analysis.

    Science.gov (United States)

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
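
    The core idea, processing reads while the data are still arriving, can be sketched with a generator that emits each record as soon as its last line has been received. This is an illustration only and does not use the elastream package's API, which is not described here; it assumes simple four-line FASTQ records and text chunks that may split a record at any position.

```python
def stream_fastq_records(chunks):
    """Yield (header, sequence, quality) as soon as each 4-line record is complete."""
    buffer = ""
    lines = []
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            lines.append(line)
            if len(lines) == 4:
                yield lines[0], lines[1], lines[3]
                lines = []
    # Handle a final line that was not terminated by a newline.
    if buffer:
        lines.append(buffer)
        if len(lines) == 4:
            yield lines[0], lines[1], lines[3]


# Chunks as they might arrive from a network transfer, split mid-record.
incoming = ["@r1\nAC", "GT\n+\nII", "II\n@r2\nGG\n+\nII\n"]
records = list(stream_fastq_records(incoming))
```

    Because each read is independent of the others, a downstream analysis step can consume `stream_fastq_records(...)` lazily and start working before the transfer has finished, which is the class of task the scheme targets.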

  19. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Shadi A. Issa

    2013-01-01

    Full Text Available Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  20. Next-generation sequence analysis of cancer xenograft models.

    Directory of Open Access Journals (Sweden)

    Fernando J Rossello

    Full Text Available Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.
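
    The abstract does not specify how reads are assigned to a species of origin. One plausible approach, shown here purely as an assumption, is to align each read to both the human and the mouse reference and keep the better-scoring assignment. The function name, the score convention (higher is better, `None` for unaligned) and the `min_margin` threshold are all hypothetical.

```python
from collections import Counter


def assign_species(human_score, mouse_score, min_margin=1):
    """Assign a read to the species whose reference it matches better."""
    if human_score is None and mouse_score is None:
        return "unaligned"
    if mouse_score is None:
        return "human"
    if human_score is None:
        return "mouse"
    if human_score - mouse_score >= min_margin:
        return "human"
    if mouse_score - human_score >= min_margin:
        return "mouse"
    return "ambiguous"  # scores too close to call


# Hypothetical (human, mouse) alignment scores for five reads.
scores = [(60, 20), (15, 58), (40, 40), (55, None), (None, None)]
summary = Counter(assign_species(h, m) for h, m in scores)
mouse_fraction = summary["mouse"] / len(scores)
```

    A summary such as `mouse_fraction` corresponds to the kind of figure quoted above for exome capture, where fewer than 4% of reads aligned to the mouse genome.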

  1. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  2. Extended -Regular Sequence for Automated Analysis of Microarray Images

    Directory of Open Access Journals (Sweden)

    Jin Hee-Jeong

    2006-01-01

    Full Text Available Microarray study enables us to obtain hundreds of thousands of expressions of genes or genotypes at once, and it is an indispensable technology for genome research. The first step is the analysis of scanned microarray images. This is the most important procedure for obtaining biologically reliable data. Currently most microarray image processing systems require burdensome manual block/spot indexing work. Since the amount of experimental data is increasing very quickly, automated microarray image analysis software becomes important. In this paper, we propose two automated methods for analyzing microarray images. First, we propose the extended -regular sequence to index blocks and spots, which enables a novel automatic gridding procedure. Second, we provide a methodology, hierarchical metagrid alignment, to allow reliable and efficient batch processing for a set of microarray images. Experimental results show that the proposed methods are more reliable and convenient than the commercial tools.

  3. Sequence Quality Analysis Tool for HIV Type 1 Protease and Reverse Transcriptase

    OpenAIRE

    DeLong, Allison K.; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W.; Kantor, Rami

    2012-01-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802...

  4. Comparison of double-locus sequence typing (DLST) and multilocus sequence typing (MLST) for the investigation of Pseudomonas aeruginosa populations.

    Science.gov (United States)

    Cholley, Pascal; Stojanov, Milos; Hocquet, Didier; Thouverez, Michelle; Bertrand, Xavier; Blanc, Dominique S

    2015-08-01

    Reliable molecular typing methods are necessary to investigate the epidemiology of bacterial pathogens. Reference methods such as multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE) are costly and time consuming. Here, we compared our newly developed double-locus sequence typing (DLST) method for Pseudomonas aeruginosa to MLST and PFGE on a collection of 281 isolates. DLST was as discriminatory as MLST and was able to recognize "high-risk" epidemic clones. Both methods were highly congruent. Not surprisingly, a higher discriminatory power was observed with PFGE. In conclusion, being a simple method (single-strand sequencing of only 2 loci), DLST is valuable as a first-line typing tool for epidemiological investigations of P. aeruginosa. Coupled to a more discriminant method like PFGE or whole genome sequencing, it might represent an efficient typing strategy to investigate or prevent outbreaks. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. Apophysomyces variabilis: draft genome sequence and comparison of predictive virulence determinants with other medically important Mucorales.

    Science.gov (United States)

    Prakash, Hariprasath; Rudramurthy, Shivaprakash Mandya; Gandham, Prasad S; Ghosh, Anup Kumar; Kumar, Milner M; Badapanda, Chandan; Chakrabarti, Arunaloke

    2017-09-18

    Apophysomyces species are prevalent in tropical countries and A. variabilis is the second most frequent agent causing mucormycosis in India. Among Apophysomyces species, A. elegans, A. trapeziformis and A. variabilis are commonly incriminated in human infections. The genome sequences of A. elegans and A. trapeziformis are available in public databases, but that of A. variabilis is not. We therefore performed whole genome sequencing of A. variabilis to explore its genomic structure and possible genes determining the virulence of the organism. The whole genome of A. variabilis NCCPF 102052 was sequenced and the genomic structure of A. variabilis was compared with already available genome structures of A. elegans, A. trapeziformis and other medically important Mucorales. The total size of the genome assembly of A. variabilis was 39.38 Mb with 12,764 protein-coding genes. Transposable element (TE) content was low in the Apophysomyces genome and the retrotransposon Ty3-gypsy was the common TE. Phylogenetically, Apophysomyces species were grouped closely with Phycomyces blakesleeanus. OrthoMCL analysis revealed 3025 orthologous proteins, which were common to those three pathogenic Apophysomyces species. Expansion of multiple gene families/duplication was observed in Apophysomyces genomes. Approximately 6% of Apophysomyces genes were predicted to be associated with virulence on PHIbase analysis. The virulence determinants included the protein families of CotH proteins (invasins), proteases, iron utilisation pathways, siderophores and signal transduction pathways. Serine proteases were the major group of proteases found in all Apophysomyces genomes. The carbohydrate active enzymes (CAZymes) constitute the majority of the secretory proteins. The present study is the first attempt to sequence and analyze the genomic structure of A. variabilis. Together with available genome sequence of A. elegans and A. trapeziformis, the study helped to indicate the possible virulence determinants of

  6. Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Gclust Server. Data name: Cluster based on sequence comparison of homologous proteins of 95 organism species.

  7. cDNA cloning and nucleotide sequence comparison of Chinese hamster metallothionein I and II mRNAs

    Energy Technology Data Exchange (ETDEWEB)

    Griffith, B B; Walters, R A; Enger, M D; Hildebrand, C E; Griffith, J K

    1983-01-01

    Polyadenylated RNA was extracted from a cadmium resistant Chinese hamster (CHO) cell line, enriched for metal-induced, abundant RNA sequences and cloned as double-stranded cDNA in the plasmid pBR322. Two cDNA clones, pCHMT1 and pCHMT2, encoding two Chinese hamster isometallothioneins were identified, and the nucleotide sequence of each insert was determined. The two Chinese hamster metallothioneins show nucleotide sequence homologies of 80% in the protein coding region and approximately 35% in both the 5' and 3' untranslated regions. Interestingly, an 8 nucleotide sequence (TGTAAATA) has been conserved in sequence and position in the 3' untranslated regions of each metallothionein mRNA sequenced thus far. Estimated nucleotide substitution rates derived from interspecies comparisons were used to calculate a metallothionein gene duplication time of 45 to 120 million years ago. 39 references, 1 figure, 1 table.

  8. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley.

    Directory of Open Access Journals (Sweden)

    Martin Mascher

    sequencing technologies, analysis tools and genomic resources develop.

  9. Analysis and prediction of stacking sequences in intercalated lamellar vanadium phosphates

    Energy Technology Data Exchange (ETDEWEB)

    Gautier, Romain [Institut des Sciences Chimiques de Rennes, UMR 6226 CNRS - Ecole Nationale Superieure de Chimie de Rennes (France); Centre Nationale de la Recherche Scientifique (CNRS), Institut des Materiaux Jean Rouxel (IMN), Universite de Nantes (France); Fourre, Yoann; Furet, Eric; Gautier, Regis; Le Fur, Eric [Institut des Sciences Chimiques de Rennes, UMR 6226 CNRS - Ecole Nationale Superieure de Chimie de Rennes (France)

    2015-04-15

    An approach is presented that enables the analysis and prediction of stacking sequences in intercalated lamellar vanadium phosphates. A comparison of previously reported vanadium phosphates reveals two modes of intercalation: (i) 3d transition metal ions intercalated between VOPO₄ layers and (ii) alkali/alkaline earth metal ions between VOPO₄·H₂O layers. Both intercalations were investigated using DFT calculations in order to understand the relative shifts of the vanadium phosphate layers. These calculations in addition to an analysis of the stacking sequences in previously reported materials enable the prediction of the crystal structures of Mₓ(VOPO₄)·yH₂O (M = Cs⁺, Cd²⁺ and Sn²⁺). Experimental realization and structural determination of Cd(VOPO₄)₂·4H₂O by single-crystal X-ray diffraction confirmed the predicted stacking sequences. (Copyright © 2015 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim)

  10. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type I and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus

  11. Genome-Wide Analysis of Simple Sequence Repeats in Bitter Gourd (Momordica charantia)

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2017-06-01

    Bitter gourd (Momordica charantia) is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus), watermelon (Citrullus lanatus), and melon (Cucumis melo) genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs) in bitter gourd, including a comparison with the three aforementioned cucurbit species, has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1,’ respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs were designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utilization of unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.

  12. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison.

    Science.gov (United States)

    Martins, W S; Del Cuvillo, J B; Useche, F J; Theobald, K B; Gao, G R

    2001-01-01

    This paper discusses the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform based on a fine-grain event-driven multithreaded program execution model. Fine-grain multithreading permits efficient parallelism exploitation in this application both by taking advantage of asynchronous point-to-point synchronizations and communication with low overheads and by effectively tolerating latency through the overlapping of computation and communication. We have implemented our scheme on EARTH, a fine-grain event-driven multithreaded execution and architecture model which has been ported to a number of parallel machines with off-the-shelf processors. Our experimental results show that the dynamic programming algorithm can be efficiently implemented on EARTH systems with high performance (e.g., speedup of 90 on 120 nodes), good programmability and reasonable cost.
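
    For orientation, the kind of dynamic programming recurrence used in biological sequence comparison can be sketched serially as below. The abstract does not say which algorithm or scoring scheme was parallelized on EARTH, so this is only an illustrative global alignment score in the Needleman-Wunsch style; the function name and the match, mismatch and gap values are assumptions. Each cell depends only on its upper, left and upper-left neighbours, which is the dependency pattern a fine-grain multithreaded implementation would have to respect.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev holds row i-1 of the DP table; two rows suffice for the score alone.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap, or b[j-1] with a gap.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

    Keeping only two rows reduces memory to O(len(b)); recovering the alignment itself would need the full table or a traceback structure.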

  13. Comparison of pause predictions of two sequence-dependent transcription models

    International Nuclear Information System (INIS)

    Bai, Lu; Wang, Michelle D

    2010-01-01

    Two recent theoretical models, Bai et al (2004, 2007) and Tadigotla et al (2006), formulated thermodynamic explanations of sequence-dependent transcription pausing by RNA polymerase (RNAP). The two models differ in some basic assumptions and therefore make different yet overlapping predictions for pause locations, and different predictions on pause kinetics and mechanisms. Here we present a comprehensive comparison of the two models. We show that while they have comparable predictive power of pause locations at low NTP concentrations, the Bai et al model is more accurate than Tadigotla et al at higher NTP concentrations. The pausing kinetics predicted by Bai et al is also consistent with time-course transcription reactions, while Tadigotla et al is unsuited for this type of kinetic prediction. More importantly, the two models in general predict different pausing mechanisms even for the same pausing sites, and the Bai et al model provides an explanation more consistent with recent single molecule observations

  14. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Directory of Open Access Journals (Sweden)

    Nishtha Manocha

    2018-02-01

    The adaptation tipping point and adaptation pathway approaches developed to make decisions under deep uncertainty do not shed light on which among the multiple available pathways should be chosen as the preferred pathway. This creates the need to extend these approaches by means of suitable tools that can help sequence actions and subsequently enable the outlining of relevant policies. This paper presents two sequencing approaches, namely, the “Build to Target” and “Build Up” approach, to aid in sub-selecting a set of preferred pathways. The two approaches differ in the levels of flexibility they offer. They are exemplified by means of two case studies wherein the Net Present Valuation and the Real Options Analysis are employed as selection criteria. The results demonstrate the benefit of these two approaches when used in conjunction with the adaptation pathways and show how the pathways selected by means of a Build to Target approach generally have a value greater than, or at least the same as, the pathways selected by the Build Up approach. Further, this paper also demonstrates the capacity of Real Options to quantify and capture the economic value of flexibility, which cannot be done by traditional valuation approaches such as Net Present Valuation.

  15. Abdominal MR imaging using a HASTE sequence : image comparison on the different echo times

    International Nuclear Information System (INIS)

    Park, Kwang Bo; Lee, Moon Gyu; Lim, Tae Hwan; Jeong, Yoong Ki; Ha, Hyun Kwon; Kim, Pyo Nyun; Auh, Yong Ho

    1999-01-01

    To determine the optimal parameters of abdominal HASTE imaging by means of a comparison of intermediate and long TE (echo time). We evaluated 30 consecutive patients who had undergone liver MR during a three-month period. Twelve patients were diagnosed as normal, four as having liver cirrhosis, and 14 were found to be suffering from hepatic hemangioma. On the basis of measured signal intensity of the liver, spleen, pancreas and gallbladder, and of fat, muscle, hemangioma, and background, we calculated the ratios of signal to noise (S/N), signal difference to noise (SD/N), and signal intensity (SI). Image quality was compared using these three ratios, and images from two HASTE sequences with TEs of 90 msec and 134 msec were qualitatively evaluated. S/N ratio of the liver was higher when TE was 90 msec (p<.05), though S/N, SD/N and SI ratios of the spleen, gallbladder, and pancreas, and of hemangioma, were higher when TE was 134 msec (p<.05). However, in muscle, all three ratios were higher at a TE of 90 msec. SD/N ratio and SI of fat were higher at a TE of 134 msec. Overall image quality was better at a TE of 134 msec than at 90 msec. A HASTE sequence with a TE of 134 msec showed greater tissue contrast and more strongly T2-weighted images than one with a TE of 90 msec

  16. Fast comparison of IS radar code sequences for lag profile inversion

    Directory of Open Access Journals (Sweden)

    M. S. Lehtinen

    2008-08-01

    A fast method for theoretically comparing the posteriori variances produced by different phase code sequences in incoherent scatter radar (ISR) experiments is introduced. Alternating codes of types 1 and 2 are known to be optimal for selected range resolutions, but the code sets are inconveniently long for many purposes like ground clutter estimation and in cases where coherent echoes from lower ionospheric layers are to be analyzed in addition to standard F-layer spectra.

    The method is used in practice for searching binary code quads that have estimation accuracy almost equal to that of much longer alternating code sets. Though the code sequences can consist of as few as four different transmission envelopes, the lag profile estimation variances are near to the theoretical minimum. Thus the short code sequence is equally good as a full cycle of alternating codes with the same pulse length and bit length. The short code groups cannot be directly decoded, but the decoding is done in connection with more computationally expensive lag profile inversion in data analysis.

    The actual code searches as well as the analysis and real data results from the found short code searches are explained in other papers sent to the same issue of this journal. We also discuss interesting subtle differences found between the different alternating codes by this method. We assume that thermal noise dominates the incoherent scatter signal.

  17. A comparison of rumen microbial profiles in dairy cows as retrieved by 454 Roche and Ion Torrent (PGM) sequencing platforms

    Directory of Open Access Journals (Sweden)

    Nagaraju Indugu

    2016-02-01

    Next generation sequencing (NGS) technology is a widely accepted tool used by microbial ecologists to explore complex microbial communities in different ecosystems. As new NGS platforms continue to become available, it becomes imperative to compare data obtained from different platforms and analyze their effect on microbial community structure. In the present study, we compared sequencing data from both the 454 and Ion Torrent (PGM) platforms on the same DNA samples obtained from the rumen of dairy cows during their transition period. Despite the substantial differences in the number of reads, error rate and length of reads between the two platforms, we identified similar community composition between the two data sets. Procrustes analysis revealed similar correlations (M² = 0.319; P = 0.001) in the microbial community composition between the two platforms. Both platforms revealed the abundance of the same bacterial phyla, Bacteroidetes and Firmicutes; however, PGM recovered an additional four phyla. Comparisons made at the genus level between the platforms revealed differences in only a few genera such as Prevotella, Ruminococcus, Succiniclasticum and Treponema (p < 0.05; chi square test). Collectively, we conclude that the output generated from PGM and 454 yielded concurrent results, provided stringent bioinformatics pipelines are employed.

  18. Phylogeny of the genus Haemophilus as determined by comparison of partial infB sequences

    DEFF Research Database (Denmark)

    Hedegaard, J; Okkels, H; Bruun, B

    2001-01-01

    A 453 bp fragment of infB, the gene encoding translation initiation factor 2, was sequenced and compared from 66 clinical isolates and type strains of Haemophilus species and related bacteria. Analysis of the partial infB sequences obtained suggested that the human isolates dependent on X and V...... factor, H. influenzae, H. haemolyticus, H. aegyptius and some cryptic genospecies of H. influenzae, were closely related to each other. H. parainfluenzae constituted a heterogeneous group within the boundaries of the genus, whereas H. aphrophilus/paraphrophilus and Actinobacillus actinomycetemcomitans...... were only remotely related to the type species of the genus Haemophilus. H. parahaemolyticus and H. paraphrohaemolyticus took up an intermediary position and may not belong in the genus Haemophilus sensu stricto. Ambiguous results were obtained with seven isolates tentatively identified as H. segnis...

  19. Complete genome sequence analysis of novel human bocavirus reveals genetic recombination between human bocavirus 2 and human bocavirus 4.

    Science.gov (United States)

    Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

    2013-07-01

    Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of the VP1 gene, an unusual strain of HBoV (CMH-S011-11) was initially identified as HBoV4. The complete genome sequence of CMH-S011-11 was determined and analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of the complete genome sequence revealed that the coding sequence starting from NS1, NP1 to VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of the NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted the initial genotyping result of the partial VP1 region in the previous study. In addition, comparison of the NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, the nucleotide sequence of the VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that CMH-S011-11 was grouped within the HBoV4 species, but located in a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains with the break point located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.

  20. Combining real-time PCR and next-generation DNA sequencing to provide quantitative comparisons of fungal aerosol populations

    Science.gov (United States)

    Dannemiller, Karen C.; Lang-Yona, Naama; Yamamoto, Naomichi; Rudich, Yinon; Peccia, Jordan

    2014-02-01

    We examined fungal communities associated with the PM10 mass of Rehovot, Israel outdoor air samples collected in the spring and fall seasons. Fungal communities were described by 454 pyrosequencing of the internal transcribed spacer (ITS) region of the fungal ribosomal RNA encoding gene. To allow for a more quantitative comparison of fungal exposure in humans, the relative abundance values of specific taxa were transformed to absolute concentrations through multiplying these values by the sample's total fungal spore concentration (derived from universal fungal qPCR). Next, the sequencing-based absolute concentrations for Alternaria alternata, Cladosporium cladosporioides, Epicoccum nigrum, and Penicillium/Aspergillus spp. were compared to taxon-specific qPCR concentrations for A. alternata, C. cladosporioides, E. nigrum, and Penicillium/Aspergillus spp. derived from the same spring and fall aerosol samples. Results of these comparisons showed that the absolute concentration values generated from pyrosequencing were strongly associated with the concentration values derived from taxon-specific qPCR (for all four species, p 0.70). The correlation coefficients were greater for species present in higher concentrations. Our microbial aerosol population analyses demonstrated that fungal diversity (number of fungal operational taxonomic units) was higher in the spring compared to the fall (p = 0.02), and principal coordinate analysis showed distinct seasonal differences in taxa distribution (ANOSIM p = 0.004). Among genera containing allergenic and/or pathogenic species, the absolute concentrations of Alternaria, Aspergillus, Fusarium, and Cladosporium were greater in the fall, while Cryptococcus, Penicillium, and Ulocladium concentrations were greater in the spring. The transformation of pyrosequencing fungal population relative abundance data to absolute concentrations can improve next-generation DNA sequencing-based quantitative aerosol exposure assessment.
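
    The transformation described above (relative abundance multiplied by the sample's total qPCR-derived concentration) can be sketched as follows. This is a minimal illustration only: the function name, the taxon labels in the example and the numeric values are hypothetical and are not data from the study.

```python
def absolute_concentrations(relative_abundance, total_concentration):
    """Scale per-taxon relative abundances by a sample's total concentration.

    relative_abundance: dict mapping taxon -> fraction of sequences (sums to 1).
    total_concentration: total fungal concentration for the same sample,
        e.g. from a universal fungal qPCR assay (units are carried through).
    """
    total_fraction = sum(relative_abundance.values())
    if abs(total_fraction - 1.0) > 1e-6:
        raise ValueError("relative abundances must sum to 1")
    return {
        taxon: fraction * total_concentration
        for taxon, fraction in relative_abundance.items()
    }


if __name__ == "__main__":
    # Hypothetical sample: fractions from sequencing, total from qPCR.
    sample = {"taxon_A": 0.25, "taxon_B": 0.5, "other": 0.25}
    print(absolute_concentrations(sample, 1000.0))
```

    The sum check guards against passing read counts instead of fractions, which would silently inflate every concentration.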

  1. Nonlinear analysis of sequence repeats of multi-domain proteins

    Energy Technology Data Exchange (ETDEWEB)

    Huang Yanzhao [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Li Mingfeng [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: lmf_bill@sina.com

    2007-11-15

    Many multi-domain proteins have repetitive three-dimensional structures but nearly-random amino acid sequences. In the present paper, by using a modified recurrence plot proposed by us previously, we show that these amino acid sequences in fact have hidden repetitions. These results indicate that the repetitive domain structures are encoded by the repetitive sequences. This also gives a method to detect the repetitive domain structures directly from amino acid sequences.

  2. Human factors review for Severe Accident Sequence Analysis (SASA)

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitates a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure

  3. Sequence analysis of cereal sucrose synthase genes and isolation ...

    African Journals Online (AJOL)

    SERVER

    2007-10-18

    ... sequencing of sucrose synthase gene fragment from sorghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals like rice, maize, and barley were accessed from the NCBI GenBank database.

  4. Chimera: construction of chimeric sequences for phylogenetic analysis

    NARCIS (Netherlands)

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP formats. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output
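
    The concatenation step described above can be illustrated with a short sketch. This is not the Chimera program's code: the function names, the in-memory dictionary layout, the gap-filling of missing species and the simplified sequential PHYLIP writer (names padded or cut to 10 characters) are all assumptions made for illustration.

```python
def concatenate_alignments(genes, species, gene_order):
    """Build one chimeric sequence per species from several gene alignments.

    genes: dict mapping gene name -> {species name -> aligned sequence}.
    species: species to include, in output order.
    gene_order: genes to concatenate, in order.
    A species missing from a gene is filled with gaps (an assumption).
    """
    chimeric = {}
    for name in species:
        parts = []
        for gene in gene_order:
            alignment = genes[gene]
            # All sequences in one alignment share the same length.
            length = len(next(iter(alignment.values())))
            parts.append(alignment.get(name, "-" * length))
        chimeric[name] = "".join(parts)
    return chimeric


def to_phylip(alignment):
    """Format an alignment as simplified sequential PHYLIP text."""
    n_chars = len(next(iter(alignment.values())))
    lines = [f"{len(alignment)} {n_chars}"]
    for name, seq in alignment.items():
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines)


if __name__ == "__main__":
    genes = {
        "geneA": {"sp1": "ACGT", "sp2": "ACGA"},
        "geneB": {"sp1": "TT"},
    }
    merged = concatenate_alignments(genes, ["sp1", "sp2"], ["geneA", "geneB"])
    print(to_phylip(merged))
```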

  5. Meniscal tear evaluation. Comparison of a conventional spin-echo proton density sequence with a fast spin-echo sequence utilizing a 512x358 matrix size

    International Nuclear Information System (INIS)

    Hopper, M.A.; Robinson, P.; Grainger, A.J.

    2011-01-01

    Aim: To determine the sensitivities, specificities, and receiver-operating characteristics (ROCs) for sagittal conventional spin-echo proton density (SE-PD) and fast spin-echo proton density (FSE-PD) sequences in the diagnosis of meniscal tears when compared to arthroscopic findings utilizing increased FSE matrix acquisition size. Method and materials: Magnetic resonance imaging (MRI) studies of 97 knees (194 menisci) were independently and prospectively interpreted by two experienced musculoskeletal radiologists over four separate readings at least 3 weeks apart. Readings 1 and 2 included images in all three planes in accordance with the standard protocol with either a SE or FSE sagittal PD, at readings 3 and 4 just the SE or FSE sagittal PD sequences were reported. The FSE sequence was acquired with an increased matrix size, compared to the SE sequence, to provide increased resolution. Menisci were graded for the presence of a tear and statistical analysis to calculate sensitivity and specificity was performed comparing to arthroscopy as the reference standard. ROC analysis for the diagnosis of meniscal tears on the SE and FSE sagittal sequences was also evaluated. Reader concordance for the SE and FSE sequences was calculated. Results: Sixty-seven tears were noted at arthroscopy; 60 were detected on SE and 56 on FSE. The sensitivity and specificity for SE was 90 and 90%, and for FSE was 84 and 94%, respectively, with no significant difference. ROC analysis showed no significant difference between the two sequences and kappa values demonstrated a higher level of reader agreement for the FSE than for the SE reading. Conclusion: Use of a FSE sagittal PD sequence with an increased matrix size provides comparable performance to conventional SE sagittal PD when evaluating meniscal disease with a modern system. The present study indicates an increased level of concordance between readers for the FSE sagittal sequence compared to the conventional SE.
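
    As a worked check of the reported sensitivities, the sketch below recomputes them from the counts given in the abstract (67 tears at arthroscopy; 60 detected on SE and 56 on FSE), assuming "detected" means true positives against the arthroscopic reference. The function names are illustrative. The abstract does not report true-negative or false-positive counts, so the specificity function is shown only for its definition.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of reference-standard positives that the test detected."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of reference-standard negatives that the test called negative."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    tears_at_arthroscopy = 67
    for label, detected in (("SE", 60), ("FSE", 56)):
        missed = tears_at_arthroscopy - detected
        print(label, round(100 * sensitivity(detected, missed)), "%")
```

    With these counts the values round to 90% for SE and 84% for FSE, matching the figures quoted in the abstract.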

  6. Meniscal tear evaluation. Comparison of a conventional spin-echo proton density sequence with a fast spin-echo sequence utilizing a 512x358 matrix size

    Energy Technology Data Exchange (ETDEWEB)

    Hopper, M.A.; Robinson, P. [Leeds Teaching Hospitals NHS Trust, Leeds (United Kingdom); Grainger, A.J., E-mail: andrew.grainger@leedsth.nhs.u [Leeds Teaching Hospitals NHS Trust, Leeds (United Kingdom)

    2011-04-15

    Aim: To determine the sensitivities, specificities, and receiver-operating characteristics (ROCs) for sagittal conventional spin-echo proton density (SE-PD) and fast spin-echo proton density (FSE-PD) sequences in the diagnosis of meniscal tears when compared to arthroscopic findings utilizing increased FSE matrix acquisition size. Method and materials: Magnetic resonance imaging (MRI) studies of 97 knees (194 menisci) were independently and prospectively interpreted by two experienced musculoskeletal radiologists over four separate readings at least 3 weeks apart. Readings 1 and 2 included images in all three planes in accordance with the standard protocol with either a SE or FSE sagittal PD, at readings 3 and 4 just the SE or FSE sagittal PD sequences were reported. The FSE sequence was acquired with an increased matrix size, compared to the SE sequence, to provide increased resolution. Menisci were graded for the presence of a tear and statistical analysis to calculate sensitivity and specificity was performed comparing to arthroscopy as the reference standard. ROC analysis for the diagnosis of meniscal tears on the SE and FSE sagittal sequences was also evaluated. Reader concordance for the SE and FSE sequences was calculated. Results: Sixty-seven tears were noted at arthroscopy; 60 were detected on SE and 56 on FSE. The sensitivity and specificity for SE was 90 and 90%, and for FSE was 84 and 94%, respectively, with no significant difference. ROC analysis showed no significant difference between the two sequences and kappa values demonstrated a higher level of reader agreement for the FSE than for the SE reading. Conclusion: Use of a FSE sagittal PD sequence with an increased matrix size provides comparable performance to conventional SE sagittal PD when evaluating meniscal disease with a modern system. The present study indicates an increased level of concordance between readers for the FSE sagittal sequence compared to the conventional SE.

  7. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Energy Technology Data Exchange (ETDEWEB)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  8. Accident Sequence Evaluation Program: Human reliability analysis procedure

    International Nuclear Information System (INIS)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  9. A Quantitative Accident Sequence Analysis for a VHTR

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jintae; Lee, Joeun; Jae, Moosung [Hanyang University, Seoul (Korea, Republic of)

    2016-05-15

    In Korea, the basic design features of the VHTR are currently being discussed across various design concepts. Probabilistic risk assessment (PRA) offers a logical and structured method to assess the risks of a large and complex engineered system, such as a nuclear power plant. It will be introduced at an early stage in the design, and will be upgraded at various design and licensing stages as the design matures and the design details are defined. Risk insights to be developed from the PRA are viewed as essential to developing a design that is optimized in meeting safety objectives and in interpreting the applicability of the existing demands to the safety design approach of the VHTR. In this study, initiating events which may occur in VHTRs were selected through the MLD method. The initiating events were then grouped into four categories for the accident sequence analysis. Initiating event frequencies and safety system failure rates were calculated by using reliability data obtained from the available sources and fault tree analysis. After quantification, uncertainty analysis was conducted. The SR and LR frequencies were calculated as 7.52E-10/RY and 7.91E-16/RY, respectively, which are low relative to the core damage frequency of LWRs.

  10. Chloroplast Genome Sequence of pigeonpea (Cajanus cajan (L.) Millspaugh) and Cajanus scarabaeoides: Genome organization and Comparison with other legumes

    Directory of Open Access Journals (Sweden)

    Tanvi Kaila

    2016-12-01

    Full Text Available Pigeonpea (Cajanus cajan (L.) Millspaugh), a diploid (2n = 22) legume crop with a genome size of 852 Mbp, serves as an important source of human dietary protein especially in South East Asian and African regions. In this study, the draft chloroplast genomes of Cajanus cajan and Cajanus scarabaeoides were sequenced. Cajanus scarabaeoides is an important species of the Cajanus gene pool and has also been used for developing promising CMS system by different groups. A male sterile genotype harbouring the Cajanus scarabaeoides cytoplasm was used for sequencing the plastid genome. The cp genome of Cajanus cajan is 152,242 bp long, having a quadripartite structure with LSC of 83,455 bp and SSC of 17,871 bp separated by IRs of 25,398 bp. Similarly, the cp genome of Cajanus scarabaeoides is 152,201 bp long, having a quadripartite structure in which IRs of 25,402 bp length separates 83,423 bp of LSC and 17,854 bp of SSC. The pigeonpea cp genome contains 116 unique genes, including 30 tRNA, 4 rRNA, 78 predicted protein coding genes and 5 pseudogenes. A 50 kb inversion was observed in the LSC region of pigeonpea cp genome, consistent with other legumes. Comparison of cp genome with other legumes revealed the contraction of IR boundaries due to the absence of rps19 gene in the IR region. Chloroplast SSRs were mined and a total of 280 and 292 cpSSRs were identified in Cajanus scarabaeoides and Cajanus cajan respectively. RNA editing was observed at 37 sites in both Cajanus scarabaeoides and Cajanus cajan, with maximum occurrence in the ndh genes. The pigeonpea cp genome sequence would be beneficial in providing informative molecular markers which can be utilized for genetic diversity analysis and aid in understanding the plant systematics studies among major grain legumes.

  11. Comparing methods of classifying life courses: Sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Elzinga, C.H.; Liefbroer, Aart C.; Han, Sapphire

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  12. Comparing methods of classifying life courses: sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Han, Y.; Liefbroer, A.C.; Elzinga, C.

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  13. Genetic divergence of Asiatic Bdellocephala (Turbellaria, Tricladida, Paludicola) as revealed by partial 18S rRNA gene sequence comparisons.

    Science.gov (United States)

    Kuznedelov, K D; Timoshkin, O A; Goldman, E

    1997-01-01

    Polymerase chain reaction (PCR) and direct sequencing of small ribosomal RNA genes were used for analysis of genetic differences among Asiatic species of the freshwater triclad genus Bdellocephala. Representatives of four species and four subspecies of this genus were used to establish homology between nucleotides in the 5'-end portion of small ribosomal RNA gene sequences. Within 552 nucleotide sites of aligned sequences compared, six variable base positions were discovered, dividing Bdellocephala into five different genotypes. Sequence data allow two groups of these genotypes to be distinguished. One of them unites species from Kamchatka and Japan; the other unites Baikalian taxa. Agreement between available morphological, cytological and sequence data is discussed.

  14. Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads.

    Directory of Open Access Journals (Sweden)

    Jovan Rebolledo-Mendez

    Full Text Available The reference assembly for the domestic horse, EquCab2, published in 2009, was built using approximately 30 million Sanger reads from a Thoroughbred mare named Twilight. Contiguity in the assembly was facilitated using nearly 315 thousand BAC end sequences from Twilight's half brother Bravo. Since then, it has served as the foundation for many genome-wide analyses that include not only the modern horse, but ancient horses and other equid species as well. As data mapped to this reference have accumulated, consistent variation between mapped datasets and the reference, in terms of regions with no read coverage, single nucleotide variants, and small insertions/deletions, has become apparent. In many cases, it is not clear whether these differences are the result of true sequence variation between the research subjects' and Twilight's genome or due to errors in the reference. EquCab2 is regarded as "The Twilight Assembly." The objective of this study was to identify inconsistencies between the EquCab2 assembly and the source Twilight Sanger data used to build it. To that end, the original Sanger and BAC end reads have been mapped back to this equine reference and assessed with the addition of approximately 40X coverage of new Illumina Paired-End sequence data. The resulting mapped datasets identify those regions with low Sanger read coverage, as well as variation in genomic content that is not consistent with either the original Twilight Sanger data or the new genomic sequence data generated from Twilight on the Illumina platform. As the haploid EquCab2 reference assembly was created using Sanger reads derived largely from a single individual, the vast majority of variation detected in a mapped dataset comprised of those same Sanger reads should be heterozygous. In contrast, homozygous variations would represent either errors in the reference or contributions from Bravo's BAC end sequences. Our analysis identifies 720,843 homozygous discrepancies

  15. Complete Chloroplast Genome Sequences and Comparative Analysis of Chenopodium quinoa and C. album.

    Science.gov (United States)

    Hong, Su-Young; Cheon, Kyeong-Sik; Yoo, Ki-Oug; Lee, Hyun-Oh; Cho, Kwang-Soo; Suh, Jong-Taek; Kim, Su-Jeong; Nam, Jeong-Hwan; Sohn, Hwang-Bae; Kim, Yul-Ho

    2017-01-01

    The Chenopodium genus comprises ~150 species, including Chenopodium quinoa and Chenopodium album, two important crops with high nutritional value. To elucidate the phylogenetic relationship between the two species, the complete chloroplast (cp) genomes of these species were obtained by next generation sequencing. We performed comparative analysis of the sequences and, using InDel markers, inferred phylogeny and genetic diversity of the Chenopodium genus. The cp genome is 152,099 bp (C. quinoa) and 152,167 bp (C. album) long. In total, 119 genes (78 protein-coding, 37 tRNA, and 4 rRNA) were identified. We found 14 (C. quinoa) and 15 (C. album) tandem repeats (TRs); 14 TRs were present in both species and C. album and C. quinoa each had one species-specific TR. The trnI-GAU intron sequences contained one (C. quinoa) or two (C. album) copies of TRs (66 bp); the InDel marker was designed based on the copy number variation in TRs. Using the InDel markers, we detected this variation in the TR copy number in four species, Chenopodium hybridum, Chenopodium pumilio, Chenopodium ficifolium, and Chenopodium koraiense, but not in Chenopodium glaucum. A comparison of coding and non-coding regions between C. quinoa and C. album revealed divergent sites. Nucleotide diversity >0.025 was found in 17 regions: 14 were located in the large single copy region (LSC), one in the inverted repeats, and two in the small single copy region (SSC). A phylogenetic analysis based on 59 protein-coding genes from 25 taxa resolved Chenopodioideae as monophyletic and sister to Betoideae. The complete plastid genome sequences and molecular markers based on divergence hotspot regions in the two Chenopodium taxa will help to resolve the phylogenetic relationships of Chenopodium.

  16. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, and sequence-specific information such as the total number of nucleotide bases and ATGC base contents along with their respective percentages, as well as a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer, Random DNA Analyser; GUI, graphical user interface; XAML, Extensible Application Markup Language.
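
    For readers unfamiliar with the Nussinov algorithm mentioned above, the following is a minimal sketch of its classic form, which maximizes the number of nested base pairs by dynamic programming. It is not RDNAnalyzer's implementation (the abstract does not describe how the tool extends the algorithm); the function name, the restriction to A-T and G-C pairs, and the `min_loop` parameter are assumptions made for illustration.

```python
def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested complementary pairs in a DNA sequence.

    min_loop is an assumed constraint: at least this many unpaired bases
    must lie between the two partners of any pair.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k to its left
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov_max_pairs("GGGAAACCC"))
```

    The example sequence can form a three-pair stem (G-C pairs) closed by an AAA loop, which is the structure the recurrence finds.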

  17. Data Analysis of Seismic Sequence in Central Italy in 2016 using CTBTO- International Monitoring System

    Science.gov (United States)

    Mumladze, Tea; Wang, Haijun; Graham, Gerhard

    2017-04-01

    The seismic network that forms the International Monitoring System (IMS) of the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO) will ultimately consist of 170 seismic stations (50 primary and 120 auxiliary) in 76 countries around the world. The network is still under development, but currently more than 80% of the network is in operation. The objective of seismic monitoring is to detect and locate underground nuclear explosions. However, the data from the IMS can also be widely used for scientific and civil purposes. In this study we present the results of data analysis of the seismic sequence in 2016 in Central Italy. Several hundred earthquakes were recorded for this sequence by the seismic stations of the IMS. All events were accurately located by the analysts of the International Data Centre (IDC) of the CTBTO. In this study we will present the epicentral and magnitude distribution, station recordings and teleseismic phases as obtained from the Reviewed Event Bulletin (REB). We will also present a comparison of the database of the IDC with the databases of the European-Mediterranean Seismological Centre (EMSC) and U.S. Geological Survey (USGS). The present work shows that IMS data can be used for earthquake sequence analyses and can play an important role in seismological research.

  18. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    Science.gov (United States)

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulated length of 400 Mbp, which represents about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence composition of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant.

  19. Frame sequences analysis technique of linear objects movement

    Science.gov (United States)

    Oshchepkova, V. Y.; Berg, I. A.; Shchepkin, D. V.; Kopylova, G. V.

    2017-12-01

    Obtaining data by noninvasive methods is often necessary in many fields of science and engineering. This is achieved through video recording at various frame rates and in various light spectra. Quantitative analysis of the movement of the objects being studied thus becomes an important component of the research. This work discusses analysis of the motion of linear objects on a two-dimensional plane. The complexity of this problem increases when the frame contains numerous objects whose images may overlap. This study uses a sequence containing 30 frames at a resolution of 62 × 62 pixels and a frame rate of 2 Hz. The task was to determine the average velocity of the objects' motion. This velocity was found as an average over 8-12 objects, with an error of 15%. After processing, the dependencies of the average velocity on control parameters were found. The processing was performed in the software environment GMimPro, with subsequent approximation of the data obtained using the Hill equation.
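
    The abstract does not state which form of the Hill equation was fitted or how its parameters were named. As a hedged illustration only, the commonly used form v(x) = v_max · xⁿ / (Kⁿ + xⁿ) can be written as follows; all names here (`v_max`, `k_half`, `n`) are assumptions.

```python
def hill(x: float, v_max: float, k_half: float, n: float) -> float:
    """Hill equation: response at control-parameter value x.

    v_max  : plateau value approached at large x (assumed name)
    k_half : value of x that gives half of v_max (assumed name)
    n      : Hill coefficient controlling the steepness of the curve
    """
    x_n = x ** n
    return v_max * x_n / (k_half ** n + x_n)


# At x == k_half the response is, by construction, half of v_max.
print(hill(2.0, 10.0, 2.0, 3.0))
```

    In practice such a function would be passed to a curve-fitting routine together with the measured average velocities.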

  20. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    DEFF Research Database (Denmark)

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.

    2004-01-01

    The nucleotide sequence of two novel plasmids isolated from the extreme thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), growing optimally at 70 °C, has been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence re...... with highest similarity to DNA repair protein from Campylobacter jejuni (25% aa). Orf34 showed similarity to sigma factors with highest similarity (28% aa) to the sporulation specific Sigma factor, Sigma 28(K) from Bacillus thuringiensis....

  1. Partial Sequence Analysis of Merozoite Surface Proteine-3α Gene in Plasmodium vivax Isolates from Malarious Areas of Iran

    Directory of Open Access Journals (Sweden)

    H Mirhendi

    2008-12-01

    Full Text Available Background: Approximately 85-90% of malaria infections in Iran are attributed to Plasmodium vivax, while little is known about the genetics of the parasite and its strain types in this region. This study was designed and performed to describe the genetic characteristics of the Plasmodium vivax population of Iran based on the merozoite surface protein-3α gene sequence. Methods: Through a descriptive study we analyzed partial P. vivax merozoite surface protein-3α gene sequences from 17 clinical P. vivax isolates collected from malarious areas of Iran. Genomic DNA was extracted with the QIAamp® DNA blood mini kit and amplified through nested PCR for a partial nucleotide sequence of the PvMSP-3 gene in P. vivax. PCR-amplified products were sequenced with an ABI Prism Perkin-Elmer 310 sequencer machine and the data were analyzed with ClustalW software. Results: Analysis of PvMSP-3 gene sequences demonstrated extensive polymorphisms, but the sequence identity between isolates of the same types was relatively high. We identified specific insertions and deletions for the type A, B and C variants of P. vivax in our isolates. In phylogenetic comparison of geographically separated isolates, there was no significant geographical branching of the parasite populations. Conclusion: The highly polymorphic nature of isolates suggests that more investigations of the PvMSP-3 gene are needed to explore its vaccine potential.

  2. Delineation of the genus Actinobacillus by comparison of partial infB sequences

    DEFF Research Database (Denmark)

    Nørskov-Lauritsen, Niels; Christensen, H; Okkels, H.

    2004-01-01

    A 426 bp fragment of infB, a housekeeping gene that encodes translation initiation factor 2, was sequenced from 59 clinical isolates and type strains of Actinobacillus species and sequences were compared. Partial sequences of 16S rRNA genes were also obtained. By comparing infB sequences, Actinob...

  3. Comparison of spin echo T1-weighted sequences versus fast spin-echo proton density-weighted sequences for evaluation of meniscal tears at 1.5 T

    International Nuclear Information System (INIS)

    Wolff, Andrew B.; Pesce, Lorenzo L.; Wu, Jim S.; Smart, L.R.; Medvecky, Michael J.; Haims, Andrew H.

    2009-01-01

    At our institution, fast spin-echo (FSE) proton density (PD) imaging is used to evaluate articular cartilage, while conventional spin-echo (CSE) T1-weighted sequences have been traditionally used to characterize meniscal pathology. We sought to determine if FSE PD-weighted sequences are equivalent to CSE T1-weighted sequences in the detection of meniscal tears, obviating the need to perform both sequences. We retrospectively reviewed the records of knee arthroscopies performed by two arthroscopy-focused surgeons from an academic medical center over a 2-year period. The preoperative MRI images were interpreted independently by two fellowship-trained musculoskeletal radiologists who graded the sagittal CSE T1 and FSE PD sequences at different sittings with grades 1-5, where 1 = normal meniscus, 2 = probable normal meniscus, 3 = indeterminate, 4 = probable torn meniscus, and 5 = torn meniscus. Each meniscus was divided into an anterior and posterior half, and these halves were graded separately. Operative findings provided the gold standard. Receiver operating characteristic (ROC) analysis was performed to compare the two sequences. There were 131 tears in 504 meniscal halves. Using ROC analysis, the reader 1 area under curve for FSE PD was significantly better than CSE T1 (0.939 vs. 0.902, >95% confidence). For reader 2, the difference met good criteria for statistical non-inferiority but not superiority (0.913 for FSE PD and 0.908 for CSE T1; >95% non-inferiority for difference at most of -0.027). FSE PD-weighted sequences, using our institutional protocol, are not inferior to CSE T1-weighted sequences for the detection of meniscal tears and may be superior. (orig.)
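
    To make the ROC comparison above more concrete, the sketch below shows one standard way of computing an empirical area under the curve from ordinal 1-5 grades: the probability that a torn meniscal half receives a higher grade than an intact one, with ties counted as one half. This is not necessarily the method used in the study, and the grade lists are hypothetical values invented purely for illustration.

```python
def roc_auc(grades_torn: list[int], grades_intact: list[int]) -> float:
    """Empirical AUC: P(torn grade > intact grade) + 0.5 * P(tie)."""
    score = 0.0
    for torn in grades_torn:
        for intact in grades_intact:
            if torn > intact:
                score += 1.0
            elif torn == intact:
                score += 0.5
    return score / (len(grades_torn) * len(grades_intact))


# Hypothetical reader grades (not data from the study).
torn_halves = [5, 5, 4, 3]
intact_halves = [1, 2, 3, 1]
print(roc_auc(torn_halves, intact_halves))
```

    An AUC of 1.0 would mean every torn half was graded higher than every intact half; 0.5 corresponds to chance-level grading.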

  4. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.
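
    The separation of a symbolic DNA sequence into four indicator channels, as described above, can be sketched as follows. The per-channel weights stand in for the adaptive symbol-to-number mapping, but the abstract does not say how those weights are derived, so the weighting function and the example weight values are assumptions; the wavelet transform itself is not shown.

```python
def indicator_sequences(dna: str) -> dict[str, list[int]]:
    """One binary channel per base: 1 where that base occurs, else 0."""
    dna = dna.upper()
    return {base: [1 if s == base else 0 for s in dna] for base in "ACGT"}


def weighted_signal(dna: str, weights: dict[str, float]) -> list[float]:
    """Combine the four channels into one numeric signal.

    The weights are placeholders for an adaptive symbol-to-number mapping.
    """
    channels = indicator_sequences(dna)
    return [
        sum(weights[base] * channels[base][i] for base in "ACGT")
        for i in range(len(dna))
    ]


print(indicator_sequences("ACGTA")["A"])
print(weighted_signal("ACGT", {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}))
```

    Each position is set in exactly one channel, so the weighted signal simply carries the weight of the base found at that position.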

  5. MR of normal pancreas : comparison of five pulse sequences and enhancing patterns on dynamic imaging

    International Nuclear Information System (INIS)

    Jang, Hyun Jung; Kim, Tae Kyoung; Hong, Sung Hwan; Han, Joon Koo; Choi, Byung Ihn

    1997-01-01

    To compare T1-weighted FLASH and turbo spin echo (SE) T2-weighted sequences with conventional T1- and T2-weighted sequences in imaging normal pancreas and to describe the enhancing patterns on dynamic MR imaging. Forty-four patients with presumed hepatic hemangiomas were studied at 1.0T or 1.5T by using conventional SE sequences (T1-weighted, T2-weighted, and heavily T2-weighted), turbo-SE T2-weighted sequences, and breath-hold T1-weighted FLASH sequences acquired before, immediately on, and at 1, 2, 3, and 5 or 10 minutes after injection of a bolus of gadopentetate dimeglumine. No patient had a history or clinical features of pancreatic disease. Images were quantitatively analyzed for signal-difference-to-noise ratios (SD/Ns) between the pancreas and peripancreatic fat. Percentage enhancement of the pancreas was measured on each dynamic MR image. Conspicuity of the pancreatic border was qualitatively evaluated according to a consensus reached by three radiologists. Turbo-SE T2-weighted images had a significantly higher SD/N ratio (p<0.001) and better conspicuity of the pancreatic border (p<0.001) than SE T2- and heavily T2-weighted images; T1-weighted SE images had a significantly higher SD/N ratio than T1-weighted FLASH images (p<0.001), but there was no significant difference between them in qualitative analysis (p=0.346). Percentage enhancement immediately on and at 1, 2, 3, 5, and 10 minutes after administration of contrast material was 39.9%, 44.5%, 42.9%, 40.8%, 36.3%, and 29.9%, respectively, with peak enhancement at 1 minute. In MR imaging of normal pancreas, turbo-SE T2-weighted imaging is superior to SE T2- and heavily T2-weighted imaging, and SE T1-weighted imaging is superior to T1-weighted FLASH imaging. On serial gadolinium-enhanced FLASH imaging, normal pancreas shows peak enhancement at 1 minute.

  6. Very high resolution single pass HLA genotyping using amplicon sequencing on the 454 next generation DNA sequencers: Comparison with Sanger sequencing.

    Science.gov (United States)

    Yamamoto, F; Höglund, B; Fernandez-Vina, M; Tyan, D; Rastrou, M; Williams, T; Moonsamy, P; Goodridge, D; Anderson, M; Erlich, H A; Holcomb, C L

    2015-12-01

    Compared to Sanger sequencing, next-generation sequencing offers advantages for high resolution HLA genotyping including increased throughput, lower cost, and reduced genotype ambiguity. Here we describe an enhancement of the Roche 454 GS GType HLA genotyping assay to provide very high resolution (VHR) typing, by the addition of 8 primer pairs to the original 14, to genotype 11 HLA loci. These additional amplicons help resolve common and well-documented alleles and exclude commonly found null alleles in genotype ambiguity strings. Simplification of workflow to reduce the initial preparation effort using early pooling of amplicons or the Fluidigm Access Array™ is also described. Performance of the VHR assay was evaluated on 28 well characterized cell lines using Conexio Assign MPS software which uses genomic, rather than cDNA, reference sequence. Concordance was 98.4%; 1.6% had no genotype assignment. Of concordant calls, 53% were unambiguous. To further assess the assay, 59 clinical samples were genotyped and results compared to unambiguous allele assignments obtained by prior sequence-based typing supplemented with SSO and/or SSP. Concordance was 98.7% with 58.2% as unambiguous calls; 1.3% could not be assigned. Our results show that the amplicon-based VHR assay is robust and can replace current Sanger methodology. Together with software enhancements, it has the potential to provide even higher resolution HLA typing. Copyright © 2015. Published by Elsevier Inc.

  7. A Comparison of Morphological Taxonomy and Next Generation DNA Sequencing for the Assessment of Zooplankton Diversity

    Science.gov (United States)

    Harvey, J.; Fisher, J. L.; Johnson, S.; Morgan, S.; Peterson, W. T.; Satterthwaite, E. V.; Vrijenhoek, R. C.

    2016-02-01

    Our ability to accurately characterize the diversity of planktonic organisms is affected by both the methods we use to collect water samples and our approaches to assessing sample contents. Plankton nets collect organisms from high volumes of water, but integrate sample contents along the net's path. In contrast, plankton pumps collect water from discrete depths. Autonomous underwater vehicles (AUVs) can collect water samples with pinpoint accuracy from physical features such as upwelling fronts or biological features such as phytoplankton blooms, but sample volumes are necessarily much smaller than those possible with nets. Characterization of plankton diversity and abundances in water samples may also vary with the assessment method we apply. Morphological taxonomy provides visual identification and enumeration of organisms via microscopy, but is labor intensive. Next generation DNA sequencing (NGS) shows great promise for assessing plankton diversity in water samples, but accurate assessment of relative abundances may not be possible in all cases. Comparison of morphological taxonomy to molecular approaches is necessary to identify areas of overlap and also areas of disagreement between these methods. We have compared morphological taxonomic assessments to mitochondrial COI and nuclear 28S ribosomal RNA NGS results for plankton net samples collected in Monterey Bay, California. We have made a similar comparison for plankton pump samples, and have also applied our NGS methods to targeted, small volume water samples collected by an AUV. Our goal is to communicate current results and lessons learned regarding application of traditional taxonomy and novel molecular approaches to the study of plankton diversity in spatially and temporally variable, coastal marine environments.

  8. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  9. SeqAn: An efficient, generic C++ library for sequence analysis

    Directory of Open Access Journals (Sweden)

    Rausch Tobias

    2008-01-01

    Full Text Available Abstract. Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. Results: To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use. Conclusion: We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This not only facilitates the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.
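
    As an illustration of what comparing exact string matching algorithms involves, the sketch below implements two well-known algorithms, a naive scan and Horspool's algorithm, and checks that they report the same match positions. It is written in Python for brevity and does not use or reproduce SeqAn's C++ API; the abstract does not say which algorithms the authors compared, so this choice is an assumption.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Check every alignment of the pattern against the text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def horspool_search(text: str, pattern: str) -> list[int]:
    """Horspool's algorithm: skip ahead using a bad-character shift table."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Shift for each character = distance from its rightmost occurrence
    # (excluding the last pattern position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift according to the text character under the pattern's last cell.
        pos += shift.get(text[pos + m - 1], m)
    return hits


text = "GATTACAGATTACA"
print(naive_search(text, "ATTA"), horspool_search(text, "ATTA"))
```

    A benchmark would time both functions on the same inputs; here the point is only that the two strategies agree on the result while examining different numbers of alignments.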

  10. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  11. Exome Sequence Analysis of 14 Families With High Myopia

    DEFF Research Database (Denmark)

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sang...

  12. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications...

  13. Accelerating next generation sequencing data analysis with system level optimizations.

    Science.gov (United States)

    Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid

    2017-08-22

    Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, and CPU frequency scaling are some of the hardware features in modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK HaplotypeCaller, which is part of common NGS workflows and consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters, which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) over-clocking the default 'on-demand' mode of CPU frequency by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.

  14. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Background: The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbred WZSP genome. Results: Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion: Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  15. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Science.gov (United States)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidence indicated that U. prolifera has the material and energy basis for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated with Pseudendoclonium akinetum (Ulvophyceae).

  16. IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

    Science.gov (United States)

    Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...

  17. Comparison of variable region 3 sequences of human immunodeficiency virus type 1 from infected children with the RNA and DNA sequences of the virus populations of their mothers.

    Science.gov (United States)

    Scarlatti, G; Leitner, T; Halapi, E; Wahlberg, J; Marchisio, P; Clerici-Schoeller, M A; Wigzell, H; Fenyö, E M; Albert, J; Uhlén, M

    1993-01-01

    We have compared the variable region 3 sequences from 10 human immunodeficiency virus type 1 (HIV-1)-infected infants to virus sequences from the corresponding mothers. The sequences were derived from DNA of uncultured peripheral blood mononuclear cells (PBMC), DNA of cultured PBMC, and RNA from serum collected at or shortly after delivery. The infected infants, in contrast to the mothers, harbored homogeneous virus populations. Comparison of sequences from the children and clones derived from DNA of the corresponding mothers showed that the transmitted virus represented either a minor or a major virus population of the mother. In contrast to an earlier study, we found no evidence of selection of minor virus variants during transmission. Furthermore, the transmitted virus variant did not show any characteristic molecular features. In some cases the transmitted virus was more related to the virus RNA population of the mother and in other cases it was more related to the virus DNA population. This suggests that either cell-free or cell-associated virus may be transmitted. These data will help AIDS researchers to understand the mechanism of transmission and to plan strategies for prevention of transmission. PMID:8446584

  18. [Comparison research on two-stage sequencing batch MBR and one-stage MBR].

    Science.gov (United States)

    Yuan, Xin-Yan; Shen, Heng-Gen; Sun, Lei; Wang, Lin; Li, Shi-Feng

    2011-01-01

    To address problems in MBR operation such as low nitrogen and phosphorus removal efficiency and severe membrane fouling, this paper compares a two-stage sequencing batch MBR (TSBMBR) with a one-stage aerobic MBR. The results indicated that the TSBMBR retained the advantages of SBR in removing nitrogen and phosphorus, which could make up for the deficiency of the traditional one-stage aerobic MBR in nitrogen and phosphorus removal. During the steady operation period, the average effluent NH4(+)-N, TN and TP concentrations were 2.83, 12.20 and 0.42 mg/L, respectively, which could meet the requirements for domestic scenic environment use. From the membrane fouling control point of view, the TSBMBR had lower SMP in the supernatant, a lower specific trans-membrane flux reduction rate and lower membrane fouling resistance than the one-stage aerobic MBR. The sedimentation layer and gel layer resistances of the TSBMBR were only 6.5% and 33.12% of those of the one-stage aerobic MBR. Besides its high efficiency in removing nitrogen and phosphorus, the TSBMBR could effectively reduce sedimentation and gel layer fouling on the membrane surface. Compared with the one-stage MBR, the TSBMBR could operate with a higher trans-membrane flux, a lower membrane fouling rate and better pollutant removal.

  19. The optimal design of stepped wedge trials with equal allocation to sequences and a comparison to other trial designs.

    Science.gov (United States)

    Thompson, Jennifer A; Fielding, Katherine; Hargreaves, James; Copas, Andrew

    2017-12-01

    Background/Aims We sought to optimise the design of stepped wedge trials with an equal allocation of clusters to sequences and explored sample size comparisons with alternative trial designs. Methods We developed a new expression for the design effect for a stepped wedge trial, assuming that observations are equally correlated within clusters and an equal number of observations in each period between sequences switching to the intervention. We minimised the design effect with respect to (1) the fraction of observations before the first and after the final sequence switches (the periods with all clusters in the control or intervention condition, respectively) and (2) the number of sequences. We compared the design effect of this optimised stepped wedge trial to the design effects of a parallel cluster-randomised trial, a cluster-randomised trial with baseline observations, and a hybrid trial design (a mixture of cluster-randomised trial and stepped wedge trial) with the same total cluster size for all designs. Results We found that a stepped wedge trial with an equal allocation to sequences is optimised by obtaining all observations after the first sequence switches and before the final sequence switches to the intervention; this means that the first sequence remains in the control condition and the last sequence remains in the intervention condition for the duration of the trial. With this design, the optimal number of sequences is [Formula: see text], where [Formula: see text] is the cluster-mean correlation, [Formula: see text] is the intracluster correlation coefficient, and m is the total cluster size. The optimal number of sequences is small when the intracluster correlation coefficient and cluster size are small and large when the intracluster correlation coefficient or cluster size is large. A cluster-randomised trial remains more efficient than the optimised stepped wedge trial when the intracluster correlation coefficient or cluster size is small. A

  20. Image sequence analysis in nuclear medicine: (1) Parametric imaging using statistical modelling

    International Nuclear Information System (INIS)

    Liehn, J.C.; Hannequin, P.; Valeyre, J.

    1989-01-01

    This is a review of parametric imaging methods in Nuclear Medicine. A Parametric Image is an image in which each pixel value is a function of the values of the same pixel across an image sequence. The Local Model Method is the fitting of each pixel time activity curve by a model whose parameter values form the Parametric Images. The Global Model Method is the modelling of the changes between two images. It is applied to image comparison. For both methods, the different models, the identification criterion, the optimization methods and the statistical properties of the images are discussed. The analysis of one or more Parametric Images is performed using 1D or 2D histograms. The statistically significant Parametric Images (Images of significant Variances, Amplitudes and Differences) are also proposed.
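
The Local Model Method can be illustrated with a minimal sketch: each pixel's time activity curve is fitted independently, and the fitted parameters are stored at the same pixel position to form the Parametric Images. The straight-line model `activity = a + b*t`, the function name `linear_parametric_images`, and the nested-list image layout are assumptions chosen for brevity; the review itself discusses several models and identification criteria.

```python
def linear_parametric_images(frames, times):
    """Fit activity = a + b*t to every pixel's time activity curve.

    frames: list of 2D images (lists of rows), one per acquisition time.
    times:  acquisition times, same length as frames.
    Returns (intercept_image, slope_image), the two parametric images.
    """
    n = len(times)
    t_mean = sum(times) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    rows, cols = len(frames[0]), len(frames[0][0])
    intercept = [[0.0] * cols for _ in range(rows)]
    slope = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Time activity curve of this single pixel across the sequence.
            curve = [frame[r][c] for frame in frames]
            y_mean = sum(curve) / n
            s_ty = sum((t - t_mean) * (y - y_mean)
                       for t, y in zip(times, curve))
            slope[r][c] = s_ty / s_tt  # ordinary least squares
            intercept[r][c] = y_mean - slope[r][c] * t_mean
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical 1x2 image sequence: one rising pixel, one constant pixel.
    frames = [[[1.0, 4.0]], [[3.0, 4.0]], [[5.0, 4.0]]]
    a_img, b_img = linear_parametric_images(frames, [0.0, 1.0, 2.0])
    print("intercept image:", a_img)
    print("slope image:", b_img)
```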

  1. Cardiorespiratory Kinetics Determined by Pseudo-Random Binary Sequences - Comparisons between Walking and Cycling.

    Science.gov (United States)

    Koschate, J; Drescher, U; Thieschäfer, L; Heine, O; Baum, K; Hoffmann, U

    2016-12-01

    This study aims to compare cardiorespiratory kinetics as a response to a standardised work rate protocol with pseudo-random binary sequences between cycling and walking in young healthy subjects. Muscular and pulmonary oxygen uptake (V̇O2) kinetics as well as heart rate kinetics were expected to be similar for walking and cycling. Cardiac data and V̇O2 of 23 healthy young subjects were measured in response to pseudo-random binary sequences. Kinetics were assessed applying time series analysis. Higher maxima of cross-correlation functions between work rate and the respective parameter indicate faster kinetics responses. Muscular V̇O2 kinetics were estimated from heart rate and pulmonary V̇O2 using a circulatory model. Muscular (walking vs. cycling [mean±SD in arbitrary units]: 0.40±0.08 vs. 0.41±0.08) and pulmonary V̇O2 kinetics (0.35±0.06 vs. 0.35±0.06) were not different, although the time courses of the cross-correlation functions of pulmonary V̇O2 showed unexpected biphasic responses. Heart rate kinetics (0.50±0.14 vs. 0.40±0.14; P=0.017) was faster for walking. Regarding the biphasic cross-correlation functions of pulmonary V̇O2 during walking, the assessment of muscular V̇O2 kinetics via pseudo-random binary sequences requires a circulatory model to account for cardio-dynamic distortions. Faster heart rate kinetics for walking should be considered by comparing results from cycle and treadmill ergometry. © Georg Thieme Verlag KG Stuttgart · New York.
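
The statement that higher cross-correlation maxima indicate faster kinetics can be made concrete with a small sketch. The function names, the circular treatment of lags, the normalisation, and the 7-sample binary sequence are all assumptions for illustration; this is not the time series analysis used in the study. A response that simply follows the work rate with a delay reaches a maximum of 1, while a smoothed (slower) response reaches a lower maximum.

```python
def cross_correlation(x, y, max_lag):
    """Normalised circular cross-correlation of y against x, lags 0..max_lag."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = (sum(v * v for v in dx) * sum(v * v for v in dy)) ** 0.5
    if norm == 0:
        raise ValueError("constant signal: correlation is undefined")
    return [sum(dx[i] * dy[(i + lag) % n] for i in range(n)) / norm
            for lag in range(max_lag + 1)]


def peak(ccf):
    """Return (lag, value) of the cross-correlation maximum."""
    lag = max(range(len(ccf)), key=lambda k: ccf[k])
    return lag, ccf[lag]


if __name__ == "__main__":
    work_rate = [1, 1, 1, 0, 1, 0, 0]  # hypothetical 7-sample binary sequence
    # Fast response: an exact copy of the work rate, delayed by 2 samples.
    delayed = [work_rate[(i - 2) % 7] for i in range(7)]
    # Slow response: each sample averaged with the previous one.
    smoothed = [(work_rate[i] + work_rate[i - 1]) / 2 for i in range(7)]
    print("delayed :", peak(cross_correlation(work_rate, delayed, 6)))
    print("smoothed:", peak(cross_correlation(work_rate, smoothed, 6)))
```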

  2. Code comparison for accelerator design and analysis

    International Nuclear Information System (INIS)

    Parsa, Z.

    1988-01-01

    We present a comparison between results obtained from standard accelerator physics codes used for the design and analysis of synchrotrons and storage rings, with programs SYNCH, MAD, HARMON, PATRICIA, PATPET, BETA, DIMAD, MARYLIE and RACE-TRACK. In our analysis we have considered 5 (various size) lattices with large and small angles including AGS Booster (10° bend), RHIC (2.24°), SXLS, XLS (XUV ring with 45° bend) and X-RAY rings. The differences in the integration methods used and the treatment of the fringe fields in these codes could lead to different results. The inclusion of nonlinear (e.g., dipole) terms may be necessary in these calculations, especially for a small ring. 12 refs., 6 figs., 10 tabs

  3. Molecular phylogeny of Toxoplasmatinae: comparison between inferences based on mitochondrial and apicoplast genetic sequences

    Directory of Open Access Journals (Sweden)

    Michelle Klein Sercundes

    2016-03-01

    Phylogenies within Toxoplasmatinae have been widely investigated with different molecular markers. Here, we studied molecular phylogenies of the Toxoplasmatinae subfamily based on apicoplast and mitochondrial genes. Partial sequences of apicoplast genes coding for caseinolytic protease (clpC) and beta subunit of RNA polymerase (rpoB), and mitochondrial gene coding for cytochrome B (cytB) were analyzed. Laboratory-adapted strains of the closely related parasites Sarcocystis falcatula and Sarcocystis neurona were investigated, along with Neospora caninum, Neospora hughesi, Toxoplasma gondii (strains RH, CTG and PTG), Besnoitia akodoni, Hammondia hammondi and two genetically divergent lineages of Hammondia heydorni. The molecular analysis based on organellar genes did not clearly differentiate between N. caninum and N. hughesi, but the two lineages of H. heydorni were confirmed. Slight differences between the strains of S. falcatula and S. neurona were encountered in all markers. In conclusion, congruent phylogenies were inferred from the three different genes and they might be used for screening undescribed sarcocystid parasites in order to ascertain their phylogenetic relationships with organisms of the family Sarcocystidae. The evolutionary studies based on organellar genes confirm that the genus Hammondia is paraphyletic. The primers used for amplification of clpC and rpoB were able to amplify genetic sequences of organisms of the genus Sarcocystis and organisms of the subfamily Toxoplasmatinae as well.

  4. Comparison of global sensitivity analysis techniques and importance measures in PSA

    International Nuclear Information System (INIS)

    Borgonovo, E.; Apostolakis, G.E.; Tarantola, S.; Saltelli, A.

    2003-01-01

    This paper discusses application and results of global sensitivity analysis techniques to probabilistic safety assessment (PSA) models, and their comparison to importance measures. This comparison allows one to understand whether PSA elements that are important to the risk, as revealed by importance measures, are also important contributors to the model uncertainty, as revealed by global sensitivity analysis. We show that, due to epistemic dependence, uncertainty and global sensitivity analysis of PSA models must be performed at the parameter level. A difficulty arises, since standard codes produce the calculations at the basic event level. We discuss both the indirect comparison through importance measures computed for basic events, and the direct comparison performed using the differential importance measure and the Fussell-Vesely importance at the parameter level. Results are discussed for the large LLOCA sequence of the advanced test reactor PSA

  5. Event Sequence Analysis of the Air Intelligence Agency Information Operations Center Flight Operations

    National Research Council Canada - National Science Library

    Larsen, Glen

    1998-01-01

    This report applies Event Sequence Analysis, methodology adapted from aircraft mishap investigation, to an investigation of the performance of the Air Intelligence Agency's Information Operations Center (IOC...

  6. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

    Science.gov (United States)

    Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

    2005-01-01

    We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.

  7. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Unknown

    For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. ... been widely used for phylogenetic studies and sequence differences in ... In order to fill up the internal gap, a new set.

  8. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    1998). Cross-species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15:1275-1287.

  9. Sequence and expression analysis of gaps in human chromosome 20

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Seemann, Stefan; Mang, Yuan

    2012-01-01

    The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and/or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum...

  10. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

    Science.gov (United States)

    Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S

    2012-10-01

    An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma. Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. Contact: sharmila@atc.tcs.com. Supplementary data are available at Bioinformatics online.
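
The general-purpose baselines named in the abstract (gzip, bzip2 and lzma) can be compared on sequence data with the Python standard library alone. The sketch below does not implement DELIMINATE; the function names and the random nucleotide string used as input are assumptions for illustration, and real genomes are far less random than this test data.

```python
import bz2
import gzip
import lzma
import random


def compression_ratios(data: bytes) -> dict:
    """Return original size / compressed size for each baseline compressor."""
    compressed = {
        "gzip": gzip.compress(data),
        "bzip2": bz2.compress(data),
        "lzma": lzma.compress(data),
    }
    return {name: len(data) / len(blob) for name, blob in compressed.items()}


def roundtrip_ok(data: bytes) -> bool:
    """Check that each compressor is loss-less on this input."""
    return (gzip.decompress(gzip.compress(data)) == data
            and bz2.decompress(bz2.compress(data)) == data
            and lzma.decompress(lzma.compress(data)) == data)


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical input: 20 000 random nucleotides stored one byte per base.
    sequence = "".join(random.choices("ACGT", k=20_000)).encode("ascii")
    assert roundtrip_ok(sequence)
    for name, ratio in compression_ratios(sequence).items():
        print(f"{name}: {ratio:.2f}x")
```

A four-letter alphabet needs only 2 bits per base, so any reasonable compressor shrinks one-byte-per-base text considerably; specialised tools aim to go beyond that by exploiting properties of genomic sequences.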

  11. Compilation and analysis of Escherichia coli promoter DNA sequences.

    OpenAIRE

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequences of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter ...
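
Deriving a consensus from a set of aligned sequences amounts to taking the most frequent base in each column. The sketch below shows that idea only; the function name `consensus` and the four short example sequences are made up for illustration and are not taken from the compilation.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent base at each column of equal-length sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        # most_common(1) yields the (base, count) pair with the highest count.
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


if __name__ == "__main__":
    # Hypothetical aligned hexamers, each differing from the others at one site.
    examples = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
    print(consensus(examples))
```

A per-column majority vote discards how strongly each position is conserved, which is why compilations of this kind usually also report base frequencies per position.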

  12. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    Science.gov (United States)

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or 15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
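
Threshold-based flagging of this kind can be sketched for two of the listed criteria: ambiguous bases (more than six for PR, more than 18 for RT) and stop codons (more than zero). The code below is Python, not SQUAT's R implementation; the function name `quality_flags`, the flag labels, and the assumptions that the input is an ungapped nucleotide string read in frame from its first base are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Thresholds quoted in the abstract; described there as user modifiable.
MAX_AMBIGUOUS_BASES = {"PR": 6, "RT": 18}
MAX_STOP_CODONS = 0


def quality_flags(sequence, gene):
    """Return a list of flag labels raised for one nucleotide sequence.

    gene must be "PR" or "RT". Any character other than A, C, G or T is
    counted as an ambiguous base (an assumption of this sketch).
    """
    seq = sequence.upper()
    flags = []
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    if ambiguous > MAX_AMBIGUOUS_BASES[gene]:
        flags.append("ambiguous_bases")
    # Count in-frame stop codons, reading from the first base.
    stops = sum(1 for i in range(0, len(seq) - 2, 3)
                if seq[i:i + 3] in STOP_CODONS)
    if stops > MAX_STOP_CODONS:
        flags.append("stop_codon")
    return flags


if __name__ == "__main__":
    print(quality_flags("ATGTAAGGG", "PR"))             # contains TAA in frame
    print(quality_flags("ATG" + "R" * 7 + "CC", "PR"))  # 7 ambiguous bases
    print(quality_flags("ATG" + "R" * 7 + "CC", "RT"))  # below the RT limit
```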

  13. Foundations of Sequence-to-Sequence Modeling for Time Series

    OpenAIRE

    Kuznetsov, Vitaly; Mariet, Zelda

    2018-01-01

    The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practiti...

  14. Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis

    Directory of Open Access Journals (Sweden)

    Posch Stefan

    2010-03-01

    Background: One of the challenges of bioinformatics remains the recognition of short signal sequences in genomic DNA such as donor or acceptor splice sites, splicing enhancers or silencers, translation initiation sites, transcription start sites, transcription factor binding sites, nucleosome binding sites, miRNA binding sites, or insulator binding sites. During the last decade, a wealth of algorithms for the recognition of such DNA sequences has been developed and compared with the goal of improving their performance and to deepen our understanding of the underlying cellular processes. Most of these algorithms are based on statistical models belonging to the family of Markov random fields such as position weight matrix models, weight array matrix models, Markov models of higher order, or moral Bayesian networks. While in many comparative studies different learning principles or different statistical models have been compared, the influence of choosing different prior distributions for the model parameters when using different learning principles has been overlooked, possibly leading to questionable conclusions. Results: With the goal of allowing direct comparisons of different learning principles for models from the family of Markov random fields based on the same a-priori information, we derive a generalization of the commonly-used product-Dirichlet prior. We find that the derived prior behaves like a Gaussian prior close to the maximum and like a Laplace prior in the far tails. In two case studies, we illustrate the utility of the derived prior for a direct comparison of different learning principles with different models for the recognition of binding sites of the transcription factor Sp1 and human donor splice sites. Conclusions: We find that comparisons of different learning principles using the same a-priori information can lead to conclusions different from those of previous studies in which the effect resulting from different
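
For orientation, the commonly used product-Dirichlet prior mentioned above has a simple effect on a position weight matrix: with a symmetric Dirichlet prior of parameter α at each position, the posterior mean probability of base a is (n_a + α) / (N + 4α), where n_a is the observed count and N the number of sites. The sketch below shows only this standard case, not the generalized prior derived in the paper; the function name `pwm_posterior_mean` and the example sites are assumptions.

```python
def pwm_posterior_mean(sites, alpha=1.0, alphabet="ACGT"):
    """Position weight matrix as posterior mean under a symmetric Dirichlet prior.

    sites: aligned binding sites of equal length.
    alpha: Dirichlet parameter per base, acting as a pseudocount.
    Returns one dict {base: probability} per position.
    """
    length = len(sites[0])
    total = len(sites) + alpha * len(alphabet)
    pwm = []
    for pos in range(length):
        counts = {base: 0 for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        pwm.append({base: (counts[base] + alpha) / total for base in alphabet})
    return pwm


if __name__ == "__main__":
    # Hypothetical aligned sites, two positions long.
    for column in pwm_posterior_mean(["AC", "AG", "AT"], alpha=1.0):
        print({base: round(p, 3) for base, p in column.items()})
```

Because the prior enters the estimate directly through α, two learning approaches that use different priors can disagree for that reason alone, which is the comparison problem the abstract describes.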

  15. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Science.gov (United States)

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
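
The relabelling step described above, replacing each FASTA header with a short label built from species name and accession number, can be sketched as follows. This is not the REFGEN or TREENAMER code; the function names, the assumed header layout (`>ACCESSION Genus species ...`), and the example records with made-up accessions are illustrative assumptions.

```python
def short_label(header):
    """Build 'Genus_species_ACCESSION' from '>ACCESSION Genus species ...'."""
    fields = header.lstrip(">").split()
    accession, genus, species = fields[0], fields[1], fields[2]
    return f"{genus}_{species}_{accession}"


def rename_fasta(lines):
    """Rename every header line; return (new_lines, label -> original header)."""
    renamed = []
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            label = short_label(line)
            mapping[label] = line[1:]  # keep the original for later lookup
            renamed.append(">" + label)
        else:
            renamed.append(line)  # sequence data is passed through unchanged
    return renamed, mapping


if __name__ == "__main__":
    # Hypothetical records; the accession numbers are made up.
    fasta = [
        ">AB000001.1 Homo sapiens example gene, partial cds",
        "ATGGCC--TTAG",
        ">XY999999.2 Mus musculus example gene, partial cds",
        "ATGGCCAATTAG",
    ]
    new_lines, mapping = rename_fasta(fasta)
    print("\n".join(new_lines))
```

Keeping the label-to-header mapping is what makes it possible to restore or reformat names in tree files after the phylogenetic analysis.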

  16. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.

  17. Comparison of MRI sequences for evaluation of multiple sclerosis of the cervical spinal cord at 3 T

    International Nuclear Information System (INIS)

    Philpott, Cristina; Brotchie, Peter

    2011-01-01

    Purpose: Debate remains regarding the utility of the traditional STIR (short inversion time recovery) sequence in aiding MRI diagnosis of spinal cord lesions in patients with multiple sclerosis (MS) and this sequence is not included in the current imaging guidelines. A recent study proposed a T1 weighted STIR as a superior alternative to the traditional STIR and T2 fast spin echo (FSE). Thus, the aim of this study was to compare the sensitivity of T2, standard STIR and T1 weighted STIR sequences in the evaluation of MS plaques on our 3 T system. Methods and materials: A retrospective analysis of patients with multiple sclerosis who presented to our institution over a period of 5 months and who had cervical cord lesions was undertaken. Patients had been examined with our institutional protocol which included T2 FSE, STIR and the recommended T1 STIR. Quantitative analysis of the lesions versus background cord using sample T-tests was performed for each sequence, and comparative analysis of the lesion contrast:background cord ratios of the 3 sequences (using two-way ANOVA tests) was performed. Results: The T2 sequence was not as sensitive in detecting lesions versus the traditional STIR and T1 weighted STIR, with 10% of lesions not detected using statistical analysis (p < 0.05). The traditional STIR also demonstrated greater contrast ratios than the T2 sequence (p < 0.05) suggesting increased sensitivity. However, the T1 STIR demonstrated even greater contrast ratios than both the traditional STIR and T2 sequences (p < 0.05). Conclusion: This study confirms earlier findings of the traditional STIRs increased sensitivity versus the T2 sequence. However, the new “T1 weighted STIR” appears to be even more sensitive than both these sequences showing potential promise as an alternative method to monitor demyelinating plaques of MS.

  18. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum) transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata), as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest in the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode). Around 1.4 million reads were produced and assembled into 70,530 contigs (average length of 490 bp). Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum) reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  19. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Directory of Open Access Journals (Sweden)

    Nicholas R Thomson

    2006-12-01

    Full Text Available The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype 0:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common

  20. Genome sequencing and analysis of BCG vaccine strains.

    Directory of Open Access Journals (Sweden)

    Wen Zhang

    Full Text Available BACKGROUND: Although the Bacillus Calmette-Guérin (BCG) vaccine against tuberculosis (TB) has been available for more than 75 years, one third of the world's population is still infected with Mycobacterium tuberculosis and approximately 2 million people die of TB every year. To reduce this immense TB burden, a clearer understanding of the functional genes underlying the action of BCG and the development of new vaccines are urgently needed. METHODS AND FINDINGS: Comparative genomic analysis of 19 M. tuberculosis complex strains showed that BCG strains underwent repeated human manipulation, had higher region of deletion rates than those of natural M. tuberculosis strains, and lost several essential components such as T-cell epitopes. A total of 188 BCG strain T-cell epitopes were lost to various degrees. The non-virulent BCG Tokyo strain, which has the largest number of T-cell epitopes (359), lost 124. Here we propose that BCG strain protection variability results from different epitopes. This study is the first to present BCG as a model organism for genetics research. BCG strains have a very well-documented history and now detailed genome information. Genome comparison revealed the selection process of BCG strains under human manipulation (1908-1966). CONCLUSIONS: Our results revealed the cause of BCG vaccine strain protection variability at the genome level and supported the hypothesis that the restoration of lost BCG Tokyo epitopes is a useful future vaccine development strategy. Furthermore, these detailed BCG vaccine genome investigation results will be useful in microbial genetics, microbial engineering and other research fields.

  1. Collagen Sequence Analysis of the Extinct Giant Ground Sloths Lestodon and Megatherium.

    Directory of Open Access Journals (Sweden)

    Michael Buckley

    Full Text Available For over 200 years, fossils of bizarre extinct creatures have been described from the Americas that have ranged from giant ground sloths to the 'native' South American ungulates, groups of mammals that evolved in relative isolation on South America. Ground sloths belong to the South American xenarthrans, a group with modern although morphologically and ecologically very different representatives (anteaters, armadillos and sloths), which has been proposed to be one of the four main eutherian clades. Recently, proteomics analyses of bone collagen have been used to yield a molecular phylogeny for a range of mammals including the unusual 'Malagasy aardvark' shown to be most closely related to the afrotherian tenrecs, and the South American ungulates supporting their morphological association with condylarths. However, proteomics results generate partial sequence information that could impact upon the phylogenetic placement that has not been appropriately tested. For comparison, this paper examines the phylogenetic potential of proteomics-based sequencing through the analysis of collagen extracted from two extinct giant ground sloths, Lestodon and Megatherium. The ground sloths were placed as sister taxa to extant sloths, but with a closer relationship between Lestodon and the extant sloths than the basal Megatherium. These results highlight that proteomics methods could yield plausible phylogenies that share similarities with other methods, but have the potential to be more useful in fossils beyond the limits of ancient DNA survival.

  2. A novel analysis strategy for HLA typing using a sequence-specific oligonucleotide probe method.

    Science.gov (United States)

    Won, D I

    2017-11-01

    The technique of reverse sequence-specific oligonucleotide probes (SSOPs) is commonly used in human leukocyte antigen (HLA) typing. In the conventional method for data analysis (exact pattern matching, EPM), the larger the number of mismatched probes, the longer the time for final typing assignment. A novel strategy, filtering and scoring (FnS), has been developed to easily assign the best-fit allele pair. In the FnS method, candidate alleles and allele pairs were filtered based on (1) subject's ethnicity, and (2) the measured partial reaction pattern with only definitely negative or positive probes. Then, the complete reaction patterns for all probes (CRPoAPs) were compared between the raw sample and expected residual allele pairs to obtain mismatch scores. To compare the FnS and EPM methods, each analysis time (minutes:seconds) for reverse SSOP HLA typing with intermediate resolution (n = 507) was measured. The analysis time with FnS method was shorter than that of the EPM method [00:21 (00:08-01:47) and 01:04 (00:15-23:45), respectively, P typing in a comprehensive and quantitative comparison between measured and expected CRPoAPs of candidate allele pairs. Therefore, this analysis strategy might be useful in a clinical setting. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  3. Multilocus sequence analysis of Echinococcus granulosus strains isolated from humans and animals in Iran.

    Science.gov (United States)

    Nikmanesh, Bahram; Mirhendi, Hossein; Mahmoudi, Shahram; Rokni, Mohammad Bagher

    2017-12-01

    Echinococcus granulosus is now considered a complex consisting of at least four species and ten genotypes. Different molecular targets have been described for molecular characterization of E. granulosus; however, in almost all studies only one or two of the targets have been used, and only limited data are available on the utilization of multiple loci. Therefore, we investigated the genetic diversity among 64 strains isolated from 138 cyst specimens of human and animal isolates, using a set of nuclear and mitochondrial genes; i.e., cytochrome c oxidase subunit 1 (cox1), NADH dehydrogenase subunit 1 (nad1), ATPase subunit 6 (atp6), 12S rRNA (12S), and Actin II (act II). In comparison to the use of molecular reference targets (nad1 + cox1), using a single target (act II or 12S or atp6) yielded lower discriminatory power. Act II and 12S genes could accurately discriminate the G6 genotype, but they were not able to differentiate between G1 and G3 genotypes. As the G1 and G3 genotypes belong to E. granulosus sensu stricto, low intra-species variation was observed for act II and 12S. The atp6 gene could identify the G3 genotype but could not differentiate G6 and G1 genotypes. Using the concatenated sequence of five genes (cox1 + nad1 + atp6 + 12S + act II), genotypes were identified accurately, and markedly higher resolution was obtained in comparison with the use of reference markers (nad1 + cox1) only. Application of multilocus sequence analysis (MLSA) to large-scale studies could provide valuable epidemiological data to inform efficient control and management measures for cystic echinococcosis. Copyright © 2017 Elsevier Inc. All rights reserved.
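
    The gain in resolution from concatenating loci can be illustrated with a toy sketch. The data layout (`{locus: {strain: aligned_sequence}}`) and the helper names `concatenate_loci` and `count_distinct` are assumptions made for this example; the abstract does not describe the software the authors used.

```python
def concatenate_loci(alignments, loci_order):
    """Join per-locus aligned sequences into one sequence per strain.

    alignments: {locus: {strain: aligned_sequence}} (hypothetical layout).
    Only strains present at every locus are kept.
    """
    shared = set.intersection(*(set(alignments[locus]) for locus in loci_order))
    return {
        strain: "".join(alignments[locus][strain] for locus in loci_order)
        for strain in sorted(shared)
    }

def count_distinct(seqs_by_strain):
    """Number of distinct sequence types, a crude proxy for discriminatory power."""
    return len(set(seqs_by_strain.values()))
```

    With two strains that are identical at one locus but differ at another, `count_distinct` on the single locus reports fewer types than on the concatenated sequences, which is the effect the abstract attributes to multilocus analysis.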

  4. Genome Sequencing and Comparative Analysis of the Biocontrol Agent Trichoderma harzianum sensu stricto TR274

    Energy Technology Data Exchange (ETDEWEB)

    Steindorff, Andrei S.; Noronha, Elilane F.; Ulhoa, Cirano J.; Kuo, Alan; Salamov, Asaf A.; Haridas, Sajeet; Riley, Robert W.; Druzhinina, Irina S.; Kubicek, Christian P.; Grigoriev, Igor V.

    2015-03-17

    Biological control is a complex process which requires many mechanisms and a high diversity of biochemical pathways. The species of Trichoderma harzianum are well known for their biocontrol activity against many plant pathogens. To gain new insights into the biocontrol mechanism used by T. harzianum, we sequenced the isolate TR274 genome using Illumina. The assembly was performed using AllPaths-LG with a maximum coverage of 100x. The assembly resulted in 2282 contigs with an N50 of 37,033 bp. The genome size generated was 40.8 Mb and the GC content was 47.7%, similar to other Trichoderma genomes. Using the JGI Annotation Pipeline we predicted 13,932 genes with high transcriptome support. CEGMA tests suggested 100% genome completeness and 97.9% of RNA-SEQ reads were mapped to the genome. The phylogenetic comparison using orthologous proteins with all Trichoderma genomes sequenced at JGI corroborates the Trichoderma (T. asperellum and T. atroviride), Longibrachiatum (T. reesei and T. longibrachiatum) and Pachybasium (T. harzianum and T. virens) section division described previously. The comparison between two Trichoderma harzianum species suggests a high genome similarity but some strain-specific expansions. Analyses of the secondary metabolites, CAZymes, transporters, proteases and transcription factors were performed. The Pachybasium section expanded virtually all categories analyzed compared with the other sections, especially the Longibrachiatum section, which shows a clear contraction. These results suggest that these protein families have an important role in their respective phenotypes. Future analysis will improve the understanding of this complex genus and give some insights about its lifestyle and the interactions with the environment.
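
    The assembly statistics quoted in this abstract (N50 and GC content) follow standard definitions, sketched below. The function names `n50` and `gc_content` are illustrative only and are not part of AllPaths-LG or the JGI Annotation Pipeline.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt
```

    For contigs of lengths 2, 3, 4, 5 and 6 (total 20), the two longest contigs already sum to 11, so the N50 is 5.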

  5. Sequencing and analysis of the gene-rich space of cowpea

    Directory of Open Access Journals (Sweden)

    Cheung Foo

    2008-02-01

    total of 5,888 GSRs had homology to genes encoding transcription factors (TFs) and transcription associated factors (TAFs), representing about 5% of the total annotated sequences in the dataset. Sixty-two (62) of the 64 well-characterized plant transcription factor (TF) gene families are represented in the cowpea GSRs, and these families are of similar size and phylogenetic organization to those characterized in other plants. The cowpea GSRs also provide a rich source of genes involved in photoperiodic control, symbiosis, and defense-related responses. Comparisons to available databases revealed that about 74% of cowpea ESTs and 70% of all legume ESTs were represented in the GSR dataset. As approximately 12% of all GSRs contain an identifiable simple-sequence repeat, the dataset is a powerful resource for the design of microsatellite markers. Conclusion: The availability of extensive publicly available genomic data for cowpea, a non-model legume with significant importance in the developing world, represents a significant step forward in legume research. Not only does the gene space sequence enable the detailed analysis of gene structure, gene family organization and phylogenetic relationships within cowpea, but it also facilitates the characterization of syntenic relationships with other cultivated and model legumes, and will contribute to determining patterns of chromosomal evolution in the Leguminosae. The micro and macrosyntenic relationships detected between cowpea and other cultivated and model legumes should simplify the identification of informative markers for marker-assisted trait selection and map-based gene isolation necessary for cowpea improvement.

  6. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Full Text Available Background: Expressed Sequence Tag (EST) has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results: In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 expressed sequence tag (EST) sequences with high quality. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons that have been deposited in NCBI (accession IDs: GW868575 - GW873047), among which 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and low in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions: We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.
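
    Putative SSRs of the kind counted in this abstract are typically found by scanning for short motifs repeated in tandem. The sketch below shows one way to do that with a regular expression. The minimum repeat counts (6 for di-, 5 for tri-, 4 for tetranucleotide motifs) and the function name `find_ssrs` are assumptions for illustration; the abstract does not state the criteria or the tool the authors used.

```python
import re

def _is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    min_repeats maps motif length to the minimum number of tandem copies;
    the defaults are assumed thresholds, not values from the source.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not _is_primitive(motif):
                continue  # skip homopolymers and motifs covered by a shorter unit
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)
```

    This is a simplification: it reports perfect repeats only, and a skipped non-primitive match can hide an SSR that overlaps it, so dedicated SSR-mining tools should be preferred for real datasets.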

  7. Comparative genome sequence analysis of Choristoneura occidentalis Freeman and C. rosaceana Harris (Lepidoptera: Tortricidae) alphabaculoviruses.

    Directory of Open Access Journals (Sweden)

    David K Thumbi

    Full Text Available The complete genome sequences of Choristoneura occidentalis and C. rosaceana nucleopolyhedroviruses (ChocNPV and ChroNPV, respectively) (Baculoviridae: Alphabaculovirus) were determined and compared with each other and with those of other baculoviruses, including the genome of the closely related C. fumiferana NPV (CfMNPV). The ChocNPV genome was 128,446 bp in length (1147 bp smaller than that of CfMNPV), had a G+C content of 50.1%, and contained 148 open reading frames (ORFs). In comparison, the ChroNPV genome was 129,052 bp in length, had a G+C content of 48.6% and contained 149 ORFs. ChocNPV and ChroNPV shared 144 ORFs in common, and had a 77% sequence identity with each other and 96.5% and 77.8% sequence identity, respectively, with CfMNPV. Five homologous regions (hrs), with sequence similarities to those of CfMNPV, were identified in ChocNPV, whereas the ChroNPV genome contained three hrs featuring up to 14 repeats. Both genomes encoded three inhibitors of apoptosis (IAP-1, IAP-2, and IAP-3), as reported for CfMNPV, and the ChocNPV IAP-3 gene represented the most divergent functional region of this genome relative to CfMNPV. Two ORFs were unique to ChocNPV, and four were unique to ChroNPV. ChroNPV ORF chronpv38 is a eukaryotic initiation factor 5 (eIF-5) homolog that has also been identified in the C. occidentalis granulovirus (ChocGV) and is believed to be the product of horizontal gene transfer from the host. Based on levels of sequence identity and phylogenetic analysis, both ChocNPV and ChroNPV fall within group I alphabaculoviruses, where ChocNPV appears to be more closely related to CfMNPV than does ChroNPV. Our analyses suggest that it may be appropriate to consider ChocNPV and CfMNPV as variants of the same virus species.

  8. Complete mitochondrial genome sequence of Indian medium carp, Labeo gonius (Hamilton, 1822) and its comparison with other related carp species.

    Science.gov (United States)

    Behera, Bijay Kumar; Kumari, Kavita; Baisvar, Vishwamitra Singh; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Jena, J K

    2017-01-01

    In the present study, the complete mitochondrial genome sequence of Labeo gonius is reported using a PGM sequencer (Ion Torrent). The complete mitogenome of L. gonius, 16 614 bp in length, was obtained by de novo sequence assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP). The mitogenome of L. gonius comprises 13 protein-coding genes, 22 tRNAs, 2 rRNA genes, and a D-loop control region, with gene order and organization similar to most other fish mitogenomes in the NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of Labeo fimbriatus, as reported earlier. The phylogenetic analysis of Cypriniformes depicted that their mitogenomes are closely related to each other. The complete mitogenome sequence of L. gonius would be helpful in understanding the population genetics, phylogenetics, and evolution of Indian Carps.

  9. Accident Sequence Precursor Analysis for SGTR by Using Dynamic PSA Approach

    International Nuclear Information System (INIS)

    Lee, Han Sul; Heo, Gyun Young; Kim, Tae Wan

    2016-01-01

    In order to address this issue, this study suggests the sequence tree model to analyze accident sequences systematically. Using the sequence tree model, all possible scenarios which need a specific safety action to prevent core damage can be identified, and success conditions of the safety action under complicated situations such as a combined accident will also be identified. The sequence tree is a branch model that divides plant conditions according to the plant dynamics. Since the sequence tree model can reflect the plant dynamics, arising from the interaction of different accident timing and plant condition and from the interaction between the operator action, mitigation system, and the indicators for operation, the sequence tree model can be used to develop the dynamic event tree model easily. The target safety action for this study is a feed-and-bleed (F and B) operation. An F and B operation directly cools down the reactor cooling system (RCS) using the primary cooling system when residual heat removal by the secondary cooling system is not available. In this study, a TLOFW accident and a TLOFW accident with LOCA were the target accidents. Based on the conventional PSA model and indicators, the sequence tree model for a TLOFW accident was developed. Based on the results of a sampling analysis and data from the conventional PSA model, the CDF caused by Sequence no. 26 can be realistically estimated. For a TLOFW accident with LOCA, second accident timings were categorized according to plant condition. Indicators were selected as branch points using the flow chart and tables, and a corresponding sequence tree model was developed. If sampling analysis is performed, practical accident sequences can be identified based on the sequence analysis. If a realistic distribution for the variables can be obtained for sampling analysis, much more realistic accident sequences can be described. Moreover, if the initiating event frequency under a combined accident can be quantified, the sequence tree model

  10. Sequencing and phylogenetic analysis of Herpes simplex virus type ...

    African Journals Online (AJOL)

    To determine the genetic relationship of the HSV-2 glycoprotein G gene (gG) in Iran with those in other countries, a DNA fragment of 1100 bp corresponding to gG was isolated from six HSV-2 strains from infected human sera samples in Iran, amplified by PCR and sequenced for determining ...

  11. Transcriptome analysis of blueberry using 454 EST sequencing

    Science.gov (United States)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economical value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  12. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Tarek

    2011-04-18

    Apr 18, 2011 ... nucleotide alignment of both native buffalo and cattle CSRP3 cDNAs sequences ..... Exon III, Identities = 71/75 (94%), Gaps = 1/75 (1%) Strand=Plus/Plus ... Band MR, Larson JH, Rebeiz M, Green CA, Heyen DW, Donovan J,.

  13. Functional analysis of bipartite begomovirus coat protein promoter sequences

    International Nuclear Information System (INIS)

    Lacatus, Gabriela; Sunter, Garry

    2008-01-01

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters

  14. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, D.M.; Bolund, Lars; As part of the Chinese Human Genome Sequencing Consortium, E.T.A.L.

    2006-01-01

    as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion...

  15. Sequence analysis of mitochondrial 16S ribosomal RNA gene

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  16. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  17. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  18. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  19. Molecular cloning, sequence analysis and structure prediction of the ...

    African Journals Online (AJOL)

    AJL

    2012-04-19

    Apr 19, 2012 ... The primers were based on the rBAT sequences of other animals deposited in GenBank. .... fragment; M1, 2000 bp DNA ladder; M2, 1000 bp DNA ladder. spliced to obtain the ..... A traffic signal for heterodimeric amino acid.

  20. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease and protein related researches were the leading research focuses, and comparative genomics and evolution related research had strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  1. Cloning and sequence analysis of the defective in anther ...

    African Journals Online (AJOL)

    To clone the defective in anther dehiscence1 (DAD1) gene fragment of Chinese kale, about 700 bp product was obtained by PCR amplification using Chinese kale genomic DNA as the template and a pair of specific primers designed according to the conserved sequence of DAD1 genes of Arabidopsis thaliana and ...

  2. Sequence and comparative analysis of Leuconostoc dairy bacteriophages

    DEFF Research Database (Denmark)

    Kot, Witold; Hansen, Lars Henrik; Neve, Horst

    2014-01-01

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc...

  3. Nucleotide Sequences and Comparison of Two Large Conjugative Plasmids from Different Campylobacter species

    National Research Council Canada - National Science Library

    Batchelor, Roger A; Pearson, Bruce M; Friis, Lorna M; Guerry, Patricia; Wells, Jerry M

    2004-01-01

    .... Both plasmids are mosaic in structure, having homologues of genes found in a variety of different commensal and pathogenic bacteria, but nevertheless, showed striking similarities in DNA sequence...

  4. Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.

    Science.gov (United States)

    Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing

    2015-08-05

    To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit makes up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3% in the four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than that of the SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0%) and SureSelect-HiSeq-specific variants (89.6%) were higher than that of TargetSeq-Proton-specific variants (15.8%). In the sequencing of exonic regions, a combination of the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for the

  5. Comparison of MRI pulse sequences in defining prostate volume after permanent implantation

    International Nuclear Information System (INIS)

    McLaughlin, P.W.; Narayana, V.; Drake, D.G.; Miller, B.M.; Marsh, L.; Chan, J.; Gonda, R.; Winfield, R.J.; Roberson, P.L.

    2002-01-01

    Purpose: To determine the relative value of three MRI pulse sequences in defining the prostate volume after permanent implantation. Methods and Materials: A total of 45 patients who received a permanent 125I implant were studied. Two weeks after implantation, an axial CT scan (2 mm thickness) and T1-weighted, T1-weighted fat saturation, and T2-weighted axial MRI (3-mm) studies were obtained. The prostate volumes were compared with the initial ultrasound planning volumes, and subsequently the CT, T1-weighted, and T1-weighted fat saturation MRI volumes were compared with the T2-weighted volumes. Discrepancies in volume were evaluated by visual inspection of the registered axial images and the registration of axial volumes on the sagittal T2-weighted volumes. In a limited set of patients, pre- and postimplant CT and T2-weighted MRI studies were available for comparison to determine whether prostate volume changes after implant were dependent on the imaging modality. Results: T1-weighted and T1-weighted fat saturation MRI and CT prostate volumes were consistently larger than the T2-weighted MRI prostate volumes, with a volume on average 1.33 (SD 0.24) times the T2-weighted volume. This discrepancy was due to the superiority of T2-weighted MRI for prostate definition at the following critical interfaces: membranous urethra, apex, and anterior base-bladder and posterior base-seminal vesicle interfaces. The differences in prostate definition in the anterior base region suggest that the commonly reported underdose may be due to overestimation of the prostate in this region by CT. The consistent difference in volumes suggests that the degree of swelling observed after implantation is in part a function of the imaging modality. In patients with pre- and postimplant CT and T2-weighted MRI images, swelling on the T2-weighted images was 1.1 times baseline and on CT was 1.3 times baseline, confirming the imaging modality dependence of prostate

  6. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Science.gov (United States)

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment Search Tool) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to almost any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  7. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  8. Sequence comparison of the rDNA introns from six different species of Tetrahymena

    DEFF Research Database (Denmark)

    Nielsen, Henrik; Engberg, J

    1985-01-01

    model for the intron RNA of Cech et al. (Proc. Natl. Acad. Sci. U.S.A. 80, 3903 (83)). Most of the sequence variation in the four new sequences reported here is found in single stranded loops in the model. However, in four cases we found nucleotide substitutions in duplex stem regions, two of them...

  9. Comparison of some characteristics of aerobic granules and sludge flocs from sequencing batch reactors.

    Science.gov (United States)

    Li, J; Garny, K; Neu, T; He, M; Lindenblatt, C; Horn, H

    2007-01-01

    Physical, chemical and biological characteristics were investigated for aerobic granules and sludge flocs from three laboratory-scale sequencing batch reactors (SBRs). One reactor was operated as normal SBR (N-SBR) and two reactors were operated as granular SBRs (G-SBR1 and G-SBR2). G-SBR1 was inoculated with activated sludge and G-SBR2 with granules from the municipal wastewater plant in Garching (Germany). The following major parameters and functions were measured and compared between the three reactors: morphology, settling velocity, specific gravity (SG), sludge volume index (SVI), specific oxygen uptake rate (SOUR), distribution of the volume fraction of extracellular polymeric substances (EPS) and bacteria, organic carbon and nitrogen removal. Compared with sludge flocs, granular sludge had excellent settling properties, good solid-liquid separation, high biomass concentration, simultaneous nitrification and denitrification. Aerobic granular sludge does not have a higher microbial activity and there are some problems including higher effluent suspended solids, lower ratio of VSS/SS and no nitrification at the beginning of cultivation. Measurement with CLSM and additional image analysis showed that EPS glycoconjugates build one main fraction inside the granules. The aerobic granules from G-SBR1 prove to be heavier, smaller and have a higher microbial activity compared with G-SBR2. Furthermore, the granules were more compact, with lower SVI and less filamentous bacteria.

  10. Systematic Internal Transcribed Spacer Sequence Analysis for Identification of Clinical Mold Isolates in Diagnostic Mycology: a 5-Year Study

    Science.gov (United States)

    Ciardo, Diana E.; Lucke, Katja; Imhof, Alex; Bloemberg, Guido V.; Böttger, Erik C.

    2010-01-01

    The implementation of internal transcribed spacer (ITS) sequencing for routine identification of molds in the diagnostic mycology laboratory was analyzed in a 5-year study. All mold isolates (n = 6,900) recovered in our laboratory from 2005 to 2009 were included in this study. According to a defined work flow, which in addition to troublesome phenotypic identification takes clinical relevance into account, 233 isolates were subjected to ITS sequence analysis. Sequencing resulted in successful identification for 78.6% of the analyzed isolates (57.1% at species level, 21.5% at genus level). In comparison, extended in-depth phenotypic characterization of the isolates subjected to sequencing achieved taxonomic assignment for 47.6% of these, with a mere 13.3% at species level. Optimization of DNA extraction further improved the efficacy of molecular identification. This study is the first of its kind to testify to the systematic implementation of sequence-based identification procedures in the routine workup of mold isolates in the diagnostic mycology laboratory. PMID:20573873

  11. Systematic internal transcribed spacer sequence analysis for identification of clinical mold isolates in diagnostic mycology: a 5-year study.

    Science.gov (United States)

    Ciardo, Diana E; Lucke, Katja; Imhof, Alex; Bloemberg, Guido V; Böttger, Erik C

    2010-08-01

    The implementation of internal transcribed spacer (ITS) sequencing for routine identification of molds in the diagnostic mycology laboratory was analyzed in a 5-year study. All mold isolates (n = 6,900) recovered in our laboratory from 2005 to 2009 were included in this study. According to a defined work flow, which in addition to troublesome phenotypic identification takes clinical relevance into account, 233 isolates were subjected to ITS sequence analysis. Sequencing resulted in successful identification for 78.6% of the analyzed isolates (57.1% at species level, 21.5% at genus level). In comparison, extended in-depth phenotypic characterization of the isolates subjected to sequencing achieved taxonomic assignment for 47.6% of these, with a mere 13.3% at species level. Optimization of DNA extraction further improved the efficacy of molecular identification. This study is the first of its kind to testify to the systematic implementation of sequence-based identification procedures in the routine workup of mold isolates in the diagnostic mycology laboratory.

  12. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    A 2.7 kb DNA fragment encoding the 60 kDa common antigen (CA) and a 13 kDa protein of Legionella micdadei was sequenced. Two open reading frames of 57,677 and 10,456 Da were identified, corresponding to the heat shock proteins GroEL and GroES, respectively. Typical -35, -10, and Shine-Dalgarno heat...

  13. The Matrix Method of Representation, Analysis and Classification of Long Genetic Sequences

    Directory of Open Access Journals (Sweden)

    Ivan V. Stepanyan

    2017-01-01

    Full Text Available The article is devoted to a matrix method of comparative analysis of long nucleotide sequences by means of presenting each sequence in the form of three digital binary sequences. This method uses a set of symmetries of biochemical attributes of nucleotides. It also uses the possibility of presentation of every whole set of N-mers as one of the members of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visually represented as an individual fractal-like mosaic or another regular mosaic of binary type. In contrast to natural nucleotide sequences, artificial random sequences give non-regular patterns. Examples of binary mosaics of long nucleotide sequences are shown, including cases of human chromosomes and penicillins. The obtained results are then discussed.

  14. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a major concern because the amount of data generated at many locations is growing exponentially. There is therefore a need for techniques that store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large numbers of DNA sequences of varying length, and provide a performance analysis of OPTSDNA within a distributed bioinformatics computing system for the analysis of DNA sequences. The algorithm was used to store DNA sequences, ranging in size from very small to very large, in a database. The algorithm calculates the storage size, and the response time was also measured. The efficiency and performance of the algorithm are high (in size calculation with percentage) when compared with other known sequential approaches.

  15. MR colonography with fecal tagging: comparison between 2D turbo FLASH and 3D FLASH sequences

    International Nuclear Information System (INIS)

    Papanikolaou, Nickolas; Grammatikakis, John; Maris, Thomas; Prassopoulos, Panos; Gourtsoyiannis, Nicholas; Lauenstein, Thomas

    2003-01-01

    The objective of this study was to compare inversion recovery turbo 2D fast low-angle shot (FLASH) and 3D FLASH sequences for fecal-tagged MR colonography studies. Fifteen consecutive patients with indications for colonoscopy underwent MR colonography with fecal tagging. An inversion recovery turbo-FLASH sequence was applied and compared in terms of artifacts presence, efficiency for masking residual stool, and colonic wall conspicuity with a fat-saturated 3D FLASH sequence. Both sequences were acquired following administration of paramagnetic contrast agent. Contrast-to-noise ratio and relative contrast between colonic wall and lumen were calculated and compared for both sequences. Turbo 2D FLASH provided fewer artifacts, higher efficiency for masking the residual stool, and colonic wall conspicuity equivalent to 3D FLASH. An inversion time of 10 ms provided homogeneously low signal intensity of the colonic lumen. Contrast to noise between colonic wall and lumen was significantly higher in the 3D FLASH images, whereas differences in relative contrast were not statistically significant. An optimized inversion-recovery 2D turbo-FLASH sequence provides better fecal tagging results and should be added to the 3D FLASH sequence when designing dark-lumen MR colonography examination protocols. (orig.)

  16. Resampling nucleotide sequences with closest-neighbor trimming and its comparison to other methods.

    Directory of Open Access Journals (Sweden)

    Kouki Yonezawa

    Full Text Available A large number of nucleotide sequences of various pathogens are available in public databases. The growth of the datasets has resulted in an enormous increase in computational costs. Moreover, due to differences in surveillance activities, the number of sequences found in databases varies from one country to another and from year to year. Therefore, it is important to study resampling methods to reduce the sampling bias. A novel algorithm, called the closest-neighbor trimming method, that resamples a given number of sequences from a large nucleotide sequence dataset was proposed. The performance of the proposed algorithm was compared with other algorithms by using the nucleotide sequences of human H3N2 influenza viruses. We compared the closest-neighbor trimming method with the naive hierarchical clustering algorithm and the k-medoids clustering algorithm. Genetic information accumulated in public databases contains sampling bias. The closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials for life sciences, we anticipate that applying our algorithm to various datasets will reduce sampling bias.
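    The abstract above does not give the details of closest-neighbor trimming, so the following is only a minimal sketch of one plausible reading: repeatedly find the closest pair of sequences and drop one member until the requested number remains. The Hamming distance on pre-aligned sequences, the tie-breaking rule, and the names `hamming` and `trim_closest_neighbors` are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def trim_closest_neighbors(seqs: dict[str, str], keep: int) -> dict[str, str]:
    """Greedily thin `seqs` (name -> aligned sequence) down to `keep` entries.

    Assumption (not stated in the abstract): at each step the closest pair
    is located and the member whose name sorts last is removed.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    remaining = dict(seqs)
    while len(remaining) > keep:
        # Scan all pairs in a fixed (sorted) order so ties break reproducibly.
        a, b = min(
            combinations(sorted(remaining), 2),
            key=lambda pair: hamming(remaining[pair[0]], remaining[pair[1]]),
        )
        del remaining[max(a, b)]
    return remaining


# Hypothetical toy data: two dense clusters plus one isolated sequence.
toy = {"s1": "AAAA", "s2": "AAAT", "s3": "GGGG", "s4": "GGGC", "s5": "CCTT"}
thinned = trim_closest_neighbors(toy, keep=3)
```

    In this toy run one member of each dense pair is removed and the isolated sequence survives, which is the thinning behaviour the abstract describes. The all-pairs scan is quadratic per step and is meant only to show the idea; a real dataset would need a precomputed distance matrix or a faster neighbor search.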

  17. [Comparison of rDNA internal transcribed spacer sequences in asparagus].

    Science.gov (United States)

    Ou, Li-Jun; Ye, Wei; Zeng, Gui-Ping; Jiang, Xiang-Hui; She, Chao-Wen; Xu, Dong; Yang, Jia-Qiang

    2010-10-01

    ITS sequences of nine species were used to identify counterfeit medicines and to analyse the phylogeny of Asparagus. The ITS sequences were analysed by amplification, cloning, sequencing and alignment. The ITS sequences of the nine species ranged in length from 711 to 748 bp, with a G + C content of about 60%. The phylogenetic tree constructed from the ITS sequences divided the nine species into two branches: Asparagus cochinchinensis, Asparagus officinalis, Asparagus densiflorus, Asparagus densiflorus cv. Myers and Asparagus densiflorus cv. Sprengeri formed one branch and the others formed the second. Within the first branch, Asparagus densiflorus and Asparagus densiflorus cv. Myers, both from Africa, clustered first and then clustered with Asparagus densiflorus cv. Sprengeri, a variant of Asparagus densiflorus. In the other branch, Asparagus setaceus was relatively distant genetically from the other three materials. The ITS sequences could distinguish species of Asparagus and so detect counterfeits. The positions of some species in the phylogenetic tree were debatable, and ITS sequences should be combined with other analytical tools to analyse the true phylogeny.
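    The G + C content quoted above is a simple ratio, and a short sketch may make the calculation concrete. The function name `gc_content` and the choice to ignore gaps and ambiguous bases are assumptions for illustration; the abstract does not say how the authors handled such characters.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases of `seq`.

    Assumption: gap characters and ambiguity codes (anything other than
    A, C, G, T) are skipped rather than counted in the denominator.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


# Hypothetical fragments, not taken from the study.
example_percent = gc_content("GGCCA")
gapped_percent = gc_content("ggccatatnn-")
```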

  18. The genome sequence and transcriptome of Potentilla micrantha and their comparison to Fragaria vesca (the woodland strawberry).

    Science.gov (United States)

    Buti, Matteo; Moretto, Marco; Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Brilli, Matteo; Lomsadze, Alexandre; Sonego, Paolo; Giongo, Lara; Alonge, Michael; Velasco, Riccardo; Varotto, Claudio; Šurbanovski, Nada; Borodovsky, Mark; Ward, Judson A; Engelen, Kristof; Cavallini, Andrea; Cestaro, Alessandro; Sargent, Daniel James

    2018-04-01

    The genus Potentilla is closely related to that of Fragaria, the economically important strawberry genus. Potentilla micrantha is a species that does not develop berries but shares numerous morphological and ecological characteristics with Fragaria vesca. These similarities make P. micrantha an attractive choice for comparative genomics studies with F. vesca. In this study, the P. micrantha genome was sequenced and annotated, and RNA-Seq data from the different developmental stages of flowering and fruiting were used to develop a set of gene predictions. A 327 Mbp sequence and annotation of the genome of P. micrantha, spanning 2674 sequence contigs, with an N50 size of 335,712, estimated to cover 80% of the total genome size of the species was developed. The genus Potentilla has a characteristically larger genome size than Fragaria, but the recovered sequence scaffolds were remarkably collinear at the micro-syntenic level with the genome of F. vesca, its closest sequenced relative. A total of 33,602 genes were predicted, and 95.1% of bench-marking universal single-copy orthologous genes were complete within the presented sequence. Thus, we argue that the majority of the gene-rich regions of the genome have been sequenced. Comparisons of RNA-Seq data from the stages of floral and fruit development revealed genes differentially expressed between P. micrantha and F. vesca. The data presented are a valuable resource for future studies of berry development in Fragaria and the Rosaceae and they also shed light on the evolution of genome size and organization in this family.
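    The N50 statistic quoted in this abstract has a standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that definition follows; the function name `n50` and the contig lengths are illustrative only and are not data from the study.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths.

    Contigs are taken from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half the assembly.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths: total 54, so half the assembly is 27.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

    With these toy lengths the three longest contigs (10 + 9 + 8 = 27) reach half of the total, so the N50 is 8.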

  19. Analysis of xylem formation in pine by cDNA sequencing

    Science.gov (United States)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  20. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.

  1. Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma.

    Science.gov (United States)

    Kubicek, Christian P; Herrera-Estrella, Alfredo; Seidl-Seiboth, Verena; Martinez, Diego A; Druzhinina, Irina S; Thon, Michael; Zeilinger, Susanne; Casas-Flores, Sergio; Horwitz, Benjamin A; Mukherjee, Prasun K; Mukherjee, Mala; Kredics, László; Alcaraz, Luis D; Aerts, Andrea; Antal, Zsuzsanna; Atanasova, Lea; Cervantes-Badillo, Mayte G; Challacombe, Jean; Chertkov, Olga; McCluskey, Kevin; Coulpier, Fanny; Deshpande, Nandan; von Döhren, Hans; Ebbole, Daniel J; Esquivel-Naranjo, Edgardo U; Fekete, Erzsébet; Flipphi, Michel; Glaser, Fabian; Gómez-Rodríguez, Elida Y; Gruber, Sabine; Han, Cliff; Henrissat, Bernard; Hermosa, Rosa; Hernández-Oñate, Miguel; Karaffa, Levente; Kosti, Idit; Le Crom, Stéphane; Lindquist, Erika; Lucas, Susan; Lübeck, Mette; Lübeck, Peter S; Margeot, Antoine; Metz, Benjamin; Misra, Monica; Nevalainen, Helena; Omann, Markus; Packer, Nicolle; Perrone, Giancarlo; Uresti-Rivera, Edith E; Salamov, Asaf; Schmoll, Monika; Seiboth, Bernhard; Shapiro, Harris; Sukno, Serenella; Tamayo-Ramos, Juan Antonio; Tisch, Doris; Wiest, Aric; Wilkinson, Heather H; Zhang, Michael; Coutinho, Pedro M; Kenerley, Charles M; Monte, Enrique; Baker, Scott E; Grigoriev, Igor V

    2011-01-01

    Mycoparasitism, a lifestyle where one fungus is parasitic on another fungus, has special relevance when the prey is a plant pathogen, providing a strategy for biological control of pests for plant protection. Probably, the most studied biocontrol agents are species of the genus Hypocrea/Trichoderma. Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocladium virens, teleomorph Hypocrea virens), and a comparison with Trichoderma reesei (teleomorph Hypocrea jecorina). These three Trichoderma species display a remarkable conservation of gene order (78 to 96%), and a lack of active mobile elements probably due to repeat-induced point mutation. Several gene families are expanded in the two mycoparasitic species relative to T. reesei or other ascomycetes, and are overrepresented in non-syntenic genome regions. A phylogenetic analysis shows that T. reesei and T. virens are derived relative to T. atroviride. The mycoparasitism-specific genes thus arose in a common Trichoderma ancestor but were subsequently lost in T. reesei. The data offer a better understanding of mycoparasitism, and thus enforce the development of improved biocontrol strains for efficient and environmentally friendly protection of plants. © 2011 Kubicek et al.; licensee BioMed Central Ltd.

  2. The Genome Sequence of Leishmania (Leishmania) amazonensis: Functional Annotation and Extended Analysis of Gene Models

    Science.gov (United States)

    Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; e Ferreira, Renata Carmona; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana

    2013-01-01

    We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3′-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904

  3. Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma

    Science.gov (United States)

    2011-01-01

    Background Mycoparasitism, a lifestyle where one fungus is parasitic on another fungus, has special relevance when the prey is a plant pathogen, providing a strategy for biological control of pests for plant protection. Probably, the most studied biocontrol agents are species of the genus Hypocrea/Trichoderma. Results Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocladium virens, teleomorph Hypocrea virens), and a comparison with Trichoderma reesei (teleomorph Hypocrea jecorina). These three Trichoderma species display a remarkable conservation of gene order (78 to 96%), and a lack of active mobile elements probably due to repeat-induced point mutation. Several gene families are expanded in the two mycoparasitic species relative to T. reesei or other ascomycetes, and are overrepresented in non-syntenic genome regions. A phylogenetic analysis shows that T. reesei and T. virens are derived relative to T. atroviride. The mycoparasitism-specific genes thus arose in a common Trichoderma ancestor but were subsequently lost in T. reesei. Conclusions The data offer a better understanding of mycoparasitism, and thus enforce the development of improved biocontrol strains for efficient and environmentally friendly protection of plants. PMID:21501500

  4. Whole-genome sequence analysis of the Mycobacterium avium complex and proposal of the transfer of Mycobacterium yongonense to Mycobacterium intracellulare subsp. yongonense subsp. nov.

    Science.gov (United States)

    Castejon, Maria; Menéndez, Maria Carmen; Comas, Iñaki; Vicente, Ana; Garcia, Maria J

    2018-06-01

    Bacterial whole-genome sequences contain informative features of their evolutionary pathways. Comparison of whole-genome sequences has become the method of choice for classification of prokaryotes, thus allowing the identification of bacteria from an evolutionary perspective, and providing data to resolve some current controversies. Currently, controversy exists about the assignment of members of the Mycobacterium avium complex, as is the case for Mycobacterium yongonense and 'Mycobacterium indicus pranii'. These two mycobacteria, closely related to Mycobacterium intracellulare on the basis of standard phenotypic and single-gene sequence comparisons, were not considered members of that species on the basis of some particular differences displayed by a single strain. Whole-genome sequence comparison procedures, namely the average nucleotide identity and the genome distance, showed that those two mycobacteria should be considered members of the species M. intracellulare. The results were confirmed with other supplementary whole-genome comparison methods. According to the data provided, Mycobacterium yongonense and 'Mycobacterium indicus pranii' should be renamed and included as members of M. intracellulare. This study highlights the problems caused when a novel species is accepted on the basis of a single strain, as was the case for M. yongonense. Based mainly on whole-genome sequence analysis, we conclude that M. yongonense should be reclassified as a subspecies of Mycobacterium intracellulare, as Mycobacterium intracellulare subsp. yongonense, and that 'Mycobacterium indicus pranii' should be classified in the same subspecies as the type strain of Mycobacterium intracellulare, as Mycobacterium intracellulare subsp. intracellulare.

  5. MRI in neuro-Behcet's syndrome: comparison of conventional spin-echo and FLAIR pulse sequences

    International Nuclear Information System (INIS)

    Jaeger, H.R.; Albrecht, T.; Curati-Alasonatti, W.L.; Williams, E.J.; Haskard, D.O.

    1999-01-01

    We compared the sensitivity of a fluid-attenuated inversion-recovery (FLAIR) sequence with that of a conventional dual-echo spin-echo (SE) sequence to brain lesions in 20 patients with Behcet's syndrome. They underwent 25 MRI examinations. The images were independently analysed for the number, type and anatomical location of lesions shown. There were 18 abnormal studies (13 initial and 5 follow-up). The FLAIR sequence detected significantly more lesions than the SE TE 80 (P < 0.05) and SE TE 20 (P < 0.01) sequences. It was particularly useful for demonstrating lesions in the juxtacortical white matter, which accounted for over half the lesions detected on the FLAIR images. Of patients presenting with nonspecific symptoms such as headache, seven had normal and five had abnormal studies. All patients presenting with focal neurological signs had abnormal imaging. We found supratentorial and, in particular, juxtacortical lesions to be more frequent than previously described. (orig.)

  6. Nucleotide and deduced amino acid sequence of the envelope gene of the Vasilchenko strain of TBE virus; comparison with other flaviviruses.

    Science.gov (United States)

    Gritsun, T S; Frolova, T V; Pogodina, V V; Lashkevich, V A; Venugopal, K; Gould, E A

    1993-02-01

    A strain of tick-borne encephalitis virus known as Vasilchenko (Vs) exhibits relatively low virulence characteristics in monkeys, Syrian hamsters and humans. The gene encoding the envelope glycoprotein of this virus was cloned and sequenced. Alignment of the sequence with those of other known tick-borne flaviviruses and identification of the recognised amino acid genetic marker EHLPTA confirmed its identity as a member of the TBE complex. However, Vs virus was distinguishable from eastern and western tick-borne serotypes by the presence of the sequence AQQ at amino acid positions 232-234 and also by the presence of other specific amino acid substitutions which may be genetic markers for these viruses and could determine their pathogenetic characteristics. When compared with other tick-borne flaviviruses, Vs virus had 12 unique amino acid substitutions including an additional potential glycosylation site at position (315-317). The Vs virus strain shared closest nucleotide and amino acid homology (84.5% and 95.5% respectively) with western and far eastern strains of tick-borne encephalitis virus. Comparison with the far eastern serotype of tick-borne encephalitis virus, by cross-immunoelectrophoresis of Vs virions and PAGE analysis of the extracted virion proteins, revealed differences in surface charge and virus stability that may account for the different virulence characteristics of Vs virus. These results support and enlarge upon previous data obtained from molecular and serological analysis.

  7. Comparison of Human and Guinea Pig Acetylcholinesterase Sequences and Rates of Oxime-Assisted Reactivation

    Science.gov (United States)

    2010-01-01

    of appropriate animal model systems. For OP poisoning, the guinea pig (Cavia porcellus) is a commonly used animal model because guinea pigs more...endogenous bioscavenger in vivo. Although guinea pigs historically have been used to test OP poisoning therapies, it has been found recently that guinea pig AChE...transcribed mRNA encoding guinea pig AChE, amplified the resulting cDNA, and sequenced this product. The nucleotide and deduced amino acid sequences of

  8. Maturity onset diabetes of youth (MODY) in Turkish children: sequence analysis of 11 causative genes by next generation sequencing.

    Science.gov (United States)

    Ağladıoğlu, Sebahat Yılmaz; Aycan, Zehra; Çetinkaya, Semra; Baş, Veysel Nijat; Önder, Aşan; Peltek Kendirci, Havva Nur; Doğan, Haldun; Ceylaner, Serdar

    2016-04-01

    Maturity-onset diabetes of the youth (MODY) is a genetically and clinically heterogeneous group of diseases and is often misdiagnosed as type 1 or type 2 diabetes. The aim of this study is to investigate both novel and proven mutations of 11 MODY genes in Turkish children by using targeted next generation sequencing. A panel of 11 MODY genes was screened in 43 children with MODY diagnosed by clinical criteria. Studies of index cases were done with MISEQ-ILLUMINA, and family screenings and confirmation studies of mutations were done by Sanger sequencing. We identified 28 (65%) point mutations among 43 patients. Eighteen patients have GCK mutations, four have HNF1A, one has HNF4A, one has HNF1B, two have NEUROD1, one has PDX1 gene variations and one patient has both HNF1A and HNF4A heterozygote mutations. This is the first study including molecular studies of 11 MODY genes in Turkish children. GCK is the most frequent type of MODY in our study population. The very high frequency of novel mutations (42%) in our study population supports the view that, in heterogeneous disorders like MODY, sequence analysis provides rapid, cost-effective and accurate genetic diagnosis.

  9. Accident sequence precursor analysis level 2/3 model development

    International Nuclear Information System (INIS)

    Lui, C.H.; Galyean, W.J.; Brownson, D.A.

    1997-01-01

    The US Nuclear Regulatory Commission's Accident Sequence Precursor (ASP) program currently uses simple Level 1 models to assess the conditional core damage probability for operational events occurring in commercial nuclear power plants (NPP). Since not all accident sequences leading to core damage will result in the same radiological consequences, it is necessary to develop simple Level 2/3 models that can be used to analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude of the resulting radioactive releases to the environment, and calculate the consequences associated with these releases. The simple Level 2/3 model development work was initiated in 1995, and several prototype models have been completed. Once developed, these simple Level 2/3 models are linked to the simple Level 1 models to provide risk perspectives for operational events. This paper describes the methods implemented for the development of these simple Level 2/3 ASP models, and the linkage process to the existing Level 1 models

  10. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  11. Comparison of MR sequences in early cerebral infarction at 0.5 T

    International Nuclear Information System (INIS)

    Saatci, I.; Baskan, O.; Cekirge, H.S.; Besim, A.

    2000-01-01

    To compare the diagnostic values of fluid-attenuated inversion recovery (FLAIR) and gradient spin-echo (GRASE) with those of conventional spin-echo (SE) and fast SE T2-weighted sequences in the evaluation of acute cerebrovascular lesions at 0.5 T. Material and Methods: Twenty-two consecutive patients with the clinical diagnosis of acute cerebrovascular accident were examined by MR imaging within the first 48 h of ictus. MR examination included 5-mm axial conventional SE and turbo SE (TSE) T2-weighted, dual-echo GRASE and FLAIR sequences. The patients also had pre- and postcontrast T1-weighted axial images. Two examiners evaluated the images and scored the conspicuity of the acute lesions. Results: Regardless of location, FLAIR provided the best lesion conspicuity in the detection of acute infarcts, followed by the GRASE sequence. In the posterior fossa, TSE and SE demonstrated the lesions better than GRASE and FLAIR techniques. In the detection of hemorrhagic elements within the ischemic region, TSE demonstrated statistically significant superiority over other sequences. Conclusion: In the detection of acute ischemic lesions in locations other than the posterior fossa, FLAIR provided the best lesion conspicuity among four T2-weighted sequences, including SE, TSE, GRASE and FLAIR. However, for the posterior fossa examination, preference of SE or TSE T2-weighted sequences is suggested

  12. Comparison of MR sequences in early cerebral infarction at 0.5 T

    Energy Technology Data Exchange (ETDEWEB)

    Saatci, I.; Baskan, O.; Cekirge, H.S.; Besim, A. [Hacettepe Univ. Hospital, Ankara (Turkey). Radiology Dept.

    2000-11-01

    To compare the diagnostic values of fluid-attenuated inversion recovery (FLAIR) and gradient spin-echo (GRASE) with those of conventional spin-echo (SE) and fast SE T2-weighted sequences in the evaluation of acute cerebrovascular lesions at 0.5 T. Material and Methods: Twenty-two consecutive patients with the clinical diagnosis of acute cerebrovascular accident were examined by MR imaging within the first 48 h of ictus. MR examination included 5-mm axial conventional SE and turbo SE (TSE) T2-weighted, dual-echo GRASE and FLAIR sequences. The patients also had pre- and postcontrast T1-weighted axial images. Two examiners evaluated the images and scored the conspicuity of the acute lesions. Results: Regardless of location, FLAIR provided the best lesion conspicuity in the detection of acute infarcts, followed by the GRASE sequence. In the posterior fossa, TSE and SE demonstrated the lesions better than GRASE and FLAIR techniques. In the detection of hemorrhagic elements within the ischemic region, TSE demonstrated statistically significant superiority over other sequences. Conclusion: In the detection of acute ischemic lesions in locations other than the posterior fossa, FLAIR provided the best lesion conspicuity among four T2-weighted sequences, including SE, TSE, GRASE and FLAIR. However, for the posterior fossa examination, preference of SE or TSE T2-weighted sequences is suggested.

  13. Sequence comparison for non-enhanced MRA of the lower extremity arteries at 7 Tesla.

    Directory of Open Access Journals (Sweden)

    Sören Johst

    In this study three sequences for non-contrast-enhanced MRA of the lower extremity arteries at 7T were compared. Cardiac triggering was used with the aim to reduce signal variations in the arteries. Two fast single-shot 2D sequences, a modified Ultrafast Spoiled Gradient Echo (UGRE) sequence and a variant of the Quiescent-Interval Single-Shot (QISS) sequence, were triggered via phonocardiogram and compared in volunteer examinations to a non-triggered 2D gradient echo (GRE) sequence. For image acquisition, a 16-channel transmit/receive coil and a manually positionable AngioSURF table were used. To tackle B1 inhomogeneities at 7T, Time-Interleaved Acquisition of Modes (TIAMO) was integrated in GRE and UGRE. To compare the three sequences quantitatively, a vessel-to-background ratio (VBR) was measured in all volunteers and stations. In conclusion, cardiac triggering was able to suppress flow artifacts satisfactorily. The modified UGRE showed only moderate image artifacts. Averaged over all volunteers and stations, GRE reached a VBR of 4.18±0.05, UGRE 5.20±0.06, and QISS 2.72±0.03. Using cardiac triggering and TIAMO imaging technique was essential to perform non-enhanced MRA of the lower extremities vessels at 7T. The modified UGRE performed best, as observed artifacts were only moderate and the highest average VBR was reached.

  14. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    Serratia marcescens produces biosurfactant serrawettin, essential for its population migration behavior. Serrawettin W1 was revealed to be an antibiotic serratamolide that makes it significant for deoxyribonucleic acid (DNA) and protein sequence analysis. Four nucleotide and amino-acid sequences from local strains ...

  15. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  16. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3,100 microsyntenies, covering over 50% of

  17. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.

  18. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    Science.gov (United States)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of mainshock occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time window methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in the space-time-energy domain is considered. Results from cluster identification by the nearest-neighbor method turn out to be quite robust with respect to the time span of the input catalogue, as well as to the minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are consistent with those reported in earlier studies, which were aimed at detailed manual aftershock identification. The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic

  19. Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

    Directory of Open Access Journals (Sweden)

    Rachel Caldwell

    2015-01-01

    There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length.

  20. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    Directory of Open Access Journals (Sweden)

    Chengwei Luo

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  1. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    Science.gov (United States)

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  2. MR imaging of articular cartilage in the ankle: comparison of available imaging sequences and methods of measurement in cadavers

    Energy Technology Data Exchange (ETDEWEB)

    Tan, T.C.F. [Department of Radiology, Veterans Administrative Medical Center, San Diego, CA (United States); University of California Medical Center, San Diego, CA (United States); Department of Radiology, Chang Gung Memorial Hospital, Linkou Medical Center, Taoyuan, Taiwan (Taiwan, Province of China)]; Wilcox, D.M. [Department of Radiology, Veterans Administrative Medical Center, San Diego, CA (United States); University of California Medical Center, San Diego, CA (United States)]; Frank, L. [Department of Radiology, Veterans Administrative Medical Center, San Diego, CA (United States); University of California Medical Center, San Diego, CA (United States)]; Shih, C. [Department of Radiology, Veterans Administrative Medical Center, San Diego, CA (United States); University of California Medical Center, San Diego, CA (United States); Department of Radiology, Veterans General Hospital-Taipei (Taiwan, Province of China)]; Trudell, D.J. [Department of Radiology, Veterans Administrative Medical Center, San Diego, CA (United States); University of California Medical Center, San Diego, CA (United States)]; Sartoris, D.J. [Department of Radiology, Veterans Administrative Medical Center, San Diego, CA (United States); University of California Medical Center, San Diego, CA (United States)]; Resnick, D. [Department of Radiology, Veterans Administrative Medical Center, San Diego, CA (United States); University of California Medical Center, San Diego, CA (United States)]

    1996-11-01

    Objective. To assess hyaline cartilage of cadaveric ankles using different magnetic resonance (MR) imaging techniques and various methods of measurement. Design and patients. Cartilage thicknesses of the talus and tibia were measured in ten cadaveric ankles by naked eye and by digitized image analysis from MR images of fat-suppressed T1-weighted gradient recalled (FS-SPGR) sequences and pulsed transfer saturation sequences with (FS-STS) and without (STS) fat suppression; these measurements were compared with those derived from direct inspection of cadaveric sections. The accuracy and precision errors were evaluated statistically for each imaging technique as well as each measuring method. Contrast-to-noise ratios of cartilage versus joint fluid and marrow were compared for each of the imaging sequences. Results. Statistically, measurements from FS-SPGR images were associated with the smallest estimation error. Precision error of measurements derived from digitized image analysis was found to be smaller than that derived from naked eye measurements. Cartilage thickness measurements in images from STS and FS-STS sequences revealed larger errors in both accuracy and precision. Interobserver variance was larger in naked eye assessment of the cartilage. Contrast-to-noise ratio of cartilage versus joint fluid and marrow was higher with FS-SPGR than with FS-STS or STS sequences. Conclusion. Of the sequences and measurement techniques studied, the FS-SPGR sequence combined with the use of digitized image analysis provides the most accurate method for the assessment of ankle hyaline cartilage. (orig.). With 3 figs., 2 tabs.

  3. MR imaging of articular cartilage in the ankle: comparison of available imaging sequences and methods of measurement in cadavers

    International Nuclear Information System (INIS)

    Tan, T.C.F.; Wilcox, D.M.; Frank, L.; Shih, C.; Trudell, D.J.; Sartoris, D.J.; Resnick, D.

    1996-01-01

    Objective. To assess hyaline cartilage of cadaveric ankles using different magnetic resonance (MR) imaging techniques and various methods of measurement. Design and patients. Cartilage thicknesses of the talus and tibia were measured in ten cadaveric ankles by naked eye and by digitized image analysis from MR images of fat-suppressed T1-weighted gradient recalled (FS-SPGR) sequences and pulsed transfer saturation sequences with (FS-STS) and without fat-suppression (STS); these measurements were compared with those derived from direct inspection of cadaveric sections. The accuracy and precision errors were evaluated statistically for each imaging technique as well as measuring method. Contrast-to-noise ratios of cartilage versus joint fluid and marrow were compared for each of the imaging sequences. Results. Statistically, measurements from FS-SPGR images were associated with the smallest estimation error. Precision error of measurements derived from digitized image analysis was found to be smaller than that derived from naked eye measurements. Cartilage thickness measurements in images from STS and FS-STS sequences revealed larger errors in both accuracy and precision. Interobserver variance was larger in naked eye assessment of the cartilage. Contrast-to-noise ratio of cartilage versus joint fluid and marrow was higher with FS-SPGR than with FS-STS or STS sequences. Conclusion. Of the sequences and measurement techniques studied, the FS-SPGR sequence combined with the use of digitized image analysis provides the most accurate method for the assessment of ankle hyaline cartilage. (orig.). With 3 figs., 2 tabs.

  4. The sequence and analysis of duplication rich human chromosome 16

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  5. Analysis of decision procedures for a sequence of inventory periods

    International Nuclear Information System (INIS)

    Avenhaus, R.

    1982-07-01

    Optimal test procedures for a sequence of inventory periods will be discussed. Starting with a game theoretical description of the conflict situation between the plant operator and the inspector, the objectives of the inspector as well as the general decision theoretical problem will be formulated. In the first part the objective of 'secure' detection will be emphasized which means that only at the end of the reference time a decision is taken by the inspector. In the second part the objective of 'timely' detection will be emphasized which will lead to sequential test procedures. At the end of the paper all procedures will be summarized, and in view of the multitude of procedures available at the moment some comments about future work will be given. (orig./HP) [de]

  6. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    Science.gov (United States)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  7. Comparison of Nucleotide Sequence of P2C Region in Diabetogenic and Non-Diabetogenic Coxsackie Virus B5 Isolates

    Directory of Open Access Journals (Sweden)

    Cheng-Chong Chou

    2004-11-01

    Enteroviruses are environmental triggers in the pathogenesis of type 1 diabetes mellitus (DM). A sequence of six identical amino acids (PEVKEK) is shared by the 2C protein of Coxsackie virus B and the glutamic acid decarboxylase (GAD) molecules. Between 1995 and 2002, we investigated 22 Coxsackie virus B5 (CVB5) isolates from southern Taiwan. Four of these isolates were obtained from four new-onset type 1 DM patients with diabetic ketoacidosis. We compared a 300 nucleotide sequence in the 2C protein gene (p2C) in 24 CVB5 isolates (4 diabetogenic, 18 non-diabetogenic and 2 prototype). We found 0.3-10% nucleotide differences. In the four isolates from type 1 DM patients, there was only 2.4-3.4% nucleotide difference, and there was only 1.7-7.1% nucleotide difference between type 1 DM isolates and non-diabetogenic isolates. Comparison of the nucleotide sequence between prototype virus and 22 CVB5 isolates revealed 18.4-24.1% difference. Twenty-one CVB5 isolates from type 1 DM and non-type 1 DM patients contained the PEVKEK sequence, as shown by the p2C nucleotide sequence. Our data showed that the viral p2C sequence with homology with GAD is highly conserved in CVB5 isolates. There was no difference between diabetogenic and non-diabetogenic CVB5 isolates. All four type 1 DM patients had at least one of the genetic susceptibility alleles HLA-DR, DQA1, DQB1. Other genetic and autoimmune factors such as HLA genetic susceptibility and GAD may also play important roles in the pathogenesis in type 1 DM.
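
The percentage nucleotide differences quoted in this abstract are pairwise comparisons of aligned sequences. The record does not say how the authors computed them, so the following is only a minimal sketch of one common convention: count mismatching columns and skip any column with a gap. The function name `percent_difference` and the gap handling are assumptions made for illustration.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of comparable aligned positions at which two sequences differ.

    Assumes both sequences come from the same alignment (equal length) and
    that '-' marks a gap; gapped columns are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    mismatches = sum(a != b for a, b in pairs)
    return 100.0 * mismatches / len(pairs)


if __name__ == "__main__":
    # Toy 4-nt example with one mismatch out of four columns.
    print(percent_difference("ACGT", "ACGA"))
```

Under this convention, a 300-nucleotide region with 9 mismatches would give a 3% difference.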

  8. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  9. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task but the previous version allows only a limited...... number of sequences as input. I implemented Peptide Pattern Recognition as a multithread software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than...... the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  10. Information-Theoretical Analysis of EEG Microstate Sequences in Python

    Directory of Open Access Journals (Sweden)

    Frederic von Wegner

    2018-06-01

    We present an open-source Python package to compute information-theoretical quantities for electroencephalographic data. Electroencephalography (EEG) measures the electrical potential generated by the cerebral cortex, and the set of spatial patterns projected by the brain's electrical potential on the scalp surface can be clustered into a set of representative maps called EEG microstates. Microstate time series are obtained by competitively fitting the microstate maps back into the EEG data set, i.e., by substituting the EEG data at a given time with the label of the microstate that has the highest similarity with the actual EEG topography. As microstate sequences consist of non-metric random variables, e.g., the letters A–D, we recently introduced information-theoretical measures to quantify these time series. In wakeful resting state EEG recordings, we found new characteristics of microstate sequences such as periodicities related to EEG frequency bands. The algorithms used are here provided as an open-source package and their use is explained in a tutorial style. The package is self-contained and the programming style is procedural, focusing on code intelligibility and easy portability. Using a sample EEG file, we demonstrate how to perform EEG microstate segmentation using the modified K-means approach, and how to compute and visualize the recently introduced information-theoretical tests and quantities. The time-lagged mutual information function is derived as a discrete symbolic alternative to the autocorrelation function for metric time series and confidence intervals are computed from Markov chain surrogate data. The software package provides an open-source extension to the existing implementations of the microstate transform and is specifically designed to analyze resting state EEG recordings.
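
The time-lagged mutual information mentioned in this abstract can be illustrated with a plug-in (empirical frequency) estimate on a symbolic sequence. The sketch below is not the package's API; the function name `lagged_mutual_information` and the choice of a plain plug-in estimator in bits are assumptions made for illustration, and no surrogate-based confidence intervals are computed.

```python
from collections import Counter
from math import log2


def lagged_mutual_information(seq, lag):
    """Plug-in estimate of I(X_t; X_{t+lag}) in bits for a symbolic sequence.

    `seq` may be a string or list of hashable labels (e.g. microstates A-D).
    """
    if lag < 0 or lag >= len(seq):
        raise ValueError("lag must satisfy 0 <= lag < len(seq)")
    # Pair each symbol with the symbol `lag` steps later.
    pairs = list(zip(seq[:len(seq) - lag], seq[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * log2(count * n / (left[a] * right[b]))
    return mi


if __name__ == "__main__":
    labels = "ABABABAB"
    for k in range(1, 4):
        print(k, lagged_mutual_information(labels, k))
```

Plotting this quantity against the lag gives a curve that plays the role an autocorrelation function plays for metric data; periodic label sequences show up as recurring peaks.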

  11. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. In order to elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of Necator americanus was explored using next-generation sequencing and bioinformatic analyses. A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and most of them encoded proteins with WD40 repeats (10.6%), proteinase inhibitors (7.8%) or calcium-binding EF-hand proteins (6.7%). Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%), oxidative phosphorylation (63%) and/or proteases (60%); most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins (= SCP/TAPS or Ancylostoma-secreted proteins) were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial (e.g., processing proteins) or amino acid metabolism (e.g., asparagine t-RNA synthetase). This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and examines similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human

  12. De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae.

    Directory of Open Access Journals (Sweden)

    Mukesh Kumar

    The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing extracted RNA from pre-parasitic infective juvenile and adult stages of the life cycle, 131 million Illumina high quality paired end reads were obtained which generated 27,765 contigs with N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences with those of other plant, animal and free living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free living nematode Caenorhabditis elegans, and 9,462 in common with more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) when compared to M. incognita. 1,280 transcripts were found to have secretory signature, presence of signal peptide and absence of transmembrane. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per kilobase per million mapped reads) value were analysed by qRT-PCR, which confirmed the observed differences in their expression levels. In addition, we have also developed a user-friendly resource, Heterodera transcriptome database (HATdb) for public access of the data generated in this study. The new data provided on the transcriptome of H. avenae adds to the genetic resources available to study plant parasitic nematodes and provides an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction.

  13. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies.

    Directory of Open Access Journals (Sweden)

    Clive J Hoggart

    2008-07-01

    Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.

  14. Genome analysis of environmental and clinical P. aeruginosa isolates from sequence type-1146.

    Directory of Open Access Journals (Sweden)

    David Sánchez

    The genomes of Pseudomonas aeruginosa isolates of the new sequence type ST-1146, three environmental (P37, P47 and P49) and one clinical (SD9) isolates, with differences in their antibiotic susceptibility profiles have been sequenced and analysed. The genomes were mapped against P. aeruginosa PAO1-UW and UCBPP-PA14. The allelic profiles showed that the highest number of differences were in "Related to phage, transposon or plasmid" and "Secreted factors" categories. The clinical isolate showed a number of exclusive alleles greater than that for the environmental isolates. The phage Pf1 region in isolate SD9 accumulated the highest number of nucleotide substitutions. The ORF analysis of the four genomes assembled de novo indicated that the number of isolate-specific genes was higher in isolate SD9 (132 genes) than in isolates P37 (24 genes), P47 (16 genes) and P49 (21 genes). CRISPR elements were found in all isolates and SD9 showed differences in the spacer region. Genes related to bacteriophages F116 and H66 were found only in isolate SD9. Genome comparisons indicated that the isolates of ST-1146 are closely related, and most genes implicated in pathogenicity are highly conserved, suggesting a genetic potential for infectivity in the environmental isolates similar to the clinical one. Phage-related genes are responsible for the main differences among the genomes of ST-1146 isolates. The role of bacteriophages has to be considered in the adaptation processes of isolates to the host and in microevolution studies.

  15. De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae.

    Science.gov (United States)

    Kumar, Mukesh; Gantasala, Nagavara Prasad; Roychowdhury, Tanmoy; Thakur, Prasoon Kumar; Banakar, Prakash; Shukla, Rohit N; Jones, Michael G K; Rao, Uma

    2014-01-01

    The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing extracted RNA from pre-parasitic infective juvenile and adult stages of the life cycle, 131 million Illumina high quality paired end reads were obtained which generated 27,765 contigs with N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences with those of other plant, animal and free living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free living nematode Caenorhabditis elegans, and 9,462 in common with more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) when compared to M. incognita. 1,280 transcripts were found to have secretory signature, presence of signal peptide and absence of transmembrane. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per kilobase per million mapped reads) value were analysed by qRT-PCR, which confirmed the observed differences in their expression levels. In addition, we have also developed a user-friendly resource, Heterodera transcriptome database (HATdb) for public access of the data generated in this study. The new data provided on the transcriptome of H. avenae adds to the genetic resources available to study plant parasitic nematodes and provides an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction.
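
The abstract reports an N50 of 1,028 base pairs for the assembled contigs. As a reminder of what that statistic means, here is a minimal sketch of the usual N50 definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative sum first reaches half the total length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy assembly: total length 32, so the threshold is 16.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 is weighted by length, a handful of long contigs can raise it even when most contigs are short, which is why it is usually reported together with the contig count.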

  16. Comparison of 3 T and 7 T MRI clinical sequences for ankle imaging

    Energy Technology Data Exchange (ETDEWEB)

    Juras, Vladimir, E-mail: vladimir.juras@meduniwien.ac.at [Medical University of Vienna, Department of Radiology, Vienna General Hospital, Waeringer Guertel 18-20, A-1090 Vienna (Austria); Slovak Academy of Sciences, Institute of Measurement Science, Dubravska cesta 9, 84104 Bratislava (Slovakia); Welsch, Goetz, E-mail: welsch@bwh.harvard.edu [Medical University of Vienna, Department of Radiology, Vienna General Hospital, Waeringer Guertel 18-20, A-1090 Vienna (Austria); Baer, Peter, E-mail: baerpeter@siemens.com [Siemens Healthcare, Richard-Strauss-Strasse 76, D81679 Munich (Germany); Kronnerwetter, Claudia, E-mail: claudia.kronnerwetter@meduniwien.ac.at [Medical University of Vienna, Department of Radiology, Vienna General Hospital, Waeringer Guertel 18-20, A-1090 Vienna (Austria); Fujita, Hiroyuki, E-mail: hiroyuki.fujita@qualedyn.com [Quality Electrodynamics, LCC, 777 Beta Dr, Cleveland, OH 44143-2336 (United States); Trattnig, Siegfried, E-mail: siegfried.trattnig@meduniwien.ac.at [Medical University of Vienna, Department of Radiology, Vienna General Hospital, Waeringer Guertel 18-20, A-1090 Vienna (Austria)

    2012-08-15

    The purpose of this study was to compare 3 T and 7 T signal-to-noise and contrast-to-noise ratios of clinical sequences for imaging of the ankles with optimized sequences and dedicated coils. Ten healthy volunteers were examined consecutively on both systems with three clinical sequences: (1) 3D gradient-echo, T1-weighted; (2) 2D fast spin-echo, PD-weighted; and (3) 2D spin-echo, T1-weighted. SNR was calculated for six regions: cartilage; bone; muscle; synovial fluid; Achilles tendon; and Kager's fat-pad. CNR was obtained for cartilage/bone, cartilage/fluid, cartilage/muscle, and muscle/fat-pad, and compared by a one-way ANOVA test for repeated measures. Mean SNR increased significantly at 7 T compared to 3 T for 3D GRE and 2D TSE, by 60.9% and 86.7%, respectively. In contrast, an average SNR decrease of almost 25% was observed in the 2D SE sequence. A CNR increase was observed in 2D TSE images, and in most 3D GRE images. There was a substantial benefit from ultra-high-field MR imaging of ankles with routine clinical sequences at 7 T compared to 3 T. Higher SNR and CNR at ultra-high-field MR scanners may be useful in clinical practice for ankle imaging. However, carefully optimized protocols and dedicated extremity coils are necessary to obtain optimal results.

  17. The Comparison of Biochemical and Sequencing 16S rDNA Gene Methods to Identify Nontuberculous Mycobacteria

    Directory of Open Access Journals (Sweden)

    Shafipour, M.

    2014-11-01

    The identification of Mycobacteria at the species level has great medical importance. Biochemical tests are laborious and time-consuming, so new techniques could be used to identify the species. This research aimed to compare biochemical methods and 16S rDNA gene sequencing for identifying nontuberculous Mycobacteria in patients suspected of having tuberculosis in Golestan province, the region with the highest prevalence of tuberculosis in Iran. Among 3336 patients suspected of having tuberculosis who were referred to hospitals and health care centres in Golestan province during 2010-2011, 319 (9.56%) culture-positive cases were collected. Species were identified using biochemical tests. For the samples recognized as nontuberculous Mycobacteria, DNA was extracted by boiling, 16S rDNA PCR was performed, and the resulting sequences were identified using NCBI BLAST. Of the 319 positive samples in Golestan Province, 300 cases were M. tuberculosis and 19 cases (5.01%) were identified as nontuberculous Mycobacteria by biochemical tests. For 15 of the 19 nontuberculous Mycobacteria, PCR and sequencing gave the same identification as the biochemical methods (similarity rate: 78.9%). However, after PCR, 1 case identified as M. simiae by biochemical tests was identified as M. lentiflavum, and 3 other cases were identified as Nocardia. Biochemical methods corresponded to the 16S rDNA PCR and sequencing in 78.9% of cases. However, for the identification of M. lentiflavum and Nocardia sp., the molecular method is better than biochemical methods.

  18. Data on genome sequencing, analysis and annotation of a pathogenic Bacillus cereus 062011msu

    Directory of Open Access Journals (Sweden)

    Rashmi Rathy

    2018-04-01

    Bacillus species 062011 msu is a harmful pathogenic strain responsible for causing abscessation in the sheep and goat population studied by Mariappan et al. (2012) [1]. The organism specifically targets the female sheep and goat population and results in the reduction of milk and meat production. In the present study, we have performed the whole genome sequencing of the pathogenic isolate using the Ion Torrent sequencing platform and generated 458,944 raw reads with an average length of 198.2 bp. The genome sequence was assembled, annotated and analysed for the genetic islands, metabolic pathways, orthologous groups, virulence factors and antibiotic resistance genes associated with the pathogen. Simultaneously, the 16S rRNA sequencing study and genome sequence comparison data confirmed that the strain belongs to the species Bacillus cereus and exhibits 99% sequence homology with the genomes of B. cereus ATCC 10987 and B. cereus FRI-35. Hence, we have renamed the organism as Bacillus cereus 062011msu. The Whole Genome Shotgun (WGS) project has been deposited at DDBJ/ENA/GenBank under the accession NTMF00000000 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA404036) (SAMN07629099). Keywords: Bacillus cereus, Genome sequencing, Abscessation, Virulence factors

  19. A direct comparison of MELCOR 1.8.3 and MAAP4 results for several PWR & BWR accident sequences

    International Nuclear Information System (INIS)

    Leonard, M.T.; Ashbaugh, S.G.; Cole, R.K.; Bergeron, K.D.; Nagashima, K.

    1996-01-01

    This paper presents a comparison of calculations of severe accident progression for several postulated accident sequences for representative Pressurized Water Reactor (PWR) and Boiling Water Reactor (BWR) nuclear power plants performed with the MELCOR 1.8.3 and the MAAP4 computer codes. The PWR system examined in this study is a 1100 MWe system similar in design to a Westinghouse 3-loop plant with a large dry containment; the BWR is a 1100 MWe system similar in design to a General Electric BWR/4 with a Mark I containment. A total of nine accident sequences were studied with both codes. Results of these calculations are compared to identify major differences in the timing of key events in the calculated accident progression or other important aspects of severe accident behavior, and to identify specific sources of the observed differences.

  20. Comparison of the nucleotide sequence of wild-type hepatitis-A virus and its attenuated candidate vaccine derivative

    International Nuclear Information System (INIS)

    Cohen, J.I.; Rosenblum, B.; Ticehurst, J.R.; Daemer, R.; Feinstone, S.; Purcell, R.H.

    1987-01-01

    Development of attenuated mutants for use as vaccines is in progress for other viruses, including influenza, rotavirus, varicella-zoster, cytomegalovirus, and hepatitis-A virus (HAV). Attenuated viruses may be derived from naturally occurring mutants that infect human or nonhuman hosts. Alternatively, attenuated mutants may be generated by passage of wild-type virus in cell culture. Production of attenuated viruses in cell culture is a laborious and empiric process. Despite previous empiric successes, understanding the molecular basis for attenuation of vaccine viruses could facilitate future development and use of live-virus vaccines. Comparison of the complete nucleotide sequences of wild-type (virulent) and vaccine (attenuated) viruses has been reported for polioviruses and yellow fever virus. Here, the authors compare the nucleotide sequence of wild-type HAV HM-175 with that of a candidate vaccine derivative

  1. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

    Directory of Open Access Journals (Sweden)

    Hironobu Yanagisawa

    2016-03-01

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses.

  2. Data Analysis of Sequences and qPCR for Microbial Communities during Algal Blooms

    Science.gov (United States)

    A training opportunity is open to a highly microbial-research-motivated student to conduct sequence analysis, explore novel genes and metabolic pathways, validate resultant findings using qPCR/RT-qPCR and summarize the findings

  3. Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among ...

    African Journals Online (AJOL)

    Yazun Bashir Jarrar

    2017-11-26

    Nov 26, 2017 ... Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among Jordanian volunteers, Libyan. Journal of Medicine .... For molecular modeling of NAT2 protein, visualized ..... cal clustering. .... cular dynamics simulation.

  4. Analysis of common SHOX gene sequence variants and ∼4.9-kb ...

    Indian Academy of Sciences (India)

    [Solc R., Hirschfeldova K., Kebrdlova V. and Baxova A. 2014 Analysis of common SHOX gene sequence variants ... based on a Gibbs sampling strategy were done using .... SHOX (short stature homeobox) are an important cause of growth.

  5. A third-generation microsatellite-based linkage map of the honey bee, Apis mellifera, and its comparison with the sequence-based physical map.

    Science.gov (United States)

    Solignac, Michel; Mougel, Florence; Vautrin, Dominique; Monnerot, Monique; Cornuet, Jean-Marie

    2007-01-01

    The honey bee is a key model for social behavior and this feature led to the selection of the species for genome sequencing. A genetic map is a necessary companion to the sequence. In addition, because there was originally no physical map for the honey bee genome project, a meiotic map was the only resource for organizing the sequence assembly on the chromosomes. We present the genetic (meiotic) map here and describe the main features that emerged from comparison with the sequence-based physical map. The genetic map of the honey bee is saturated and the chromosomes are oriented from the centromeric to the telomeric regions. The map is based on 2,008 markers and is about 40 Morgans (M) long, resulting in a marker density of one every 2.05 centiMorgans (cM). For the 186 megabases (Mb) of the genome mapped and assembled, this corresponds to a very high average recombination rate of 22.04 cM/Mb. Honey bee meiosis shows a relatively homogeneous recombination rate along and across chromosomes, as well as within and between individuals. Interference is higher than inferred from the Kosambi function of distance. In addition, numerous recombination hotspots are dispersed over the genome. The very large genetic length of the honey bee genome, its small physical size and an almost complete genome sequence with a relatively low number of genes suggest a very promising future for association mapping in the honey bee, particularly as the existence of haploid males allows easy bulk segregant analysis.
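
    The figures quoted in this record can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: it takes the average recombination rate (22.04 cM/Mb), the assembled length (186 Mb) and the marker count (2,008) from the abstract and derives the genetic length and marker spacing they imply. The variable names are assumptions made for the example.

```python
# Figures quoted in the abstract above; variable names are illustrative.
RECOMB_RATE_CM_PER_MB = 22.04   # average recombination rate
ASSEMBLED_MB = 186              # genome mapped and assembled
N_MARKERS = 2008                # markers on the genetic map

# Genetic length implied by rate x physical length (in centiMorgans).
genetic_length_cm = RECOMB_RATE_CM_PER_MB * ASSEMBLED_MB

# Average spacing between adjacent markers (in centiMorgans).
spacing_cm = genetic_length_cm / N_MARKERS

print(f"implied genetic length: {genetic_length_cm:.0f} cM "
      f"({genetic_length_cm / 100:.1f} Morgans)")
print(f"implied marker spacing: {spacing_cm:.2f} cM")
```

    The implied length of roughly 41 Morgans and a spacing of about 2 cM are in line with the "about 40 Morgans" and "one every 2.05 centiMorgans" stated in the abstract.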

  6. Hypercapnic normalization of BOLD fMRI: comparison across field strengths and pulse sequences

    DEFF Research Database (Denmark)

    Cohen, Eric R.; Rostrup, Egill; Sidaros, Karam

    2004-01-01

    to be more accurately localized and quantified based on changes in venous blood oxygenation alone. The normalized BOLD signal induced by the motor task was consistent across different magnetic fields and pulse sequences, and corresponded well with cerebral blood flow measurements. Our data suggest...... size, as well as experimental, such as pulse sequence and static magnetic field strength (B(0)). Thus, it is difficult to compare task-induced fMRI signals across subjects, field strengths, and pulse sequences. This problem can be overcome by normalizing the neural activity-induced BOLD fMRI response...... for global stimulation, subjects breathed a 5% CO(2) gas mixture. Under all conditions, voxels containing primarily large veins and those containing primarily active tissue (i.e., capillaries and small veins) showed distinguishable behavior after hypercapnic normalization. This allowed functional activity...

  7. Comparison of whole genome amplification techniques for human single cell exome sequencing.

    Science.gov (United States)

    Borgström, Erik; Paterlini, Marta; Mold, Jeff E; Frisen, Jonas; Lundeberg, Joakim

    2017-01-01

    Whole genome amplification (WGA) is currently a prerequisite for single cell whole genome or exome sequencing. Depending on the method used the rate of artifact formation, allelic dropout and sequence coverage over the genome may differ significantly. The largest difference between the evaluated protocols was observed when analyzing the target coverage and read depth distribution. These differences also had impact on the downstream variant calling. Conclusively, the products from the AMPLI1 and MALBAC kits were shown to be most similar to the bulk samples and are therefore recommended for WGA of single cells. In this study four commercial kits for WGA (AMPLI1, MALBAC, Repli-G and PicoPlex) were used to amplify human single cells. The WGA products were exome sequenced together with non-amplified bulk samples from the same source. The resulting data was evaluated in terms of genomic coverage, allelic dropout and SNP calling.

  8. Probabilistic topic modeling for the analysis and classification of genomic sequences

    Science.gov (United States)

    2015-01-01

    Background: Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods: The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions: We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
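
    The record above treats each DNA sequence as a document whose "words" are fixed-length k-mers. A minimal sketch of that first step, counting overlapping k-mers, might look as follows. It is an assumption-laden illustration, not the authors' code: the function name `kmer_counts`, the default k = 8 and the decision to skip windows containing ambiguous bases are all hypothetical choices, and the LDA training step itself is not shown.

```python
from collections import Counter

def kmer_counts(sequence: str, k: int = 8) -> Counter:
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Ignore windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

# Toy sequence (hypothetical); the N makes three windows unusable.
example = kmer_counts("ACGTACGTNACG", k=3)
print(example.most_common())
```

    The resulting per-sequence count vectors play the role of a bag-of-words matrix, which is the kind of input a topic model such as LDA expects.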

  9. Detecting potential adverse reactions of sulpiride in schizophrenic patients by prescription sequence symmetry analysis.

    Directory of Open Access Journals (Sweden)

    Edward Chia-Cheng Lai

    PURPOSE: Previous studies have demonstrated sulpiride to be significantly more effective than haloperidol, risperidone and olanzapine in schizophrenic treatment; however, only limited information is available on the potential risks associated with sulpiride treatment. This study attempts to provide information on the potential risks of sulpiride treatment of schizophrenia, especially with regard to unexpected adverse effects. MATERIALS AND METHODS: Patients with schizophrenia aged 18 and older, newly prescribed with a single antipsychotic medication from the National Health Insurance Research Database of Taiwan in the period from 2003 to 2010 were included. A within-subject comparison method, prescription sequence symmetry analysis (PSSA), was employed to efficiently identify potential causal relationships while controlling for potential selection bias. RESULTS: A total of 5,750 patients, with a mean age of 39, approximately half of whom were male, constituted the study cohort. The PSSA found that sulpiride was associated with EPS (adjusted SR, 1.73; 95% CI, 1.46-2.06) and hyperprolactinemia (12.04; 1.59-91.2). In comparison, EPS caused by haloperidol has a magnitude of 1.99 when analyzed with PSSA, and hyperprolactinemia caused by amisulpride has a magnitude of 8.05, respectively. Another finding was the unexpected increase in the use of stomatological corticosteroids, emollient laxatives, dermatological preparations of corticosteroids, quinolone antibacterials, and topical products for joint and muscular pain, after initiation of sulpiride treatment. CONCLUSIONS: We found sulpiride to be associated with an increased risk of EPS and hyperprolactinemia, and the potential risk could be as high as that induced by haloperidol and amisulpride, respectively. Additionally, our study provides grounds for future investigations into the associations between sulpiride and the increased use of additional drugs for managing adverse effects, including
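
    The sequence ratio (SR) at the heart of PSSA compares how many patients started the marker drug after the index drug with how many started it before. The sketch below computes only the crude SR on made-up dates; it is an illustration under stated assumptions, not the study's procedure. The function name and the toy data are hypothetical, and the adjustment for prescribing trends that produces the adjusted SR reported above, as well as any restriction to a time window, is deliberately left out.

```python
from datetime import date

def crude_sequence_ratio(pairs):
    """Crude SR = (# index drug first) / (# marker drug first).

    `pairs` holds one (index_start, marker_start) tuple per patient.
    Patients who started both drugs on the same day are ignored.
    """
    index_first = sum(1 for idx, mk in pairs if idx < mk)
    marker_first = sum(1 for idx, mk in pairs if mk < idx)
    if marker_first == 0:
        raise ValueError("no patients with the marker drug first")
    return index_first / marker_first

# Hypothetical first-prescription dates for six patients.
toy_pairs = [
    (date(2005, 1, 10), date(2005, 3, 1)),   # index first
    (date(2006, 5, 2), date(2006, 6, 20)),   # index first
    (date(2007, 2, 14), date(2007, 9, 9)),   # index first
    (date(2005, 8, 1), date(2005, 4, 15)),   # marker first
    (date(2008, 11, 3), date(2008, 7, 30)),  # marker first
    (date(2009, 1, 1), date(2009, 1, 1)),    # same day, ignored
]
crude_sr = crude_sequence_ratio(toy_pairs)
print(f"crude SR = {crude_sr:.2f}")
```

    A crude SR above 1 means the marker drug was started more often after the index drug than before it, which is the asymmetry PSSA interprets as a possible adverse-effect signal.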

  10. Comparison of MRI pulse sequences for investigation of lesions of the cervical spinal cord

    International Nuclear Information System (INIS)

    Campi, A.; Pontesilli, S.; Gerevini, S.; Scotti, G.

    2000-01-01

    Small spinal cord lesions, even if clinically significant, can be overlooked on MRI due to the low sensitivity of some pulse sequences. We compared T2-weighted fast (FSE) and conventional (CSE) spin-echo and short-tau inversion-recovery (STIR)-FSE sequences to evaluate their sensitivity to and specificity for lesions of different types. We compared the three sequences in MRI of 57 patients with cervical spinal symptoms. The image sets were assessed by two of us individually for final diagnosis, lesion detectability and image quality. Both readers arrived at the same final diagnoses with all sequences, differentiating four groups of patients. Group 1: 30 patients (53%) with a final diagnosis of multiple sclerosis (MS). Demyelinating lesions were better seen on STIR-FSE images, on which the number of lesions was significantly higher than on FSE, while the FSE and CSE images showed approximately equal numbers of lesions; additional lesions were found in 9 patients. The contrast-to-noise ratio (CNR) of 17 demyelinating lesions was significantly higher on STIR-FSE images than with the other sequences. Group 2: 19 patients (33%) with cervical pain, 15 of whom had disc protrusion or herniation; herniated discs were equally well delineated with all sequences, with better myelographic effect on FSE. In five patients with intrinsic spinal cord abnormalities, the conspicuity and demarcation of the lesions were similar with STIR-FSE and FSE. Group 3: 4 patients (7%) with acute myelopathy of unknown aetiology. In two patients, STIR-FSE gave better demarcation of lesions and in one a questionable additional lesion. Group 4: 4 patients (7%) with miscellaneous final diagnoses. STIR-FSE had high sensitivity to demyelinating lesions, can be considered quite specific and should be included in spinal MRI for assessment of suspected demyelinating disease. (orig.)

  11. The complete chloroplast genome sequence of Gentiana lawrencei var. farreri (Gentianaceae) and comparative analysis with its congeneric species.

    Science.gov (United States)

    Fu, Peng-Cheng; Zhang, Yan-Zhao; Geng, Hui-Min; Chen, Shi-Long

    2016-01-01

    The chloroplast (cp) genome is useful in plant systematics, genetic diversity analysis, molecular identification and divergence dating. The genus Gentiana contains 362 species, but there are only two valuable complete cp genomes. The purpose of this study is to report the characterization of complete cp genome of G. lawrencei var. farreri, which is endemic to the Qinghai-Tibetan Plateau (QTP). Using high throughput sequencing technology, we got the complete nucleotide sequence of the G. lawrencei var. farreri cp genome. The comparison analysis including genome difference and gene divergence was performed with its congeneric species G. straminea. The simple sequence repeats (SSRs) and phylogenetics were studied as well. The cp genome of G. lawrencei var. farreri is a circular molecule of 138,750 bp, containing a pair of 24,653 bp inverted repeats which are separated by small and large single-copy regions of 11,365 and 78,082 bp, respectively. The cp genome contains 130 known genes, including 85 protein coding genes (PCGs), eight ribosomal RNA genes and 37 tRNA genes. Comparative analyses indicated that G. lawrencei var. farreri is 10,241 bp shorter than its congeneric species G. straminea. Four large gaps were detected that are responsible for 85% of the total sequence loss. Further detailed analyses revealed that 10 PCGs were included in the four gaps that encode nine NADH dehydrogenase subunits. The cp gene content, order and orientation are similar to those of its congeneric species, but with some variation among the PCGs. Three genes, ndhB, ndhF and clpP, have high nonsynonymous to synonymous values. There are 34 SSRs in the G. lawrencei var. farreri cp genome, of which 25 are mononucleotide repeats: no dinucleotide repeats were detected. Comparison with the G. straminea cp genome indicated that five SSRs have length polymorphisms and 23 SSRs are species-specific. The phylogenetic analysis of 48 PCGs from 12 Gentianales taxa cp genomes clearly identified

  12. COMPARISON BETWEEN MIXED INTEGER PROGRAMMING WITH HEURISTIC METHOD FOR JOB SHOP SCHEDULING WITH SEPARABLE SEQUENCE-DEPENDENT SETUPS

    Directory of Open Access Journals (Sweden)

    I Gede Agus Widyadana

    2001-01-01

    Choosing an appropriate tool for solving an industrial problem depends not only on whether it achieves an optimal solution but also on its computation time. Scheduling is one industrial problem for which it remains difficult to satisfy both criteria. This paper compares mixed integer programming, which yields an optimal solution, with a heuristic method for solving the job shop scheduling problem with separable sequence-dependent setups. Test problems are generated, and the results show that the heuristic method still cannot reach the optimal solution.

  13. A methodological comparison of customer service analysis techniques

    Science.gov (United States)

    James Absher; Alan Graefe; Robert Burns

    2003-01-01

    Techniques used to analyze customer service data need to be studied. Two primary analysis protocols, importance-performance analysis (IP) and gap score analysis (GA), are compared in a side-by-side comparison using data from two major customer service research projects. A central concern is what, if any, conclusion might be different due solely to the analysis...

  14. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    Science.gov (United States)

    2017-09-01

    AWARD NUMBER: W81XWH-14-1-0080. GRANT NUMBER: GRANT11489 ... institutional, NIH-funded study of genetic and epigenetic alterations of pre-invasive DCIS that did or did not progress to invasive breast cancer, with an

  15. Detection of hepatic VX2 carcinomas with ferucarbotran-enhanced magnetic resonance imaging in rabbits: Comparison of nine pulse sequences

    International Nuclear Information System (INIS)

    Kim, Seong Hyun; Choi, Dongil; Lim, Hyo K.; Kim, Min Ju; Jang, Kyung Mi; Kim, Seung Hoon; Lee, Won Jae; Lee, Jongmee; Jeon, Yong Hwan; Lim, Jae Hoon

    2006-01-01

    Objective: To compare the diagnostic performance of a variety of magnetic resonance imaging (MRI) sequences, in order to identify the most effective ferucarbotran-enhanced sequence for the detection of multiple small hepatic VX2 carcinomas in rabbits. Methods: Fifteen rabbits with 135 experimentally induced VX2 carcinomas in the liver underwent ferucarbotran-enhanced MRI using the following nine pulse sequences: a fat-suppressed fast spin-echo (FSE) sequence with two echo times (TE) (proton density- and T2-weighted images), four different T2*-weighted fast multiplanar GRASS (gradient-recalled acquisition in the steady state) (FMPGR) sequences with the combination of three TEs (9, 12, 15 ms, respectively) and two flip angles (20°, 80°, respectively), T2*-weighted fast multiplanar spoiled GRASS (FMPSPGR), T1-weighted FMPSPGR, and dynamic T1-weighted FMPSPGR. All images were reviewed by three radiologists with quantitative and qualitative analysis. Results: Tumor-to-liver contrast-to-noise ratio of the proton density-weighted FSE sequence was significantly higher than those of the others (p o ) images were superior to those of the others and for the detection of very small hepatic tumors of less than 5 mm, the sensitivities of these sequences were less than 30%. Conclusion: Ferucarbotran-enhanced T2- and proton density-weighted FSE and T2*-weighted FMPGR (TE/flip angle, 12/20°) images were found to be the most effective pulse sequences for the detection of multiple small hepatic VX2 carcinomas but these sequences were limited in the detection of very small hepatic tumors of less than 5 mm in size.

  16. Radiation-induced germ-line mutations detected by a direct comparison of parents and children DNA sequences containing SNPs

    International Nuclear Information System (INIS)

    Morimyo, M.; Hongo, E.; Higashi, T.; Wu, J.; Matsumoto, I.; Okamoto, M.; Kawano, A.; Tsuji, S.

    2003-01-01

    Germ-line mutation is detected in mice but not in humans. To estimate genetic risk of humans, a new approach to extrapolate from animal data to humans or to directly detect radiation-induced mutations in man is expected. We have developed a new method to detect germ-line mutations by directly comparing DNA sequences of parents and children. The nucleotide sequences among mouse strains are almost identical except SNP markers that are detected at 1/1000 frequency. When gamma-irradiated male mice are mated with female mice, heterogeneous nucleotide sequences induced in children DNA are candidate mutations, whose assignment can be done by SNP analysis. This system can easily detect all types of mutations such as transition, transversion, frameshift and deletion induced by radiation and can be applied to humans having genetically heterogeneous nucleotide sequences and many SNP markers. C3H male mice of 8 weeks of gestation were irradiated with gamma rays of 3 and 1 Gy and after 3 weeks, they were mated with the same aged C57BL female mice. After 3 weeks breeding, DNA was extracted from parents and children mice. The nucleotide sequences of 150 STS markers containing 300-900 bp and SNPs of parents and children DNA were determined by direct sequencing: amplification of STS markers by Taq DNA polymerase, purification of PCR products, and DNA sequencing with a dye-terminator method. At each radiation dose, a total of 5 Mb of DNA sequence was examined to detect radiation-induced mutations. We could find 6 deletions in 3 Gy irradiated mice but not in 1 Gy and control mice. The mutation frequency was about 4.0 x 10^-7/bp/Gy or 1.6 x 10^-4/locus/Gy, suggesting a non-linear increase of mutation rate with dose.
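
    The per-base mutation frequency reported above follows directly from the counts in the abstract. The short sketch below reproduces that arithmetic as an illustration only; it assumes that the 5 Mb of sequence and the 6 deletions both refer to the 3 Gy group, and the variable names are hypothetical. The "implied locus size" is a back-calculation from the two reported rates, not a figure given in the record.

```python
# Counts quoted in the abstract; names and grouping are assumptions.
deletions = 6            # deletions found in the 3 Gy group
dose_gy = 3              # radiation dose in gray
screened_bp = 5e6        # 5 Mb of sequence examined at this dose

# Mutation frequency per base pair per gray.
rate_per_bp_per_gy = deletions / (screened_bp * dose_gy)

# The abstract also reports a per-locus rate; dividing the two rates
# gives the average locus size they jointly imply.
reported_per_locus_per_gy = 1.6e-4
implied_locus_bp = reported_per_locus_per_gy / rate_per_bp_per_gy

print(f"rate per bp per Gy: {rate_per_bp_per_gy:.1e}")
print(f"implied average locus size: {implied_locus_bp:.0f} bp")
```

    The implied locus size of about 400 bp falls inside the 300-900 bp range given for the STS markers.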

  17. Seismically induced accident sequence analysis of the advanced test reactor

    International Nuclear Information System (INIS)

    Khericha, S.T.; Henry, D.M.; Ravindra, M.K.; Hashimoto, P.S.; Griffin, M.J.; Tong, W.H.; Nafday, A.M.

    1991-01-01

    A seismic probabilistic risk assessment (PRA) was performed for the Department of Energy (DOE) Advanced Test Reactor (ATR) as part of the external events analysis. The risk from seismic events to the fuel in the core and in the fuel storage canal was evaluated. The key elements of this paper are the integration of seismically induced internal flood and internal fire, and the modeling of human error rates as a function of the magnitude of earthquake. The systems analysis was performed by EG&G Idaho, Inc. and the fragility analysis and quantification were performed by EQE International, Inc. (EQE).

  18. Contrasting safety assessments of a runway incursion scenario: Event sequence analysis versus multi-agent dynamic risk modelling

    International Nuclear Information System (INIS)

    Stroeve, Sybert H.; Blom, Henk A.P.; Bakker, G.J.

    2013-01-01

    In the safety literature it has been argued that in a complex socio-technical system safety cannot be well analysed by event sequence based approaches, but requires capturing the complex interactions and performance variability of the socio-technical system. In order to evaluate the quantitative and practical consequences of these arguments, this study compares two approaches to assess accident risk of an example safety critical socio-technical system. It contrasts an event sequence based assessment with a multi-agent dynamic risk model (MA-DRM) based assessment, both of which are performed for a particular runway incursion scenario. The event sequence analysis uses the well-known event tree modelling formalism and the MA-DRM based approach combines agent based modelling, hybrid Petri nets and rare event Monte Carlo simulation. The comparison addresses qualitative and quantitative differences in the methods, attained risk levels, and in the prime factors influencing the safety of the operation. The assessments show considerable differences in the accident risk implications of the performance of human operators and technical systems in the runway incursion scenario. In contrast with the event sequence based results, the MA-DRM based results show that the accident risk is not manifest from the performance of and relations between individual human operators and technical systems. Instead, the safety risk emerges from the totality of the performance and interactions in the agent based model of the safety critical operation considered, which coincides very well with the argumentation in the safety literature.
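
    In the event tree formalism mentioned above, the probability of one accident sequence is the initiating-event probability multiplied by the conditional probabilities of the branches along that path. A minimal sketch of that calculation is given here purely as an illustration: the function name and every number in it are hypothetical and are not taken from the study, and the MA-DRM approach is not represented.

```python
from math import prod

def sequence_probability(initiating_prob, branch_probs):
    """Probability of one path through an event tree.

    `branch_probs` are conditional probabilities of each successive
    branch, given that all earlier branches on the path occurred.
    """
    return initiating_prob * prod(branch_probs)

# Hypothetical runway-incursion path: all values are made up.
p_incursion = 1e-4          # initiating event per operation
p_path = [0.1,              # pilot fails to detect the conflict
          0.5]              # controller fails to intervene in time
p_accident = sequence_probability(p_incursion, p_path)
print(f"accident sequence probability: {p_accident:.1e}")
```

    Because each branch is treated as a fixed conditional probability, this kind of model cannot by itself express the emergent interactions that the record attributes to the multi-agent approach.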

  19. Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana.

    Directory of Open Access Journals (Sweden)

    Yanan Liu

    The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information of both Marmota models strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to decide the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two Marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC.

  20. De-standardization of family-life trajectories of young adults: A cross-national comparison using sequence analyses

    NARCIS (Netherlands)

    Elzinga, C.H.; Liefbroer, A.C.

    2007-01-01

    We introduce a number of new methods based on sequence analysis to test hypotheses on the de-standardization of family-life trajectories in early adulthood, using Fertility and Family Survey data on 19 countries. Across cohorts, family-life trajectories of young adults have not become more

  1. Recent advances in nanopore-based nucleic acid analysis and sequencing

    International Nuclear Information System (INIS)

    Shi, Jidong; Fang, Ying; Hou, Junfeng

    2016-01-01

    Nanopore-based sequencing platforms are transforming the field of genomic science. This review (containing 116 references) highlights some recent progress on nanopore-based nucleic acid analysis and sequencing. These studies are classified into three categories, biological, solid-state, and hybrid nanopores, according to their nanoporous materials. We begin with a brief description of the translocation-based detection mechanism of nanopores. Next, specific examples are given in nanopore-based nucleic acid analysis and sequencing, with an emphasis on identifying strategies that can improve the resolution of nanopores. This review concludes with a discussion of future research directions that will advance the practical applications of nanopore technology. (author)

  2. Microscopic Analysis and Modeling of Airport Surface Sequencing, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The complexity and interdependence of operations on the airport surface motivate the need for a comprehensive and detailed, yet flexible and validated analysis and...

  3. BioMatriX: Sequence analysis, structure visualization, phylogenetics ...

    African Journals Online (AJOL)

    bmx-biomatrix.blogspot.com) developed for biological science community to augment scientific research regarding genomics, proteomics, phylogenetics and linkage analysis in one platform. BioMatriX offers multi-functional services to perform ...

  4. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome.

    Directory of Open Access Journals (Sweden)

    Byrappa Venkatesh

    2007-04-01

    Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes is higher than that between human and teleost fish genomes. The elephant shark contains four putative Hox clusters, indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.

  5. A powerful and flexible approach to the analysis of RNA sequence count data.

    Science.gov (United States)

    Zhou, Yi-Hui; Xia, Kai; Wright, Fred A

    2011-10-01

    A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean-variance relationships provides a flexible testing regimen that 'borrows' information across genes, while easily incorporating design effects and additional covariates. We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data and (ii) an extension of an expression mean-variance modeling approach to RNA-Seq data, involving modeling of the overdispersion as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with other alternate methods to handle RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility. An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq. Contact: yzhou@bios.unc.edu; fwright@bios.unc.edu. Supplementary data are available at Bioinformatics online.
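
    The beta-binomial distribution named in this record allows count data to be more variable than a plain binomial would permit, which is what "overdispersion" refers to. The sketch below is only an illustration of that property using the standard beta-binomial formulas. It is not the BBSeq package or its API (BBSeq is an R package), and the function names and parameter values are assumptions chosen for the example.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, n, a, b):
    """P(K = k) for a beta-binomial with n trials and shapes a, b."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

def betabinom_var(n, a, b):
    """Variance of the beta-binomial distribution."""
    return n * a * b * (a + b + n) / ((a + b) ** 2 * (a + b + 1))

# Hypothetical example: 100 reads, mean proportion a / (a + b) = 0.2.
n, a, b = 100, 2.0, 8.0
p = a / (a + b)
binom_var = n * p * (1 - p)          # variance without overdispersion
bb_var = betabinom_var(n, a, b)      # variance with overdispersion
total = sum(betabinom_pmf(k, n, a, b) for k in range(n + 1))

print(f"binomial variance:      {binom_var:.1f}")
print(f"beta-binomial variance: {bb_var:.1f}")
```

    With these made-up parameters the beta-binomial variance is ten times the binomial variance at the same mean, which illustrates why a model with an explicit overdispersion term can be a better fit for replicate-to-replicate variation in read counts.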

  6. Sequence analysis of L RNA of Lassa virus

    International Nuclear Information System (INIS)

    Vieth, Simon; Torda, Andrew E.; Asper, Marcel; Schmitz, Herbert; Guenther, Stephan

    2004-01-01

    The L RNA of three Lassa virus strains originating from Nigeria, Ghana/Ivory Coast, and Sierra Leone was sequenced and the data subjected to structure predictions and phylogenetic analyses. The L gene products had 2218-2221 residues, diverged by 18% at the amino acid level, and contained several conserved regions. Only one region of 504 residues (positions 1043-1546) could be assigned a function, namely that of an RNA polymerase. Secondary structure predictions suggest that this domain is very similar to RNA-dependent RNA polymerases of known structure encoded by plus-strand RNA viruses, permitting a model to be built. Outside the polymerase region, there is little structural data, except for regions of strong alpha-helical content and probably a coiled-coil domain at the N terminus. No evidence for reassortment or recombination during Lassa virus evolution was found. The secondary structure-assisted alignment of the RNA polymerase region permitted a reliable reconstruction of the phylogeny of all negative-strand RNA viruses, indicating that Arenaviridae are most closely related to Nairoviruses. In conclusion, the data provide a basis for structural and functional characterization of the Lassa virus L protein and reveal new insights into the phylogeny of negative-strand RNA viruses

  7. Sequence analysis of the breakpoint regions of an X;5 translocation in a female with Duchenne muscular dystrophy

    Energy Technology Data Exchange (ETDEWEB)

    Bakel, I. van; Holt, S.; Craig, I. [Univ. of Oxford (United Kingdom)] [and others]

    1995-08-01

    X;autosome translocations in females with Duchenne muscular dystrophy (DMD) provide an opportunity to study the mechanisms responsible for chromosomal rearrangements that occur in the germ line. We describe here a detailed molecular analysis of the translocation breakpoints of an X;autosome reciprocal translocation, t(X;5) (p21;q31.1), in a female with DMD. Cosmid clones that contained the X-chromosome breakpoint region were identified, and subclones that hybridized to the translocation junction fragment in restriction digests of the patient's DNA were isolated and sequenced. Primers designed from the X-chromosomal sequence were used to obtain the junction fragments on the der(X) and the der(5) by inverse PCR. The resultant clones were also cloned and sequenced, and this information used to isolate the chromosome 5 breakpoint region. Comparison of the DNA sequences of the junction fragments with those of the breakpoint regions on chromosomes X and 5 revealed that the translocation arose by nonhomologous recombination with an imprecise reciprocal exchange. Four and six base pairs of unknown origin are inserted at the exchange points of the der(X) and der(5), respectively, and three nucleotides are deleted from the X-chromosome sequence. Two features were found that may have played a role in the generation of the translocation. These were (1) a repeat motif with an internal homopyrimidine stretch 10 bp upstream from the X-chromosome breakpoint and (2) a 9-bp sequence of 78% homology located near the breakpoints on chromosomes 5 and X. 32 refs., 4 figs., 2 tabs.

  8. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin

    2016-08-01

    Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next generation sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques

    DEFF Research Database (Denmark)

    Yan, Guangmei; Zhang, Guojie; Fang, Xiaodong

    2011-01-01

    The nonhuman primates most commonly used in medical research are from the genus Macaca. To better understand the genetic differences between these animal models, we present high-quality draft genome sequences from two macaque species, the cynomolgus/crab-eating macaque and the Chinese rhesus...

  10. Bringing Next-Generation Sequencing into the Classroom through a Comparison of Molecular Biology Techniques

    Science.gov (United States)

    Bowling, Bethany; Zimmer, Erin; Pyatt, Robert E.

    2014-01-01

    Although the development of next-generation (NextGen) sequencing technologies has revolutionized genomic research and medicine, the incorporation of these topics into the classroom is challenging, given an implied high degree of technical complexity. We developed an easy-to-implement, interactive classroom activity investigating the similarities…

  11. Salmon louse (Lepeophtheirus salmonis) transcriptomes during post molting maturation and egg production, revealed using EST-sequencing and microarray analysis

    Directory of Open Access Journals (Sweden)

    Jonassen Inge

    2008-03-01

    Background: Lepeophtheirus salmonis is an ectoparasitic copepod feeding on skin, mucus and blood from salmonid hosts. Initial analysis of EST sequences from pre adult and adult stages of L. salmonis revealed a large proportion of novel transcripts. In order to link unknown transcripts to biological functions we have combined EST sequencing and microarray analysis to characterize female salmon louse transcriptomes during post molting maturation and egg production. Results: EST sequence analysis shows that 43% of the ESTs have no significant hits in GenBank. Sequenced ESTs assembled into 556 contigs and 1614 singletons and whenever homologous genes were identified no clear correlation with homologous genes from any specific animal group was evident. Sequence comparison of 27 L. salmonis proteins with homologous proteins in humans, zebrafish, insects and crustaceans revealed an almost identical sequence identity with all species. Microarray analysis of maturing female adult salmon lice revealed two major transcription patterns: up-regulation during the final molting followed by down regulation, and female specific up regulation during post molting growth and egg production. For a third minor group of ESTs transcription decreased during molting from pre-adult II to immature adults. Genes regulated during molting typically gave hits with cuticula proteins whilst transcripts up regulated during post molting growth were female specific, including two vitellogenins. Conclusion: The copepod L. salmonis contains a high level of novel genes. Among analyzed L. salmonis proteins, sequence identities with homologous proteins in crustaceans are no higher than to homologous proteins in humans. Three distinct processes, molting, post molting growth and egg production correlate with transcriptional regulation of three groups of transcripts; two including genes related to growth, one including genes related to egg production. The function of the regulated

  12. Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikos; Vinther, Jeppe

    2015-01-01

    time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data...

  13. Molecular diagnosis of lyssaviruses and sequence comparison of Australian bat lyssavirus samples.

    Science.gov (United States)

    Foord, A J; Heine, H G; Pritchard, L I; Lunt, R A; Newberry, K M; Rootes, C L; Boyle, D B

    2006-07-01

    To evaluate and implement molecular diagnostic tests for the detection of lyssaviruses in Australia. A published hemi-nested reverse transcriptase polymerase chain reaction (RT-PCR) for the detection of all lyssavirus genotypes was modified to a fully nested RT-PCR format and compared with the original assay. TaqMan assays for the detection of Australian bat lyssavirus (ABLV) were compared with both the nested and hemi-nested RT-PCR assays. The sequences of RT-PCR products were determined to assess sequence variations of the target region (nucleocapsid gene) in samples of ABLV originating from different regions. The nested RT-PCR assay was highly analytically specific, and at least as analytically sensitive as the hemi-nested assay. The TaqMan assays were highly analytically specific and more analytically sensitive than either RT-PCR assay, with a detection level of approximately 10 genome equivalents per microl. Sequence of the first 544 nucleotides of the nucleocapsid protein coding sequence was obtained from all samples of ABLV received at Australian Animal Health Laboratory during the study period. The nested RT-PCR provided a means for molecular diagnosis of all tested genotypes of lyssavirus including classical rabies virus and Australian bat lyssavirus. The published TaqMan assay proved to be superior to the RT-PCR assays for the detection of ABLV in terms of analytical sensitivity. The TaqMan assay would also be faster and cross contamination is less likely. Nucleotide sequence analyses of samples of ABLV from a wide geographical range in Australia demonstrated the conserved nature of this region of the genome and therefore the suitability of this region for molecular diagnosis.

  14. The Complete Mitochondrial Genome Sequence of Bactericera cockerelli and Comparison with Three Other Psylloidea Species.

    Directory of Open Access Journals (Sweden)

    Fengnian Wu

    Potato psyllid (Bactericera cockerelli) is an important pest of potato, tomato and pepper. Not only can a toxin secreted by nymphs result in serious phytotoxemia in some host plants, but over the past few years B. cockerelli has also been shown to transmit "Candidatus Liberibacter solanacearum", the putative bacterial pathogen of potato zebra chip (ZC) disease, to potato and tomato. ZC has caused devastating losses to potato production in the western U.S., Mexico, and elsewhere. New knowledge of the genetic diversity of B. cockerelli is needed to develop improved strategies to manage pest populations. Mitochondrial genome (mitogenome) sequencing provides important knowledge about insect evolution and diversity in and among populations. This report provides the first complete B. cockerelli mitogenome sequence as determined by next generation sequencing technology (Illumina MiSeq). The circular B. cockerelli mitogenome had a size of 15,220 bp with 13 protein-coding genes (PCGs), 2 ribosomal RNA genes (rRNAs), 22 transfer RNA genes (tRNAs), and a non-coding region of 975 bp. The overall gene order of the B. cockerelli mitogenome is identical to three other published Psylloidea mitogenomes: one species from the Triozidae, Paratrioza sinica; and two species from the Psyllidae, Cacopsylla coccinea and Pachypsylla venusta. This suggests all of these species share a common ancestral mitogenome. However, sequence analyses revealed differences between and among the insect families, in particular a unique region that can be folded into three stem-loop secondary structures present only within the B. cockerelli mitogenome. A phylogenetic tree based on the 13 PCGs matched an existing taxonomy scheme that was based on morphological characteristics. The available complete mitogenome sequence makes all genes accessible for future population diversity evaluation of B. cockerelli.

  15. Stratigraphical analysis of the neoproterozoic sedimentary sequences of the Sao Francisco Basin

    International Nuclear Information System (INIS)

    Martins, Mariela; Lemos, Valesca Brasil

    2007-01-01

    A stratigraphic analysis was performed under the principles of Sequence Stratigraphy on the neoproterozoic sedimentary sequences of the Sao Francisco Basin (Central Brazil). Three periods of deposition separated by unconformities were recognized in the Sao Francisco Megasequence: (1) Sequences 1 and 2, a cryogenian glaciogenic sequence, followed by a distal scarp carbonate ramp, developed during stable conditions, (2) Sequence 3, an Upper Cryogenian stack of homoclinal ramps with mixed carbonate-siliciclastic sedimentation, deposited under a progressive influence of compressional stresses of the Brasiliano Cycle, (3) Sequence 4, a Lower Ediacaran shallow platform dominated by siliciclastic sedimentation of molassic nature, the erosion product of the nearby uplifted thrust sheets. Each of the carbonate-bearing sequences presents a distinct δ¹³C isotopic signature. The superposition to the global curve for carbon isotopic variation allowed the recognition of a major depositional hiatus between the Paranoa and Sao Francisco Megasequences, and suggested that the glacial diamictite deposition (Jequitai Formation) took place most probably around 800 Ma. This constrains the Sao Francisco Megasequence deposition to the interval between 800 and 600 Ma (the known ages of the Brasiliano Orogeny define the upper limit). A minor depositional hiatus (700-680 Ma) was also identified separating sequences 2 and 3. Isotopic analyses suggest that from then on, more restricted environmental conditions were established in the basin, probably associated with a first order global event, which prevailed throughout deposition of the Sequence 3. (author)

  16. Oasis: online analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Capece, Vincenzo; Garcia Vizcaino, Julio C; Vidal, Ramon; Rahman, Raza-Ur; Pena Centeno, Tonatiuh; Shomroni, Orr; Suberviola, Irantzu; Fischer, Andre; Bonn, Stefan

    2015-07-01

    Oasis is a web application that allows for the fast and flexible online analysis of small-RNA-seq (sRNA-seq) data. It was designed for the end user in the lab, providing an easy-to-use web frontend including video tutorials, demo data and best practice step-by-step guidelines on how to analyze sRNA-seq data. Oasis' exclusive selling points are a differential expression module that allows for the multivariate analysis of samples, a classification module for robust biomarker detection and an advanced programming interface that supports the batch submission of jobs. Both modules include the analysis of novel miRNAs, miRNA targets and functional analyses including GO and pathway enrichment. Oasis generates downloadable interactive web reports for easy visualization, exploration and analysis of data on a local system. Finally, Oasis' modular workflow enables the rapid (re-)analysis of data. Oasis is implemented in Python, R, Java, PHP, C++ and JavaScript. It is freely available at http://oasis.dzne.de. Contact: stefan.bonn@dzne.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  17. Establishment of screening technique for mutant cell and analysis of base sequence in the mutation

    International Nuclear Information System (INIS)

    Sofuni, Toshio; Nomi, Takehiko; Yamada, Masami; Masumura, Kenichi

    2000-01-01

    This research project aimed to establish an easy and quick detection method for radiation-induced mutation using molecular-biological techniques and an effective method for analyzing the molecular changes in base sequence. In this year, Spi mutants derived from γ-irradiated mice were analyzed by PCR and DNA sequencing. Male transgenic mice were exposed to γ-rays at 5, 10 and 50 Gy, and the transgene was recovered from the genomic DNA of the spleen by the in vivo packaging method. Spi mutant plaques were obtained by infecting E. coli with the recovered phage. Sequence analysis of the mutants was performed using an ALFred DNA sequencer and the SequiTherm™ Long-Red Cycle sequencing kit. Sequence analysis was carried out for 41 of 50 independent Spi mutants obtained. The deletions were classified into 4 groups. Group 1 included 15 mutants that were characterized by a large deletion (43 bp-10 kb) with a short homologous sequence. Group 2 included 11 mutants with a large deletion having no homologous sequence at the connecting region. Group 3 included 11 mutants having a short deletion of less than 20 bp, which occurred in the non-repetitive sequence of the gam gene and was possibly caused by oxidative breakage of DNA or recombination of DNA fragments produced by the breakage. Group 4 included 4 mutants having deletions as short as 20 bp or less in the repetitive sequence of the gam gene, resulting in an alteration of the reading frame. Thus, the synthesis of Gam protein was terminated by the appearance of TGA between code 13 and 14 of redB gene, leading to inactivation of gam gene and redBA gene. These results indicated that most of the Spi mutants had a deletion in the red/gam region and that the deletions in more than half of the mutants occurred in homologous sequences as short as 8 bp. (M.N.)

  18. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
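
    The recognition sequence GRCGYC uses IUPAC degenerate codes (R = A or G, Y = C or T). As an illustration only, and not part of the study, the sketch below expands such a pattern into a regular expression and lists where it occurs in a DNA string. The function names and the example sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a degenerate site into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in pattern.upper())
    # A zero-width lookahead lets matches overlap.
    return re.compile(f"(?=({body}))")


def find_sites(sequence: str, pattern: str = "GRCGYC") -> list:
    """Return 0-based start positions of every occurrence of the site."""
    return [m.start() for m in site_regex(pattern).finditer(sequence.upper())]


if __name__ == "__main__":
    demo = "AAGACGTCTTGGCGCCAA"  # made-up sequence for illustration
    print(find_sites(demo))  # [2, 10]
```

    Because GRCGYC is its own reverse complement, scanning one strand is enough to find sites on both.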

  19. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

    Directory of Open Access Journals (Sweden)

    Morgan Kullberg

    BACKGROUND: We investigate the usefulness of expressed sequence tags, ESTs, for establishing divergences within the tree of placental mammals. This is done on the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 mega bases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable costs, a fast extension of data sampling from species outside the genome projects.
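
    A figure such as "an average amino acid distance of 6.6%" can be read as the share of aligned positions at which two proteins differ. The sketch below computes that uncorrected p-distance for one pair of pre-aligned sequences. Whether the study used an uncorrected or a model-corrected distance is not stated in the abstract, so this is an assumption, and the example peptides are made up.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of differing residues between two aligned sequences.

    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return differing / compared


if __name__ == "__main__":
    # One substitution in eight aligned residues.
    print(p_distance("MKTAYIAK", "MKTAYLAK"))  # 0.125
```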

  20. Cloning and sequence analysis of hyaluronoglucosaminidase (nagH) gene of Clostridium chauvoei

    Directory of Open Access Journals (Sweden)

    Saroj K. Dangi

    2017-09-01

    Aim: Blackleg disease is caused by Clostridium chauvoei in ruminants. Although virulence factors such as C. chauvoei toxin A, sialidase, and flagellin are well characterized, hyaluronidases of C. chauvoei are not characterized. The present study was aimed at cloning and sequence analysis of the hyaluronoglucosaminidase (nagH) gene of C. chauvoei. Materials and Methods: C. chauvoei strain ATCC 10092 was grown in ATCC 2107 media and confirmed by polymerase chain reaction (PCR) using the primers specific for 16-23S rDNA spacer region. nagH gene of C. chauvoei was amplified and cloned into pRham-SUMO vector and transformed into Escherichia cloni 10G cells. The construct was then transformed into E. cloni cells. Colony PCR was carried out to screen the colonies followed by sequencing of nagH gene in the construct. Results: PCR amplification yielded a nagH gene product of 1143 bp, which was cloned in a prokaryotic expression system. Colony PCR, as well as sequencing of nagH gene, confirmed the presence of insert. The sequence was then subjected to NCBI BLAST analysis, which confirmed that the sequence was indeed of nagH gene of C. chauvoei. Phylogenetic analysis of the sequence showed that it is closely related to Clostridium perfringens and Clostridium paraputrificum. Conclusion: The gene for virulence factor nagH was cloned into a prokaryotic expression vector and confirmed by sequencing.

  1. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features

    Directory of Open Access Journals (Sweden)

    Aaron Sievers

    2017-04-01

    In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we successfully checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distribution of k-mers) was developed and used. The results were analyzed by including positional information based on visualizations as genomic maps and by applying basic vector correlation methods. This analysis was concentrated on small word lengths (1 ≤ k ≤ 4) on relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae formerly identified by independent, mostly alignment-based methods, were confirmed. Correlations between the k-mer content and several genes in these genomes have been found, showing similarities between classified and unclassified viruses, which may be potentially useful for further taxonomic research. Furthermore, unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which are probably of major biological function, are found and described. Using the chromosomes of a chimpanzee and human that are currently known, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST and the
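
    The idea of local k-mer spectra compared by vector correlation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the fixed sliding window, the window and step parameters, and the use of Pearson correlation as the "basic vector correlation method" are all choices made here for the example.

```python
import math
from collections import Counter
from itertools import product


def kmer_spectrum(seq: str, k: int) -> list:
    """Relative frequency of each of the 4**k DNA k-mers, in lexicographic order.

    k-mers containing characters other than A, C, G, T are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def local_spectra(seq: str, k: int, window: int, step: int) -> list:
    """(start, spectrum) pairs for windows sliding along the sequence."""
    seq = seq.upper()
    return [
        (start, kmer_spectrum(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length vectors (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    genome = "ACGTTTTTACGTACGAAAAACCGGTTAACCGG"  # made-up sequence
    windows = local_spectra(genome, k=2, window=16, step=8)
    for (start_a, spec_a), (start_b, spec_b) in zip(windows, windows[1:]):
        print(start_a, start_b, round(pearson(spec_a, spec_b), 3))
```

    Keeping the window start alongside each spectrum is what preserves the positional information; correlating spectra from two genomes, rather than neighbouring windows of one, follows the same pattern.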

  2. Genomic sequence of 'Candidatus Liberibacter solanacearum' haplotype C and its comparison with haplotype A and B genomes.

    Directory of Open Access Journals (Sweden)

    Jinhui Wang

    Haplotypes A and B of 'Candidatus Liberibacter solanacearum' (CLso) are associated with diseases of solanaceous plants, especially Zebra chip disease of potato, and haplotypes C, D and E are associated with symptoms on apiaceous plants. To date, one complete genome of haplotype B and two high quality draft genomes of haplotype A have been obtained for these unculturable bacteria using metagenomics from the psyllid vector Bactericera cockerelli. Here, we present the first genomic sequences obtained for the carrot-associated CLso. These two genomic sequences of haplotype C, FIN114 (1.24 Mbp) and FIN111 (1.20 Mbp), were obtained from carrot psyllids (Trioza apicalis) harboring CLso. Genomic comparisons between the haplotypes A, B and C revealed that the genome organization differs between these haplotypes, due to large inversions and other recombinations. Comparison of protein-coding genes indicated that the core genome of CLso consists of 885 ortholog groups, with the pan-genome consisting of 1327 ortholog groups. Twenty-seven ortholog groups are unique to CLso haplotype C, whilst 11 ortholog groups shared by the haplotypes A and B are not found in the haplotype C. Some of these ortholog groups that are not part of the core genome may encode functions related to interactions with the different host plant and psyllid species.

  3. RNA-sequencing analysis of fungi-induced transcripts from the bamboo wireworm Melanotus cribricollis (Coleoptera: Elateridae) larvae.

    Directory of Open Access Journals (Sweden)

    Bi-Huan Ye

    Larvae of Melanotus cribricollis feed on bamboo shoots and roots, causing serious damage to bamboo in Southern China. However, there is currently no effective control measure to limit the population of this underground pest. Previously, a new entomopathogenic fungal strain isolated from M. cribricollis larvae cadavers, named Metarhizium pingshaense WP08, showed high pathogenic efficacy indoors, indicating that the fungus could be used as a bio-control measure. So far, the genetic backgrounds of both M. cribricollis and M. pingshaense WP08 have been uncharacterized. Here, we analyzed the whole transcriptome of M. cribricollis larvae, infected with M. pingshaense WP08 or not, using high-throughput next generation sequencing technology. In addition, transcriptome sequencing of M. pingshaense WP08 was also performed to separate the data of those two non-model species. The reliability of the RNA-Seq data was also validated through a qRT-PCR experiment. De novo assembly, functional annotation, sequence comparison of four insect species, and analysis of DEGs, enriched pathways, GO terms and immune related candidate genes were performed. The results indicated that multiple defense mechanisms of M. cribricollis larvae are initiated to protect against the more serious negative effects caused by fungal infection. To our knowledge, this was the first report of transcriptome analysis of Melanotus spp. infected with a fungus, and it could provide insights to further explore insect-fungi interaction mechanisms.

  4. WebMGA: a customizable web server for fast metagenomic sequence analysis.

    Science.gov (United States)

    Wu, Sitao; Zhu, Zhengwei; Fu, Liming; Niu, Beifang; Li, Weizhong

    2011-09-07

    The new field of metagenomics studies microorganism communities by culture-independent sequencing. With the advances in next-generation sequencing techniques, researchers are facing tremendous challenges in metagenomic data analysis due to huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming; also metagenomic annotation involves a wide range of computational tools, which are difficult to be installed and maintained by common users. The tools provided by the few available web servers are also limited and have various constraints such as login requirement, long waiting time, inability to configure pipelines etc. We developed WebMGA, a customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contaminations, taxonomic analysis, functional annotation etc. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analysis or to configure and run customized pipelines. WebMGA is freely available at http://weizhongli-lab.org/metagenomic-analysis. WebMGA offers to researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.

  5. WebMGA: a customizable web server for fast metagenomic sequence analysis

    Directory of Open Access Journals (Sweden)

    Niu Beifang

    2011-09-01

    Background: The new field of metagenomics studies microorganism communities by culture-independent sequencing. With the advances in next-generation sequencing techniques, researchers are facing tremendous challenges in metagenomic data analysis due to huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming; also metagenomic annotation involves a wide range of computational tools, which are difficult to be installed and maintained by common users. The tools provided by the few available web servers are also limited and have various constraints such as login requirement, long waiting time, inability to configure pipelines etc. Results: We developed WebMGA, a customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contaminations, taxonomic analysis, functional annotation etc. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analysis or to configure and run customized pipelines. WebMGA is freely available at http://weizhongli-lab.org/metagenomic-analysis. Conclusions: WebMGA offers to researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.

  6. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  7. The Comparison of Streptococcus agalactiae Isolated from Fish and Bovine using Multilocus Sequence Typing

    Directory of Open Access Journals (Sweden)

    ANGELA MARIANA LUSIASTUTI

    2013-12-01

    Full Text Available Multilocus sequence typing (MLST) has greater utility for determining the recent ancestral lineage and the relatedness of individual strains. Group B streptococci (GBS) is one of the major causes of subclinical mastitis of dairy cattle in several countries. GBS also sporadically causes epizootic infections in fish. The aim of this study was to compare the evolutionary lineage of fish and bovine isolates in relation to the S. agalactiae global population as a whole by comparing the MLST profiles. Twenty S. agalactiae isolates were obtained from dairy cattle and fish. PCR products were amplified with seven different oligonucleotide primer pairs designed from the NEM316 GBS genome sequence. Clone complexes demonstrated that bovine and fish isolates were separate populations. These findings lead us to conclude that fish S. agalactiae is not a zoonotic agent for bovine. MLST could help clarify the emergence of pathogenic clones and to decide whether the host acts as a reservoir for another pathogenic lineage.
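
    The profile comparison described above can be sketched as follows. This is a minimal illustration, assuming each isolate is summarized by one allele number per locus across seven loci; the isolate names and allele numbers are invented for the example and are not data from the study.

```python
from typing import Dict, Tuple

# One allele number per locus, seven loci per isolate.
# All values below are hypothetical and only illustrate the comparison.
Profile = Tuple[int, ...]


def locus_differences(a: Profile, b: Profile) -> int:
    """Count the loci at which two allelic profiles carry different alleles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for x, y in zip(a, b) if x != y)


profiles: Dict[str, Profile] = {
    "bovine_1": (1, 1, 2, 1, 1, 2, 2),
    "bovine_2": (1, 1, 2, 1, 1, 2, 3),
    "fish_1": (10, 1, 2, 1, 3, 2, 2),
}

for name in ("bovine_2", "fish_1"):
    diff = locus_differences(profiles["bovine_1"], profiles[name])
    print(f"bovine_1 vs {name}: {diff} differing loci")
```

    Isolates whose profiles differ at few loci are usually grouped together; the grouping threshold is a design choice that the abstract does not state.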

  8. A comparison of chloroplast genome sequences in Aconitum (Ranunculaceae): a traditional herbal medicinal genus

    Directory of Open Access Journals (Sweden)

    Hanghui Kong

    2017-11-01

    Full Text Available The herbal medicinal genus Aconitum L., belonging to the Ranunculaceae family, represents the earliest diverging lineage within the eudicots. It currently comprises two subgenera, A. subgenus Lycoctonum and A. subg. Aconitum. The complete chloroplast (cp) genome sequences were characterized in three species: A. angustius, A. finetianum, and A. sinomontanum in subg. Lycoctonum and compared to other Aconitum species to clarify their phylogenetic relationship and provide molecular information for utilization of Aconitum species particularly in Eastern Asia. The lengths of the chloroplast genome sequences were 156,109 bp in A. angustius, 155,625 bp in A. finetianum and 157,215 bp in A. sinomontanum, with each species possessing 126 genes with 84 protein coding genes (PCGs). While genomic rearrangements were absent, structural variation was detected in the LSC/IR/SSC boundaries. Five pseudogenes were identified, among which Ψrps19 and Ψycf1 were in the LSC/IR/SSC boundaries, Ψrps16 and ΨinfA in the LSC region, and Ψycf15 in the IRb region. The nucleotide variability (Pi) of Aconitum was estimated to be 0.00549, with comparably higher variations in the LSC and SSC than the IR regions. Eight intergenic regions were revealed to be highly variable and a total of 58–62 simple sequence repeats (SSRs) were detected in all three species. More than 80% of SSRs were present in the LSC region. Altogether, 64.41% and 46.81% of SSRs are mononucleotides in subg. Lycoctonum and subg. Aconitum, respectively, while a higher percentage of di-, tri-, tetra-, and penta- SSRs were present in subg. Aconitum. Most species of subg. Aconitum in Eastern Asia were first used for phylogenetic analyses. The availability of the complete cp genome sequences of these species in subg. Lycoctonum will benefit future phylogenetic analyses and aid in germplasm utilization in Aconitum species.

  9. A comparison of chloroplast genome sequences in Aconitum (Ranunculaceae): a traditional herbal medicinal genus.

    Science.gov (United States)

    Kong, Hanghui; Liu, Wanzhen; Yao, Gang; Gong, Wei

    2017-01-01

    The herbal medicinal genus Aconitum L., belonging to the Ranunculaceae family, represents the earliest diverging lineage within the eudicots. It currently comprises two subgenera, A. subgenus Lycoctonum and A. subg. Aconitum. The complete chloroplast (cp) genome sequences were characterized in three species: A. angustius, A. finetianum, and A. sinomontanum in subg. Lycoctonum and compared to other Aconitum species to clarify their phylogenetic relationship and provide molecular information for utilization of Aconitum species particularly in Eastern Asia. The lengths of the chloroplast genome sequences were 156,109 bp in A. angustius, 155,625 bp in A. finetianum and 157,215 bp in A. sinomontanum, with each species possessing 126 genes with 84 protein coding genes (PCGs). While genomic rearrangements were absent, structural variation was detected in the LSC/IR/SSC boundaries. Five pseudogenes were identified, among which Ψrps19 and Ψycf1 were in the LSC/IR/SSC boundaries, Ψrps16 and ΨinfA in the LSC region, and Ψycf15 in the IRb region. The nucleotide variability (Pi) of Aconitum was estimated to be 0.00549, with comparably higher variations in the LSC and SSC than the IR regions. Eight intergenic regions were revealed to be highly variable and a total of 58-62 simple sequence repeats (SSRs) were detected in all three species. More than 80% of SSRs were present in the LSC region. Altogether, 64.41% and 46.81% of SSRs are mononucleotides in subg. Lycoctonum and subg. Aconitum, respectively, while a higher percentage of di-, tri-, tetra-, and penta- SSRs were present in subg. Aconitum. Most species of subg. Aconitum in Eastern Asia were first used for phylogenetic analyses. The availability of the complete cp genome sequences of these species in subg. Lycoctonum will benefit future phylogenetic analyses and aid in germplasm utilization in Aconitum species.
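
    Nucleotide variability (Pi) is commonly defined as the average proportion of differing sites over all pairs of aligned sequences. The sketch below computes it that way for a toy alignment. The abstract does not say which software or gap treatment was used, so the function, the toy sequences, and the lack of gap handling are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(aligned: Sequence[str]) -> float:
    """Mean per-site difference over all pairs of equal-length aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = list(combinations(aligned, 2))
    # Proportion of mismatching columns for each pair, summed over pairs.
    total = sum(
        sum(1 for x, y in zip(a, b) if x != y) / length for a, b in pairs
    )
    return total / len(pairs)


# Toy alignment, invented for the example (not Aconitum data).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(f"Pi = {nucleotide_diversity(toy):.4f}")
```

    Applied in a sliding window along a whole-plastome alignment, the same quantity can be used to compare variability between regions such as the LSC, SSC and IR.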

  10. The scale analysis sequence for LWR fuel depletion

    International Nuclear Information System (INIS)

    Hermann, O.W.; Parks, C.V.

    1991-01-01

    The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system is used extensively to perform away-from-reactor safety analysis (particularly criticality safety, shielding, heat transfer analyses) for spent light water reactor (LWR) fuel. Spent fuel characteristics such as radiation sources, heat generation sources, and isotopic concentrations can be computed within SCALE using the SAS2 control module. A significantly enhanced version of the SAS2 control module, which is denoted as SAS2H, has been made available with the release of SCALE-4. For each time-dependent fuel composition, SAS2H performs one-dimensional (1-D) neutron transport analyses (via XSDRNPM-S) of the reactor fuel assembly using a two-part procedure with two separate unit-cell-lattice models. The cross sections derived from a transport analysis at each time step are used in a point-depletion computation (via ORIGEN-S) that produces the burnup-dependent fuel composition to be used in the next spectral calculation. A final ORIGEN-S case is used to perform the complete depletion/decay analysis using the burnup-dependent cross sections. The techniques used by SAS2H and two recent applications of the code are reviewed in this paper. 17 refs., 5 figs., 5 tabs

  11. Sequencing and phylogenetic analysis of tobacco virus 2, a polerovirus from Nicotiana tabacum.

    Science.gov (United States)

    Zhou, Benguo; Wang, Fang; Zhang, Xuesong; Zhang, Lina; Lin, Huafeng

    2017-07-01

    The complete genome sequence of a new virus, provisionally named tobacco virus 2 (TV2), was determined and identified from leaves of tobacco (Nicotiana tabacum) exhibiting leaf mosaic, yellowing, and deformity, in Anhui Province, China. The genome sequence of TV2 comprises 5,979 nucleotides, with 87% nucleotide sequence identity to potato leafroll virus (PLRV). Its genome organization is similar to that of PLRV, containing six open reading frames (ORFs) that potentially encode proteins with putative functions in cell-to-cell movement and suppression of RNA silencing. Phylogenetic analysis of the nucleotide sequence placed TV2 alongside members of the genus Polerovirus in the family Luteoviridae. To the best of our knowledge, this study is the first report of a complete genome sequence of a new polerovirus identified in tobacco.

  12. Mesoscopic modeling of DNA denaturation rates: Sequence dependence and experimental comparison

    Energy Technology Data Exchange (ETDEWEB)

    Dahlen, Oda, E-mail: oda.dahlen@ntnu.no; Erp, Titus S. van, E-mail: titus.van.erp@ntnu.no [Department of Chemistry, Norwegian University of Science and Technology (NTNU), Høgskoleringen 5, Realfagbygget D3-117 7491 Trondheim (Norway)

    2015-06-21

    Using rare event simulation techniques, we calculated DNA denaturation rate constants for a range of sequences and temperatures for the Peyrard-Bishop-Dauxois (PBD) model with two different parameter sets. We studied a larger variety of sequences compared to previous studies that only consider DNA homopolymers and DNA sequences containing an equal amount of weak AT- and strong GC-base pairs. Our results show that, contrary to previous findings, an even distribution of the strong GC-base pairs does not always result in the fastest possible denaturation. In addition, we applied an adaptation of the PBD model to study hairpin denaturation for which experimental data are available. This is the first quantitative study in which dynamical results from the mesoscopic PBD model have been compared with experiments. Our results show that present parameterized models, although giving good results regarding thermodynamic properties, overestimate denaturation rates by orders of magnitude. We believe that our dynamical approach is, therefore, an important tool for verifying DNA models and for developing next generation models that have higher predictive power than present ones.
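
    For orientation, the PBD model is usually written as a one-dimensional chain in which y_n is the stretching of base pair n. The form below is the commonly cited one; the abstract does not list the two parameter sets, so no numerical values are given, and the symbols are the conventional ones rather than notation taken from this record.

```latex
\[
H = \sum_{n} \left[ \frac{p_n^{2}}{2m}
  + D_n \left( e^{-a_n y_n} - 1 \right)^{2}
  + \frac{k}{2} \left( 1 + \rho\, e^{-\alpha \left( y_n + y_{n-1} \right)} \right)
    \left( y_n - y_{n-1} \right)^{2} \right]
\]
```

    The Morse term, with depth D_n and width a_n, depends on whether base pair n is AT or GC, which is how sequence dependence enters. The last term is the nonlinear stacking interaction between neighbouring base pairs.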

  13. Complete plastid genome sequence of goosegrass (Eleusine indica) and comparison with other Poaceae.

    Science.gov (United States)

    Zhang, Hui; Hall, Nathan; McElroy, J Scott; Lowe, Elijah K; Goertzen, Leslie R

    2017-02-05

    Eleusine indica, also known as goosegrass, is a serious weed in at least 42 countries. In this paper we report the complete plastid genome sequence of goosegrass obtained by de novo assembly of paired-end and mate-paired reads generated by Illumina sequencing of total genomic DNA. The goosegrass plastome is a circular molecule of 135,151 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 20,919 bases. The large (LSC) and the small (SSC) single-copy regions span 80,667 bases and 12,646 bases, respectively. The plastome of goosegrass has 38.19% GC content and includes 108 unique genes, of which 76 are protein-coding, 28 are transfer RNA, and 4 are ribosomal RNA. The goosegrass plastome sequence was compared to eight other species of Poaceae. Although generally conserved with respect to Poaceae, this genomic resource will be useful for evolutionary studies within this weed species and the genus Eleusine. Copyright © 2016. Published by Elsevier B.V.
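
    The reported region sizes can be checked against the total length, since a quadripartite plastome consists of one LSC, one SSC and two IR copies. The short check below uses only the figures quoted in the abstract; the function name is an illustrative choice.

```python
# Region lengths in base pairs, as quoted in the abstract.
LSC = 80_667
SSC = 12_646
IR = 20_919            # length of one inverted-repeat copy
REPORTED_TOTAL = 135_151


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: two single-copy regions plus two IR copies."""
    return lsc + ssc + 2 * ir


computed = quadripartite_length(LSC, SSC, IR)
print(computed, computed == REPORTED_TOTAL)
```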

  14. An analysis of LOCA sequences in the development of severe accident analysis DB

    International Nuclear Information System (INIS)

    Choi, Young; Park, Soo Yong; Ahn, Kwang-Il; Kim, D.H.

    2006-01-01

    A Level 2 PSA was performed for the Korean Standard Nuclear Power Plants (KSNPs), and it considered the sequences necessary for assessing containment integrity and analyzing the source term. For accident management, however, more cases causing severe core damage need to be analyzed and arranged systematically for easy access to the results. At present, KAERI is calculating the severe accident sequences intensively for various initiating events and generating a database for the accident progression including thermal hydraulic and source term behaviours. The developed Database (DB) system includes a graphical display for a plant and equipment status, previous research results by knowledge-base technique, and the expected plant behaviour. The plant model used in this paper is oriented to LOCA-related severe accident phenomena and thus can simulate the plant behaviours for a severe accident. Therefore the developed system may play a central role as an information source for decision-making in severe accident management, and will be used as a training simulator for severe accident management. (author)

  15. Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

    Science.gov (United States)

    Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

    2012-01-01

    Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

  16. Craspedacusta sowerbii, Lankester 1880 – population dispersal analysis using COI and ITS sequences

    Directory of Open Access Journals (Sweden)

    Franz BRÜMMER

    2009-02-01

    Full Text Available Craspedacusta sowerbii (Hydrozoa, Limnomedusae, Olindiidae) is a freshwater jellyfish, which was discovered in England in 1880. Although thought to originate in South America, it became obvious that the species is native to the Yangtze River system in China. It has spread from China into lakes all over the world. Many different species, variations and sub-species have been described based on morphological characters. Specimens discovered in North America were described as separate species, as morphological differences appeared to be significant compared to European specimens. Even within Europe, differences were assumed to be obvious. Up to this point, three valid species are published; others are considered by various scientists to be true species as well, but mostly are recognized as variations. To obtain further insight into population dynamics of C. sowerbii as well as molecular information on the species itself, sequences of internal transcribed spacers (ITS) and cytochrome oxidase subunit I (COI) have been used to analyze specimens collected in Germany and Austria. These sequences have been compared to sequences published of different Chinese Craspedacusta species and variations. In addition, morphological descriptions were compared. For the COI sequences, we found uniformity throughout the complete set of samples. However, no comparisons could be made, as no data had been published on COI of Chinese specimens. ITS1, 5.8S and ITS2, including partial 18S and 28S, sequences, were almost uniform within the German populations, showing only minor base pair exchanges. However, comparisons to Chinese organisms indicated that the investigated sequences of C. sowerbii from Germany and Austria are similar, although not identical in morphology, to Craspedacusta sowerbii var. kiatingi from China. Overall our data support the assumption that there are three valid species, with the possibility of C. ziguiensis being a fourth one, and several

  17. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

  18. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    Directory of Open Access Journals (Sweden)

    Arias Covadonga

    2007-06-01

    Full Text Available Background: The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results: We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of the Plasmodium falciparum genome, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 10^-5) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion: This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence.
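
    The summary counts in this abstract are internally consistent, which the following short check illustrates. It uses only the numbers quoted above; the variable names are illustrative.

```python
# Counts quoted in the abstract.
clones_sequenced = 10_368
sequences_generated = 9_769
contigs = 976
singletons = 3_730
reported_unique = 4_706

# Sequencing success rate as a percentage, rounded to one decimal place.
success_rate = round(100 * sequences_generated / clones_sequenced, 1)

# Unique sequences are the contigs plus the unclustered singletons.
unique_sequences = contigs + singletons

print(f"success rate: {success_rate}%")
print(f"unique sequences: {unique_sequences}")
```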

  19. Genetic mutation analysis of human gastric adenocarcinomas using ion torrent sequencing platform.

    Directory of Open Access Journals (Sweden)

    Zhi Xu

    Full Text Available Gastric cancer is one of the major causes of cancer-related death, especially in Asia. Gastric adenocarcinoma, the most common type of gastric cancer, is heterogeneous and its incidence and cause vary widely with geographical region, gender, ethnicity, and diet. Since unique mutations have been observed in individual human cancer samples, identification and characterization of the molecular alterations underlying individual gastric adenocarcinomas is a critical step for developing more effective, personalized therapies. Until recently, identifying genetic mutations on an individual basis by DNA sequencing remained a daunting task. Recent advances in next-generation DNA sequencing technologies, such as the semiconductor-based Ion Torrent sequencing platform, make DNA sequencing cheaper, faster, and more reliable. In this study, we aim to identify genetic mutations in the genes which are targeted by drugs in clinical use or are under development in individual human gastric adenocarcinoma samples using Ion Torrent sequencing. We sequenced 737 loci from 45 cancer-related genes in 238 human gastric adenocarcinoma samples using the Ion Torrent Ampliseq Cancer Panel. The sequencing analysis revealed a high occurrence of mutations along the TP53 locus (9.7%) in our sample set. Thus, this study indicates the utility of a cost and time efficient tool such as Ion Torrent sequencing to screen cancer mutations for the development of personalized cancer therapy.

  20. Introduction of the hybcell-based compact sequencing technology and comparison to state-of-the-art methodologies for KRAS mutation detection.

    Science.gov (United States)

    Zopf, Agnes; Raim, Roman; Danzer, Martin; Niklas, Norbert; Spilka, Rita; Pröll, Johannes; Gabriel, Christian; Nechansky, Andreas; Roucka, Markus

    2015-03-01

    The detection of KRAS mutations in codons 12 and 13 is critical for anti-EGFR therapy strategies; however, only those methodologies with high sensitivity, specificity, and accuracy as well as the best cost and turnaround balance are suitable for routine daily testing. Here we compared the performance of compact sequencing using the novel hybcell technology with 454 next-generation sequencing (454-NGS), Sanger sequencing, and pyrosequencing, using an evaluation panel of 35 specimens. A total of 32 mutations and 10 wild-type cases were reported using 454-NGS as the reference method. Specificity ranged from 100% for Sanger sequencing to 80% for pyrosequencing. Sanger sequencing and hybcell-based compact sequencing achieved a sensitivity of 96%, whereas pyrosequencing had a sensitivity of 88%. Accuracy was 97% for Sanger sequencing, 85% for pyrosequencing, and 94% for hybcell-based compact sequencing. Quantitative results were obtained for 454-NGS and hybcell-based compact sequencing data, resulting in a significant correlation (r = 0.914). Whereas pyrosequencing and Sanger sequencing were not able to detect multiple mutated cell clones within one tumor specimen, 454-NGS and the hybcell-based compact sequencing detected multiple mutations in two specimens. Our comparison shows that the hybcell-based compact sequencing is a valuable alternative to state-of-the-art methodologies used for detection of clinically relevant point mutations.
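
    Sensitivity, specificity and accuracy are derived from counts of true and false calls against the reference method. The sketch below shows the standard definitions; the counts passed in are hypothetical and are not a reconstruction of the study's results.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference-positive cases that the test also calls positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of reference-negative cases that the test also calls negative."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases on which the test agrees with the reference."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for one method compared with a reference method.
tp, fn, tn, fp = 9, 1, 8, 2

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
print(f"accuracy    = {accuracy(tp, tn, fp, fn):.2f}")
```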

  1. galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

    Science.gov (United States)

    Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

    2004-06-12

    The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se

  2. QTL analysis by sequencing of Water Use Efficiency (WUE) in potato

    DEFF Research Database (Denmark)

    Kaminski, Kacper Piotr; Sønderkær, Mads; Sørensen, Kirsten Kørup

    2013-01-01

    The traditional approach to potato breeding, the classical “mate and phenotype” approach, is relatively costly, and because phenotyping and growth capacity is limited, it is slowly being replaced by Marker Assisted Selection (MAS) breeding schemes. MAS is based on the presence of DNA polymorphic.......sparsipilum), phenotyped for water use efficiency. This population has also previously been phenotyped for the total glycoalkaloid (TGA) content....... and time consuming process. Here, a novel method for Quantitative Trait Locus (QTL) analysis has been developed, that allows for development of specific markers by use of genomic sequence reads and the recently published reference genome sequence for potato. Prior to sequencing the mapping population...

  3. Sequence length variation, indel costs, and congruence in sensitivity analysis

    DEFF Research Database (Denmark)

    Aagesen, Lone; Petersen, Gitte; Seberg, Ole

    2005-01-01

    The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which...... the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously...... preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation...
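
    The two indel treatments can be contrasted with a small sketch. It assumes a single aligned row in which "-" marks a gap, charges every gap position separately under the fifth-state treatment, and charges each contiguous run as open + extend × length under the affine treatment. The cost values are hypothetical, and the study's actual cost sets and affine convention are not given in the abstract.

```python
import re
from typing import List


def gap_runs(aligned_row: str) -> List[int]:
    """Lengths of the contiguous gap runs in one aligned sequence."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_row)]


def fifth_state_cost(aligned_row: str, per_position: int) -> int:
    """Every gap position is charged independently, like a fifth character state."""
    return per_position * sum(gap_runs(aligned_row))


def affine_cost(aligned_row: str, gap_open: int, gap_extend: int) -> int:
    """Each contiguous run is one event: an opening cost plus a per-position extension."""
    return sum(gap_open + gap_extend * n for n in gap_runs(aligned_row))


row = "AC---GT-A"  # toy aligned row with gap runs of length 3 and 1
print(gap_runs(row))
print(fifth_state_cost(row, per_position=2))
print(affine_cost(row, gap_open=4, gap_extend=1))
```

    Under the affine scheme a long run is penalized mostly once, at its opening, so one three-position indel costs less than three separate single-position indels.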

  4. Accident sequences and causes analysis in a hydrogen production process

    Energy Technology Data Exchange (ETDEWEB)

    Jae, Moo Sung; Hwang, Seok Won; Kang, Kyong Min; Ryu, Jung Hyun; Kim, Min Soo; Cho, Nam Chul; Jeon, Ho Jun; Jung, Gun Hyo; Han, Kyu Min; Lee, Seng Woo [Hanyang Univ., Seoul (Korea, Republic of)

    2006-03-15

    Because a hydrogen production facility using the IS process requires the high temperature of a nuclear power plant, a safety assessment should be performed to guarantee the safety of the facility. First, accident cases in hydrogen production and utilization were surveyed. Based on the results, risk factors that can arise in a hydrogen production facility were identified, and the correlations between risk factors were schematized using an influence diagram. Initiating events of the hydrogen production facility were also identified, and accident scenarios were developed and quantified. PSA methodology was used to identify initiating events, a master logic diagram was used to select them, and event tree analysis was used to quantify the accident scenarios. The sum of all the leakage frequencies is 1.22 × 10^-4, which is similar to the value (1.0 × 10^-4) for core damage frequency that the International Nuclear Safety Advisory Group of the IAEA suggested as a criterion.
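
    In event tree analysis, the frequency of one accident sequence is typically the initiating-event frequency multiplied by the conditional probabilities of the branches along that path, and sequence frequencies are then summed. The sketch below illustrates that arithmetic. The scenario names, frequencies and branch probabilities are invented, and only the comparison value of 1.0e-4 comes from the abstract.

```python
import math
from typing import Dict, List, Tuple


def sequence_frequency(initiating_frequency: float,
                       branch_probabilities: List[float]) -> float:
    """Frequency of one event-tree path: initiator times each branch probability."""
    return initiating_frequency * math.prod(branch_probabilities)


# Hypothetical scenarios: (initiating-event frequency per year, branch probabilities).
scenarios: Dict[str, Tuple[float, List[float]]] = {
    "pipe_leak_undetected": (1.0e-2, [1.0e-1, 5.0e-2]),
    "valve_leak_unisolated": (2.0e-3, [2.0e-2]),
}

total = sum(sequence_frequency(f, p) for f, p in scenarios.values())

COMPARISON_VALUE = 1.0e-4  # core damage frequency criterion quoted in the abstract
print(f"total = {total:.2e}, below comparison value: {total < COMPARISON_VALUE}")
```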

  5. Image registration based on virtual frame sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Chen, H.; Ng, W.S. [Nanyang Technological University, Computer Integrated Medical Intervention Laboratory, School of Mechanical and Aerospace Engineering, Singapore (Singapore)]; Shi, D. [Nanyang Technological University, School of Computer Engineering, Singapore (Singapore)]; Wee, S.B. [Tan Tock Seng Hospital, Department of General Surgery, Singapore (Singapore)]

    2007-08-15

    This paper proposes a new framework for medical image registration with large nonrigid deformations, which remains one of the biggest challenges for image fusion and further analysis in many medical applications. The registration problem is formulated as recovering a deformation process with known initial and final states. To deal with large nonlinear deformations, virtual frames are inserted to model the deformation process. A time parameter is introduced and the deformation between consecutive frames is described with a linear affine transformation. Experiments are conducted with simple geometric deformations as well as complex deformations present in MRI and ultrasound images. All the deformations are nonlinear. The positive results demonstrate the effectiveness of the algorithm. The proposed framework can register medical images with large nonlinear deformations and is especially useful for sequential images. (orig.)

  6. CoCoNUT: an efficient system for the comparison and analysis of genomes

    Directory of Open Access Journals (Sweden)

    Kurtz Stefan

    2008-11-01

    Full Text Available Background: Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results: Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion: CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics.

  7. Next-generation sequencing of multiple individuals per barcoded library by deconvolution of sequenced amplicons using endonuclease fragment analysis

    DEFF Research Database (Denmark)

    Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta

    2014-01-01

    The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease...... digestion of PCR amplicons prior to library preparation, creating a specific fragment pattern for each individual that can be resolved after sequencing. By using both barcodes and restriction fragment patterns, we demonstrate the ability to sequence the human melanocortin 1 receptor (MC1R) genes from 72...... individuals using only 24 barcoded libraries....

  8. VisRseq: R-based visual framework for analysis of sequencing data

    OpenAIRE

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven JM

    2015-01-01

    Background Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for ...

  9. Targeted DNA Methylation Analysis by High Throughput Sequencing in Porcine Peri-attachment Embryos

    OpenAIRE

    MORRILL, Benson H.; COX, Lindsay; WARD, Anika; HEYWOOD, Sierra; PRATHER, Randall S.; ISOM, S. Clay

    2013-01-01

    Abstract The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx plat...

  10. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

    OpenAIRE

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S.; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M.; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V.; Mahurkar, Anup; Fricke, W. Florian

    2017-01-01

    Background The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. Results CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. ...

  11. Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species

    Science.gov (United States)

    Rey, Michael W; Ramaiya, Preethi; Nelson, Beth A; Brody-Karpin, Shari D; Zaretsky, Elizabeth J; Tang, Maria; de Leon, Alfredo Lopez; Xiang, Henry; Gusti, Veronica; Clausen, Ib Groth; Olsen, Peter B; Rasmussen, Michael D; Andersen, Jens T; Jørgensen, Per L; Larsen, Thomas S; Sorokin, Alexei; Bolotin, Alexander; Lapidus, Alla; Galleron, Nathalie; Ehrlich, S Dusko; Berka, Randy M

    2004-01-01

    Background Bacillus licheniformis is a Gram-positive, spore-forming soil bacterium that is used in the biotechnology industry to manufacture enzymes, antibiotics, biochemicals and consumer products. This species is closely related to the well studied model organism Bacillus subtilis, and produces an assortment of extracellular enzymes that may contribute to nutrient cycling in nature. Results We determined the complete nucleotide sequence of the B. licheniformis ATCC 14580 genome which comprises a circular chromosome of 4,222,336 base-pairs (bp) containing 4,208 predicted protein-coding genes with an average size of 873 bp, seven rRNA operons, and 72 tRNA genes. The B. licheniformis chromosome contains large regions that are colinear with the genomes of B. subtilis and Bacillus halodurans, and approximately 80% of the predicted B. licheniformis coding sequences have B. subtilis orthologs. Conclusions Despite the unmistakable organizational similarities between the B. licheniformis and B. subtilis genomes, there are notable differences in the numbers and locations of prophages, transposable elements and a number of extracellular enzymes and secondary metabolic pathway operons that distinguish these species. Differences include a region of more than 80 kilobases (kb) that comprises a cluster of polyketide synthase genes and a second operon of 38 kb encoding plipastatin synthase enzymes that are absent in the B. licheniformis genome. The availability of a completed genome sequence for B. licheniformis should facilitate the design and construction of improved industrial strains and allow for comparative genomics and evolutionary studies within this group of Bacillaceae. PMID:15461803

  12. Comparison of Control of Clostridium difficile Infection in Six English Hospitals Using Whole-Genome Sequencing.

    Science.gov (United States)

    Eyre, David W; Fawley, Warren N; Rajgopal, Anu; Settle, Christopher; Mortimer, Kalani; Goldenberg, Simon D; Dawson, Susan; Crook, Derrick W; Peto, Tim E A; Walker, A Sarah; Wilcox, Mark H

    2017-08-01

    Variation in Clostridium difficile infection (CDI) rates between healthcare institutions suggests overall incidence could be reduced if the lowest rates could be achieved more widely. We used whole-genome sequencing (WGS) of consecutive C. difficile isolates from 6 English hospitals over 1 year (2013-14) to compare infection control performance. Fecal samples with a positive initial screen for C. difficile were sequenced. Within each hospital, we estimated the proportion of cases plausibly acquired from previous cases. Overall, 851/971 (87.6%) sequenced samples contained toxin genes, and 451 (46.4%) were fecal-toxin-positive. Of 652 potentially toxigenic isolates >90-days after the study started, 128 (20%, 95% confidence interval [CI] 17-23%) were genetically linked (within ≤2 single nucleotide polymorphisms) to a prior patient's isolate from the previous 90 days. Hospital 2 had the fewest linked isolates, 7/105 (7%, 3-13%), hospital 1, 9/70 (13%, 6-23%), and hospitals 3-6 had similar proportions of linked isolates (22-26%) (P ≤ .002 comparing hospital-2 vs 3-6). Results were similar adjusting for locally circulating ribotypes. Adjusting for hospital, ribotype-027 had the highest proportion of linked isolates (57%, 95% CI 29-81%). Fecal-toxin-positive and toxin-negative patients were similarly likely to be a potential transmission donor, OR = 1.01 (0.68-1.49). There was no association between the estimated proportion of linked cases and testing rates. WGS can be used as a novel surveillance tool to identify varying rates of C. difficile transmission between institutions and therefore to allow targeted efforts to reduce CDI incidence. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America.
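The linkage rule in the abstract above (an isolate counts as linked when it lies within 2 SNPs of a different patient's isolate from the previous 90 days, evaluated only for isolates sampled more than 90 days after the study start) can be sketched in a few lines. This is a minimal illustration and not the study's pipeline: the `Isolate` record, the `snp_distance` helper (a plain Hamming distance over equal-length aligned strings), and the toy data are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Isolate:
    patient: str   # hypothetical patient identifier
    sampled: date  # sampling date
    seq: str       # hypothetical aligned sequence, same length for all isolates


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def is_linked(isolate, isolates, max_snps=2, window_days=90):
    """True if an earlier isolate from another patient is close enough."""
    window = timedelta(days=window_days)
    for prior in isolates:
        if prior.patient == isolate.patient:
            continue  # only other patients can be transmission donors
        gap = isolate.sampled - prior.sampled
        # Assumption: "prior" means strictly earlier, at most window_days back.
        if timedelta(0) < gap <= window and snp_distance(isolate.seq, prior.seq) <= max_snps:
            return True
    return False


def linked_proportion(isolates, study_start, window_days=90):
    """Return (linked, eligible) for isolates sampled > window_days after start."""
    eligible = [i for i in isolates if (i.sampled - study_start).days > window_days]
    linked = sum(is_linked(i, isolates, window_days=window_days) for i in eligible)
    return linked, len(eligible)


# Toy data, invented for illustration only.
isolates = [
    Isolate("p1", date(2013, 1, 15), "ACGTACGT"),  # too early to be eligible
    Isolate("p1", date(2013, 4, 20), "ACGTACGT"),
    Isolate("p2", date(2013, 5, 10), "ACGTACGA"),  # 1 SNP from p1, 20 days later
    Isolate("p3", date(2013, 6, 1), "TTTTACGA"),   # more than 2 SNPs from all others
]
linked, eligible = linked_proportion(isolates, date(2013, 1, 1))
print(f"{linked}/{eligible} eligible isolates linked")
```

Restricting the denominator to isolates sampled after the first 90 days gives every eligible isolate a full look-back window, which is why the abstract reports linkage out of 652 isolates rather than all sequenced samples.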

  13. Comparison of the aflR gene sequences of strains in Aspergillus section Flavi.

    Science.gov (United States)

    Lee, Chao-Zong; Liou, Guey-Yuh; Yuan, Gwo-Fang

    2006-01-01

    Aflatoxins are polyketide-derived secondary metabolites produced by Aspergillus parasiticus, Aspergillus flavus, Aspergillus nomius and a few other species. The toxic effects of aflatoxins have adverse consequences for human health and agricultural economics. The aflR gene, a regulatory gene for aflatoxin biosynthesis, encodes a protein containing a zinc-finger DNA-binding motif. Although Aspergillus oryzae and Aspergillus sojae, which are used in fermented foods and in ingredient manufacture, have no record of producing aflatoxin, they have been shown to possess an aflR gene. This study examined 34 strains of Aspergillus section Flavi. The aflR gene of 23 of these strains was successfully amplified and sequenced. No aflR PCR products were found in five A. sojae strains or six strains of A. oryzae. These PCR results suggested that the aflR gene is absent or significantly different in some A. sojae and A. oryzae strains. The sequenced aflR genes from the 23 positive strains had greater than 96.6% similarity, which was particularly conserved in the zinc-finger DNA-binding domain. The aflR gene of A. sojae has two obvious characteristics: an extra CTCATG sequence fragment and a C to T transition that causes premature termination of AFLR protein synthesis. Differences between A. parasiticus/A. sojae and A. flavus/A. oryzae aflR genes were also identified. Some strains of A. flavus as well as A. flavus var. viridis, A. oryzae var. viridis and A. oryzae var. effusus have an A. oryzae-type aflR gene. For all strains with the A. oryzae-type aflR gene, there was no evidence of aflatoxin production. It is suggested that for safety reasons, the aflR gene could be examined to assess possible aflatoxin production by Aspergillus section Flavi strains.

  14. Allelic diversity of the MHC class II DRB genes in brown bears (Ursus arctos) and a comparison of DRB sequences within the family Ursidae.

    Science.gov (United States)

    Goda, N; Mano, T; Kosintsev, P; Vorobiev, A; Masuda, R

    2010-11-01

    The allelic diversity of the DRB locus in major histocompatibility complex (MHC) genes was analyzed in the brown bear (Ursus arctos) from the Hokkaido Island of Japan, Siberia, and Kodiak of Alaska. Nineteen alleles of the DRB exon 2 were identified from a total of 38 individuals of U. arctos and were highly polymorphic. Comparisons of non-synonymous and synonymous substitutions in the antigen-binding sites of deduced amino acid sequences indicated evidence for balancing selection on the bear DRB locus. The phylogenetic analysis of the DRB alleles among three genera (Ursus, Tremarctos, and Ailuropoda) in the family Ursidae revealed that DRB allelic lineages were not separated according to species. This strongly shows trans-species persistence of DRB alleles within the Ursidae. © 2010 John Wiley & Sons A/S.

  15. COMPARISON OF ROCURONIUM BROMIDE AND SUCCINYLCHOLINE CHLORIDE FOR USE DURING RAPID SEQUENCE INTUBATION IN ADULTS

    Directory of Open Access Journals (Sweden)

    Ch. Penchalaiah

    2015-08-01

    Full Text Available BACKGROUND AND OBJECTIVE: The goal of rapid sequence intubation is to secure the patient's airway smoothly and quickly, minimizing the chances of regurgitation and aspiration of gastric contents. Traditionally, succinylcholine chloride has been the neuromuscular blocking drug of choice for use in rapid sequence intubation because of its rapid onset of action and profound relaxation. Succinylcholine chloride remains unsurpassed in providing ideal intubating conditions. However, the use of succinylcholine chloride is associated with many side effects such as muscle pain, bradycardia, hyperkalaemia and a rise in intragastric and intraocular pressure. Rocuronium bromide is the only drug currently available whose rapidity of onset of action is similar to that of succinylcholine chloride. Hence the present study was undertaken to compare rocuronium bromide with succinylcholine chloride for use during rapid sequence intubation in adult patients. METHODOLOGY: The study population consisted of 90 patients aged 18-60 years posted for various elective surgeries requiring general anaesthesia. The study population was randomly divided into 3 groups with 30 patients in each subgroup. 1. Group I: Intubated with 1 mg kg⁻¹ of succinylcholine chloride (n=30). 2. Group II: Intubated with rocuronium bromide 0.6 mg kg⁻¹ (n=30). 3. Group III: Intubated with rocuronium bromide 0.9 mg kg⁻¹ (n=30). Intubating conditions were assessed at 60 seconds based on the scale adopted by Toni Magorian et al. 1993. The haemodynamic parameters in the present study were compared using the p-value obtained from Student's t-test. RESULTS: It was noted that succinylcholine chloride 1 mg kg⁻¹ body weight produced excellent intubating conditions in all patients. Rocuronium bromide 0.6 mg kg⁻¹ body weight produced excellent intubating conditions in 53.33% of patients but produced good to excellent intubating conditions in 96.67% of patients. Rocuronium bromide 0.9 mg kg⁻¹

  16. SPIO-enhanced MR imaging for HCC detection in cirrhotic patient : comparison of various techniques for optimal sequence selection

    International Nuclear Information System (INIS)

    Kim, In Hwan; Lee, Jeong Min; Kwak, Hyo Sung; Kim, Chong Soo; Yu, Hee Chul; Kim, Tae Kon; Lee, Soo Tiek

    2000-01-01

    To compare the efficacy of breath-hold and non-breath-hold sequences in the detection of hepatocellular carcinoma (HCC) in cirrhotic patients using superparamagnetic iron oxide (SPIO)-enhanced MR imaging, and to determine the optimal sequence combination. By means of unenhanced and iron-oxide-enhanced MRI, 29 patients with 49 nodular HCCs were evaluated for the presence of HCC nodules. Twenty-one were male and eight were female, and their ages ranged from 38 to 71 (mean, 56) years. Eight different MR sequences were used, including four non-breath-hold sequences and four breath-hold sequences, and images were obtained before and after the administration of SPIO particles. Non-breath-hold sequences included T2-, proton density-weighted SE, and TSE imaging, while breath-hold sequences comprised T1-weighted fast low-angle shot (T1w FLASH), half-Fourier acquisition single-shot turbo spin echo (HASTE), T2-weighted fast imaging with steady-state free precession (T2*wFISP) and T2-weighted breath-hold TSE (T2wBHTSE). Image analysis involved both quantitative and qualitative analysis. The quantitative parameters calculated were signal-to-noise (S/N) ratios for livers and tumors, contrast-to-noise (C/N) ratios for tumors seen on precontrast and postcontrast images, and percentage of signal intensity loss (PSIL) after SPIO injection. Images were analysed qualitatively in terms of image artifacts and lesion conspicuity, and prior to calculating sensitivity, the number of lesions detected using the various pulse sequences was counted. SPIO had a marked effect on liver S/N ratio but a minimal effect on tumor S/N ratio. PSIL was best in T2*wFISP images, while T2wSE images showed the second-best results (p < 0.05). Tumor-to-liver C/N values were also highest with T2*wFISP, while T2wTSE and HASTE images were next. Qualitative study showed that non-breath-hold images and FISP were better than breath-hold images in terms of lesion conspicuity. The latter, however, were much better
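The abstract above names three quantitative parameters (S/N, C/N and PSIL) without giving formulas. The sketch below uses commonly assumed definitions: signal divided by the standard deviation of background noise, the tumor-liver signal difference divided by the same noise term, and the relative drop in signal after contrast. These definitions and the sample intensities are assumptions for illustration, not values or methods taken from the study.

```python
def snr(signal: float, noise_sd: float) -> float:
    """Signal-to-noise ratio: mean signal over background noise SD."""
    return signal / noise_sd


def cnr(tumor_signal: float, liver_signal: float, noise_sd: float) -> float:
    """Tumor-to-liver contrast-to-noise ratio."""
    return (tumor_signal - liver_signal) / noise_sd


def psil(pre_signal: float, post_signal: float) -> float:
    """Percentage of signal intensity loss after contrast administration."""
    return 100.0 * (pre_signal - post_signal) / pre_signal


# Hypothetical region-of-interest measurements (arbitrary units).
liver_pre, liver_post = 200.0, 50.0
tumor_post, noise_sd = 120.0, 10.0

print("liver PSIL (%):", psil(liver_pre, liver_post))
print("liver S/N after SPIO:", snr(liver_post, noise_sd))
print("tumor-to-liver C/N after SPIO:", cnr(tumor_post, liver_post, noise_sd))
```

Under these definitions, a large liver PSIL with little change in tumor signal raises the tumor-to-liver C/N, which is consistent with the abstract's observation that SPIO strongly affects liver S/N but barely affects tumor S/N.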

  17. Sequence detection analysis based on canonical correlation for steady-state visual evoked potential brain computer interfaces.

    Science.gov (United States)

    Cao, Lei; Ju, Zhengyu; Li, Jie; Jian, Rongjun; Jiang, Changjun

    2015-09-30

    Steady-state visual evoked potential (SSVEP) has been widely applied to develop brain-computer interface (BCI) systems. The essence of SSVEP recognition is to identify the frequency component of the target stimulus attended by the subject, which is prominently present in the EEG spectrum. In this paper, a novel statistical approach based on sequence detection (SD) is proposed to improve the performance of SSVEP recognition. The method uses canonical correlation analysis (CCA) coefficients to observe the SSVEP signal sequence, and a threshold strategy is then applied for SSVEP recognition. The results showed that classification with longer time windows achieved higher accuracy for most subjects, and the average time cost per trial was lower than the predefined recognition time, implying that the approach could improve the speed of a BCI system relative to other methods. Comparison with existing method(s): The experimental accuracy of the SD approach was better than that of a widely used CCA-based method and two recently proposed algorithms, the least absolute shrinkage and selection operator (LASSO) recognition model and the multivariate synchronization index (MSI) method. Furthermore, the information transfer rate (ITR) obtained by the SD approach was higher than that of the other three methods for most participants. These results demonstrate that the proposed method is promising for a high-speed online BCI. Copyright © 2015 Elsevier B.V. All rights reserved.
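The abstract above compares methods by information transfer rate (ITR) but does not state how ITR was computed. A widely used definition in the BCI literature is the Wolpaw formula, which combines the number of targets, the classification accuracy and the time per selection. The sketch below assumes that definition; the function name and the example numbers are hypothetical and are not taken from the study.

```python
import math


def itr_bits_per_min(n_targets: int, accuracy: float, seconds_per_trial: float) -> float:
    """Wolpaw information transfer rate in bits per minute.

    bits per trial = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if seconds_per_trial <= 0:
        raise ValueError("trial duration must be positive")
    if accuracy <= 1.0 / n_targets:
        return 0.0  # at or below chance: conventionally reported as zero
    bits = math.log2(n_targets)
    if accuracy < 1.0:
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / seconds_per_trial


# Hypothetical comparison: same accuracy, shorter recognition time.
print(itr_bits_per_min(4, 0.9, 2.0))
print(itr_bits_per_min(4, 0.9, 1.5))
```

Because the time per trial sits in the denominator, a method that reaches a decision earlier at equal accuracy yields a higher ITR, which is the trade-off the sequence-detection threshold is meant to exploit.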

  18. Comparison of aftershock sequences between 1975 Haicheng earthquake and 1976 Tangshan earthquake

    Science.gov (United States)

    Liu, B.

    2017-12-01

    The 1975 ML 7.3 Haicheng earthquake and the 1976 ML 7.8 Tangshan earthquake occurred in the same tectonic unit. There are significant differences in the spatial-temporal distribution, number of aftershocks, and duration of the aftershock sequences that followed these two main shocks. Aftershocks can be triggered by the regional seismicity change induced by the main shock through Coulomb stress perturbation. Based on the rate- and state-dependent friction law, we quantitatively estimated the possible aftershock duration in combination with seismicity data, and compared the results from different approaches. The results indicate that the aftershock duration of the Tangshan main shock is several times that of the Haicheng main shock. This can be explained by the significant relationship between aftershock duration and the earthquake nucleation history, the normal stress, and the shear stress loading rate on the fault. The obvious difference in nucleation history between the two main shocks lies in the foreshocks: the 1975 Haicheng earthquake had a clear and long foreshock sequence, while the 1976 Tangshan earthquake had no clear foreshocks. Abundant foreshocks may indicate a long and active nucleation process that may have changed (weakened) the rocks in the source region, which would lead to a shorter aftershock sequence because stress in weak rocks decays faster.
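The abstract above ties aftershock duration to normal stress and shear stress loading rate through rate- and state-dependent friction, without giving equations. In Dieterich's commonly cited formulation, the characteristic aftershock duration is t_a = Aσ / τ̇ (constitutive parameter A times normal stress σ, divided by the shear stressing rate), and the seismicity rate after a stress step Δτ relaxes back to the background rate over that time scale. The sketch below assumes that formulation; all parameter values are illustrative and none come from the study.

```python
import math


def aftershock_duration(a: float, normal_stress: float, stressing_rate: float) -> float:
    """Characteristic aftershock duration t_a = A * sigma / stressing rate."""
    return a * normal_stress / stressing_rate


def seismicity_rate(t, background_rate, stress_step, a, normal_stress, stressing_rate):
    """Seismicity rate at time t after a sudden stress step (Dieterich-type model)."""
    a_sigma = a * normal_stress
    t_a = a_sigma / stressing_rate
    gamma = (math.exp(-stress_step / a_sigma) - 1.0) * math.exp(-t / t_a) + 1.0
    return background_rate / gamma


# Illustrative values: A = 0.01, sigma = 100 MPa, stressing rates in MPa/yr.
slow = aftershock_duration(0.01, 100.0, 0.01)
fast = aftershock_duration(0.01, 100.0, 0.02)
print("duration with slow loading (yr):", slow)
print("duration with fast loading (yr):", fast)
print("rate just after a 1 MPa step:", seismicity_rate(0.0, 1.0, 1.0, 0.01, 100.0, 0.01))
```

In this model, halving the stressing rate or doubling Aσ doubles the aftershock duration, so a several-fold difference in duration between two sequences can be expressed as a several-fold difference in the ratio Aσ / τ̇.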

  19. [Comparison of 2D and 3D sequences for MRCP. Clinical value of the different techniques].

    Science.gov (United States)

    Wallnoefer, A M; Herrmann, K A; Beuers, U; Zech, C J; Gourtsoyianni, S; Reiser, M F; Schoenberg, S O

    2005-11-01

    Magnetic resonance cholangiopancreatography (MRCP) is a non-invasive imaging modality of the pancreatico-biliary system which plays an increasingly important role in the clinical and diagnostic workup of patients with biliary or pancreatic diseases. The present review is designed to give an overview of the currently available and appropriate sequences, their technical background, as well as new developments and their relevance to the various clinical issues and challenges. The impact of the latest technical innovations, such as integrated parallel imaging techniques and navigator-based respiratory triggering, on the diagnostic capacities of MRCP is discussed. In this context, the individual value of RARE, T2w single shot turbo/fast spin echo (SSFSE) and the recently introduced 3D T2w turbo/fast spin echo sequences (T2w 3D-T/FSE) is reviewed. RARE imaging may be preferred in severely ill patients with limitations in cooperation, SSFSE is particularly effective in differentiating benign and malignant stenosis, and 3D-FSE offers additional advantages in the detection of small biliary concrements.

  20. Comparison of retreatment ability of full-sequence reciprocating instrumentation and 360° rotary instrumentation.

    Science.gov (United States)

    Capar, Ismail Davut; Gok, Tuba; Orhan, Ezgi

    2015-12-01

    The purpose of the present study was to investigate the amount of root canal filling material remaining after root canal filling removal with 360° rotary instrumentation or reciprocating motion using the same file sequence. The root canals of 36 mandibular premolars were shaped with ProTaper Universal instruments up to size F2 and filled with a corresponding single gutta-percha cone and sealer. The teeth were assigned to two retreatment groups (n = 18): group 1, 360° rotational motion, and group 2, reciprocating motion of ATR Tecnika motors (1310° clockwise and 578° counterclockwise). The retreatment procedure was performed with ProTaper Universal retreatment files with a sequence of D1-3 and ProTaper Universal F3 instruments. The total time required to remove the filling material was recorded. Remaining filling material was examined under a stereomicroscope at ×8 magnification. The data were analysed statistically using the Mann-Whitney U test, and testing was performed at the 95% confidence level (p [...] 0.05) in terms of remaining filling material. The total time required for retreatment was shorter in the 360° rotational motion group than in the reciprocating motion group (p [...] instruments with reciprocating motion of ATR motor and conventional rotary motion have similar efficacy in root canal filling removal.

  1. Sequence comparison of six human microRNAs genes between tuberculosis patients and healthy individuals.

    Science.gov (United States)

    Amila, A; Acosta, A; Sarmiento, M E; Suraiya, Siti; Zafarina, Z; Panneerchelvam, S; Norazmi, M N

    2015-12-01

    MicroRNAs (miRNAs) play an important role in disease development. Therefore, human miRNAs may be able to inhibit the survival of Mycobacterium tuberculosis (Mtb) in the human host by targeting critical genes of the pathogen. Mutations within miRNAs can alter their target selection, thereby preventing them from inhibiting Mtb genes, thus increasing host susceptibility to the disease. This study was undertaken to investigate the genetic association of pulmonary tuberculosis (TB) with six human miRNA genes, namely, hsa-miR-370, hsa-miR-520d, hsa-miR-154, hsa-miR-497, hsa-miR-758, and hsa-miR-593, which have been predicted to interact with Mtb genes. The objective of the study was to determine the possible sequence variation of selected miRNA genes that are potentially associated with the inhibition of critical Mtb genes in TB patients. The study did not show differences in the sequences compared with healthy individuals without antecedents of TB. This result could have been influenced by the sample size and the selection of miRNA genes, which need to be addressed in future studies. Copyright © 2015 Asian African Society for Mycobacteriology. Published by Elsevier Ltd. All rights reserved.

  2. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-01-01

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
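The coverage figure in the abstract above can be sanity-checked with the usual back-of-the-envelope relation: average coverage equals total sequenced bases divided by the length of the genome being covered. The sketch below applies that relation to the two numbers quoted in the abstract (59.6 Gb and 24.7X). The assumption that the authors used exactly this definition, and the helper names, are introduced here for illustration only.

```python
def average_coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Average depth of coverage = sequenced bases / genome length."""
    return total_bases_gb / genome_size_gb


def implied_genome_size_gb(total_bases_gb: float, coverage: float) -> float:
    """Genome length implied by a reported yield and average coverage."""
    return total_bases_gb / coverage


# Figures quoted in the abstract: 59.6 Gb of sequence at 24.7X average coverage.
size = implied_genome_size_gb(59.6, 24.7)
print(f"implied covered genome length: {size:.2f} Gb")
print(f"round trip coverage: {average_coverage(59.6, size):.1f}X")
```

The implied length of roughly 2.4 Gb is the denominator consistent with the reported numbers under this simple definition; the actual calculation may have used mapped rather than raw bases.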

  4. Comparison of Direct Sequencing, Real-Time PCR-High Resolution Melt (PCR-HRM) and PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) Analysis for Genotyping of Common Thiopurine Intolerant Variant Alleles NUDT15 c.415C>T and TPMT c.719A>G (TPMT*3C).

    Science.gov (United States)

    Fong, Wai-Ying; Ho, Chi-Chun; Poon, Wing-Tat

    2017-05-12

    Thiopurine intolerance and treatment-related toxicity, such as fatal myelosuppression, is related to non-function genetic variants encoding thiopurine S-methyltransferase (TPMT) and Nudix hydrolase 15 (NUDT15). Genetic testing of the common variants NUDT15:NM_018283.2:c.415C>T (Arg139Cys, dbSNP rs116855232 T allele) and TPMT: NM_000367.4:c.719A>G (TPMT*3C, dbSNP rs1142345 G allele) in East Asians including Chinese can potentially prevent treatment-related complications. Two complementary genotyping approaches, real-time PCR-high resolution melt (PCR-HRM) and PCR-restriction fragment length polymorphism (PCR-RFLP) analysis, were evaluated using conventional PCR and Sanger sequencing genotyping as the gold standard. Sixty patient samples were tested, revealing seven patients (11.7%) heterozygous for NUDT15 c.415C>T, one patient homozygous for the variant and one patient heterozygous for the TPMT*3C non-function allele. No patient was found to harbor both variants. In total, nine out of 60 (15%) patients tested had genotypic evidence of thiopurine intolerance, which may require dosage adjustment or alternative medication should they be started on azathioprine, mercaptopurine or thioguanine. The two newly developed assays were more efficient and showed complete concordance (60/60, 100%) compared to the Sanger sequencing results. Accurate and cost-effective genotyping assays by real-time PCR-HRM and PCR-RFLP for NUDT15 c.415C>T and TPMT*3C were successfully developed. Further studies may establish their roles in genotype-informed clinical decision-making in the prevention of morbidity and mortality due to thiopurine intolerance.

  5. Identification of new polymorphic regions and differentiation of cultivated olives (Olea europaea L.) through plastome sequence comparison

    Science.gov (United States)

    2010-01-01

    Background The cultivated olive (Olea europaea L.) is the most agriculturally important species of the Oleaceae family. Although many studies have been performed on plastid polymorphisms to evaluate taxonomy, phylogeny and phylogeography of Olea subspecies, only a few polymorphic regions discriminating among the agronomically and economically important olive cultivars have been identified. The objective of this study was to sequence the entire plastome of olive and analyze many potential polymorphic regions to develop new inter-cultivar genetic markers. Results The complete plastid genome of the olive cultivar Frantoio was determined by direct sequence analysis using universal and novel PCR primers designed to amplify all overlapping regions. The chloroplast genome of the olive has an organisation and gene order that is conserved among numerous Angiosperm species and does not contain any of the inversions, gene duplications, insertions, inverted repeat expansions and gene/intron losses that have been found in the chloroplast genomes of the genera Jasminum and Menodora, from the same family as Olea. The annotated sequence was used to evaluate the content of coding genes, the extent and distribution of repeated and long dispersed sequences, and the nucleotide composition pattern. These analyses provided essential information for structural, functional and comparative genomic studies in olive plastids. Furthermore, the alignment of the olive plastome sequence to those of other varieties and species identified 30 new organellar polymorphisms within the cultivated olive. Conclusions In addition to identifying mutations that may play a functional role in modifying the metabolism and adaptation of olive cultivars, the new chloroplast markers represent a valuable tool to assess the level of olive intercultivar plastome variation for use in population genetic analysis, phylogenesis, cultivar characterisation and DNA food tracking. PMID:20868482

  6. Reconsidering the generation time hypothesis based on nuclear ribosomal ITS sequence comparisons in annual and perennial angiosperms

    Directory of Open Access Journals (Sweden)

    Fiz-Palacios Omar

    2008-12-01

    Full Text Available Abstract Background Differences in plant annual/perennial habit are hypothesized to cause a generation time effect on divergence rates. Previous studies that compared rates of divergence for internal transcribed spacer (ITS1 and ITS2 sequences of nuclear ribosomal DNA (nrDNA in angiosperms have reached contradictory conclusions about whether differences in generation times (or other life history features are associated with divergence rate heterogeneity. We compared annual/perennial ITS divergence rates using published sequence data, employing sampling criteria to control for possible artifacts that might obscure any actual rate variation caused by annual/perennial differences. Results Relative rate tests employing ITS sequences from 16 phylogenetically-independent annual/perennial species pairs rejected rate homogeneity in only a few comparisons, with annuals more frequently exhibiting faster substitution rates. Treating branch length differences categorically (annual faster or perennial faster regardless of magnitude with a sign test often indicated an excess of annuals with faster substitution rates. Annuals showed an approximately 1.6-fold rate acceleration in nucleotide substitution models for ITS. Relative rates of three nuclear loci and two chloroplast regions for the annual Arabidopsis thaliana compared with two closely related Arabidopsis perennials indicated that divergence was faster for the annual. In contrast, A. thaliana ITS divergence rates were sometimes faster and sometimes slower than the perennial. In simulations, divergence rate differences of at least 3.5-fold were required to reject rate constancy in > 80 % of replicates using a nucleotide substitution model observed for the combination of ITS1 and ITS2. Simulations also showed that categorical treatment of branch length differences detected rate heterogeneity > 80% of the time with a 1.5-fold or greater rate difference. 
Conclusion Although rate homogeneity was not rejected
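
    The categorical comparison described in the abstract above (annual faster or perennial faster, regardless of magnitude) can be illustrated with an exact two-sided sign test. This is a minimal sketch, not the authors' procedure: the function name and the example count of 12 annual-faster pairs are hypothetical, and only the total of 16 species pairs comes from the abstract.

```python
from math import comb


def sign_test_p(n_annual_faster: int, n_pairs: int) -> float:
    """Exact two-sided sign test p-value.

    Null hypothesis: in each pair the annual and the perennial are
    equally likely to show the longer branch (probability 0.5 each).
    """
    if not 0 <= n_annual_faster <= n_pairs:
        raise ValueError("count must lie between 0 and n_pairs")
    # Use the larger of the two category counts as the test statistic.
    k = max(n_annual_faster, n_pairs - n_annual_faster)
    # Upper-tail probability of seeing k or more under Binomial(n, 0.5).
    tail = sum(comb(n_pairs, i) for i in range(k, n_pairs + 1)) / 2 ** n_pairs
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical count, for illustration only: 12 of 16 pairs with a faster annual.
p_value = sign_test_p(12, 16)
```

    Because the test ignores the size of each branch length difference, it can pick up a consistent direction of difference even when individual relative rate tests do not reject homogeneity.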

  7. MR diffusion weighted imaging of gastric cancer: b-value determination and comparison with routine sequences

    International Nuclear Information System (INIS)

    Zhao Xiaopeng; Tang Lei; Sun Yingshi; Li Jie; Cao Kun

    2007-01-01

    Objective: To choose the optimal b-values for the DWI of gastric cancer (GC), and investigate the value of DWI in the diagnosis of GCs. Methods: MRI examinations (T1WI, T2WI, and DWI) were performed on 31 patients with gastric cancer. Three diffusion-weighted sequences were designed with different b values, including 300 s/mm² (low), 600 s/mm² (intermediate), and 1000 s/mm² (high). Free water grade was used to evaluate the suppression of content in gastric lumen. Background contrast grade was used to evaluate the discriminating ability of different sequences between GC and nearby tissues. The ADCs of GCs, nearby gastric wall region, and free water in gastric lumen were measured. SNR_Ca, CNR_Ca-GW and SIR_CaGW of high b-value DWI and routine MRI sequences were evaluated and compared. Results: The signal intensity of free water in gastric lumen decreased as b-value increased, and the SIR were 8.11±0.77 (b=300 s/mm²), 2.70±0.35 (b=600 s/mm²), and 1.13±0.22 (b=1000 s/mm²) (F55.368, P 2 =16.692, P 2 =9.923, P -3 mm 2 /s, (1.43±0.41) × 10⁻³ mm²/s, and (1.18±0.25) × 10⁻³ mm²/s; (F=12.066, P 1 WI (CNR: 12.46 vs. 2.35, Z=-3.746, P 2 WI (CNR: 12.46 vs. 3.92, Z=-3.518, P 2 ) is reasonable for DWI of GCs, which can reflect diffusion condition of water molecules more accurately, suppress signal of content in gastric lumen, and possess higher contrast. DWI can be a supplementary method of routine MRI examination for better demonstration of gastric cancers. (authors)
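
    For readers unfamiliar with how an ADC relates to the b-value, the sketch below uses the standard mono-exponential diffusion model, S(b) = S(0)·exp(−b·ADC), and solves it for ADC from two signal measurements. It is an illustration only: the abstract does not state how the ADCs were fitted, and the function name and signal values are hypothetical.

```python
import math


def adc_two_point(s_ref: float, s_b: float, b_ref: float, b: float) -> float:
    """Estimate ADC (mm^2/s) from signals at two b-values (s/mm^2).

    Assumes mono-exponential decay: S(b) = S(b_ref) * exp(-(b - b_ref) * ADC).
    """
    if s_ref <= 0 or s_b <= 0:
        raise ValueError("signal intensities must be positive")
    if b <= b_ref:
        raise ValueError("b must be greater than b_ref")
    return math.log(s_ref / s_b) / (b - b_ref)


# Hypothetical signals: a reference at b = 0 and a measurement at b = 1000 s/mm^2.
adc = adc_two_point(s_ref=1000.0, s_b=310.0, b_ref=0.0, b=1000.0)
```

    Under this model, tissue or fluid with a high ADC (such as free water) loses signal quickly as b increases, which is in line with the abstract's observation that lumen content is suppressed at higher b-values.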

  8. Complete genome sequence of Shigella flexneri 5b and comparison with Shigella flexneri 2a

    Directory of Open Access Journals (Sweden)

    Xue Ying

    2006-07-01

    Full Text Available Abstract Background Shigella bacteria cause dysentery, which remains a significant threat to public health. Shigella flexneri is the most common species in both developing and developed countries. Five Shigella genomes have been sequenced, revealing dynamic and diverse features. To investigate the intra-species diversity of S. flexneri genomes further, we have sequenced the complete genome of S. flexneri 5b strain 8401 (abbreviated Sf8401) and compared it with S. flexneri 2a (Sf301). Results The Sf8401 chromosome is 4.5 Mb in size, a little smaller than that of Sf301, mainly because the former lacks the SHI-1 pathogenicity island (PAI). Compared with Sf301, there are 6 inversions and one translocation in Sf8401, which are probably mediated by insertion sequences (IS). There are clear differences in the known PAIs between these two genomes. The bacteriophage SfV segment remaining in SHI-O of Sf8401 is clearly larger than the remnants of bacteriophage SfII in Sf301. SHI-1 is absent from Sf8401 but a specific related protein is found next to the pheV locus. SHI-2 is involved in one intra-replichore inversion near the origin of replication, which may change the expression of iut/iuc genes. Moreover, genes related to the glycine-betaine biosynthesis pathway are present only in Sf8401 among the known Shigella genomes. Conclusion Our data show that the two S. flexneri genomes are very similar, which suggests a high level of structural and functional conservation between the two serotypes. The differences reflect different selection pressures during evolution. The ancestor of S. flexneri probably acquired SHI-1 and SHI-2 before SHI-O was integrated and the serotypes diverged. SHI-1 was subsequently deleted from the S. flexneri 5b genome by recombination, but stabilized in the S. flexneri 2a genome. These events may have contributed to the differences in pathogenicity and epidemicity between the two serotypes of S. flexneri.

  9. mESAdb: microRNA expression and sequence analysis database.

    Science.gov (United States)

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.
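
    The idea of selecting a subset of microRNAs by motif, mentioned in module (i) above, can be sketched as a simple substring filter over mature sequences. This is not mESAdb's code or API: the function name, the record names and the short example sequences are placeholders chosen for illustration.

```python
def select_by_motif(mature_seqs: dict[str, str], motif: str) -> list[str]:
    """Return the names of sequences that contain the motif, sorted by name.

    Sequences and motif are compared as upper-case RNA, so a motif typed
    with T (DNA alphabet) still matches sequences written with U.
    """
    def to_rna(seq: str) -> str:
        return seq.upper().replace("T", "U")

    wanted = to_rna(motif)
    return sorted(name for name, seq in mature_seqs.items()
                  if wanted in to_rna(seq))


# Placeholder records for illustration only.
toy_records = {
    "toy-miR-1": "UGAGGUAGUAGGUUGUAUAGUU",
    "toy-miR-2": "UAGCUUAUCAGACUGAUGUUGA",
    "toy-miR-3": "CGAGGUAGUUUCAUGUUGUUGG",
}
subset = select_by_motif(toy_records, "GAGGUAG")
```

    A subset selected this way could then be used to pull the matching rows out of an expression table for further analysis.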

  10. The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species.

    Directory of Open Access Journals (Sweden)

    Inkyu Park

    Full Text Available Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, the position of Aconitum species among the Ranunculaceae was determined using cp genomes of other families in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species.

  11. The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species.

    Science.gov (United States)

    Park, Inkyu; Kim, Wook-Jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin; Moon, Byeong Cheol

    2017-01-01

    Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, the position of Aconitum species among the Ranunculaceae was determined using cp genomes of other families in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species.
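
    A summary statistic such as the 38% GC content quoted above is the fraction of G and C bases among the unambiguous bases of a sequence. The sketch below shows that arithmetic on a short made-up string; the function name and the example sequence are illustrative and are not taken from the Aconitum genomes.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the A, C, G and T bases of a DNA sequence.

    Ambiguity codes such as N are ignored rather than counted as non-GC.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / (gc + at)


# Made-up fragment for illustration only.
example_fraction = gc_content("ATGCNNATAT")
```

    Applied to a complete chloroplast genome read from a FASTA file, the same function would give a genome-wide value comparable to the percentage in the abstract.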

  12. Enriching Genomic Resources and Transcriptional Profile Analysis of Miscanthus sinensis under Drought Stress Based on RNA Sequencing

    Directory of Open Access Journals (Sweden)

    Gang Nie

    2017-01-01

    Full Text Available Miscanthus × giganteus is widely cultivated as a potential biofuel feedstock around the world; however, its narrow genetic basis and sterility have become a limitation for its utilization. As a progenitor of M. × giganteus, M. sinensis is widely distributed around East Asia and shows good abiotic stress tolerance. To enrich the M. sinensis genomic databases and resources, we sequenced and annotated the transcriptome of M. sinensis by using an Illumina HiSeq 2000 platform. Approximately 316 million high-quality trimmed reads were generated from 349 million raw reads, and a total of 114,747 unigenes were obtained after de novo assembly. Furthermore, 95,897 (83.57%) unigenes were annotated to at least one database including NR, Swiss-Prot, KEGG, COG, GO, and NT, supporting that the sequences obtained were annotated properly. Differentially expressed gene analysis indicates that 15 days of drought stress could be a critical period for the M. sinensis response to drought stress. The high-throughput transcriptome sequencing of M. sinensis under drought stress has greatly enriched the currently available genomic resources. The comparison of DEGs under different periods of drought stress identified a wealth of candidate genes involved in drought tolerance regulatory networks, which will facilitate further genetic improvement and molecular studies of M. sinensis.

  13. Changes in latent fingerprint examiners' markup between analysis and comparison.

    Science.gov (United States)

    Ulery, Bradford T; Hicklin, R Austin; Roberts, Maria Antonia; Buscaglia, JoAnn

    2015-02-01

    After the initial analysis of a latent print, an examiner will sometimes revise the assessment during comparison with an exemplar. Changes between analysis and comparison may indicate that the initial analysis of the latent was inadequate, or that confirmation bias may have affected the comparison. 170 volunteer latent print examiners, each randomly assigned 22 pairs of prints from a pool of 320 total pairs, provided detailed markup documenting their interpretations of the prints and the bases for their comparison conclusions. We describe changes in value assessments and markup of features and clarity. When examiners individualized, they almost always added or deleted minutiae (90.3% of individualizations); every examiner revi