WorldWideScience

Sample records for java sequence alignment

  1. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    Science.gov (United States)

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.
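    Illustrative sketch (not JSAV code): the 'dotifying' option mentioned above can be pictured as follows, assuming it means replacing every residue that is identical to the first (reference) sequence with a dot. The function name `dotify` and the handling of gap characters are assumptions made for this example, written in Python rather than JavaScript.

```python
def dotify(alignment):
    """Replace residues identical to the first sequence with '.'.

    `alignment` is a list of equal-length aligned strings; the first
    entry is treated as the reference and is returned unchanged.
    Gap characters ('-') are kept as they are (an assumption).
    """
    if not alignment:
        return []
    reference = alignment[0]
    result = [reference]
    for seq in alignment[1:]:
        row = "".join(
            "." if res == ref and res != "-" else res
            for res, ref in zip(seq, reference)
        )
        result.append(row)
    return result


if __name__ == "__main__":
    for row in dotify(["ACDE-F", "ACDQ-F", "GCDE-W"]):
        print(row)
```

    Only positions that differ from the reference remain spelled out, which makes substitutions easier to spot in a dense alignment.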

  2. Sequence alignment visualization in HTML5 without Java.

    Science.gov (United States)

    Gille, Christoph; Birgit, Weyand; Gille, Andreas

    2014-01-01

    Java has been extensively used for the visualization of biological data on the web. However, the Java runtime environment is an additional layer of software with its own set of technical problems and security risks. HTML in its new version 5 provides features that for some tasks may render Java unnecessary. Alignment-To-HTML is the first HTML-based interactive visualization for annotated multiple sequence alignments. The server side script interpreter can perform all tasks like (i) sequence retrieval, (ii) alignment computation, (iii) rendering, (iv) identification of homologous structural models and (v) communication with BioDAS-servers. The rendered alignment can be included in web pages and is displayed in all browsers on all platforms including touch screen tablets. The functionality of the user interface is similar to legacy Java applets and includes color schemes, highlighting of conserved and variable alignment positions, row reordering by drag and drop, interlinked 3D visualization and sequence groups. Novel features are (i) support for multiple overlapping residue annotations, such as chemical modifications, single nucleotide polymorphisms and mutations, (ii) mechanisms to quickly hide residue annotations, (iii) export to MS-Word and (iv) sequence icons. Alignment-To-HTML, the first interactive alignment visualization that runs in web browsers without additional software, confirms that to some extent HTML5 is already sufficient to display complex biological data. The low speed at which programs are executed in browsers is still the main obstacle. Nevertheless, we envision an increased use of HTML and JavaScript for interactive biological software. Under GPL at: http://www.bioinformatics.org/strap/toHTML/.

  3. MSAViewer: interactive JavaScript visualization of multiple sequence alignments.

    Science.gov (United States)

    Yachdav, Guy; Wilzbach, Sebastian; Rauscher, Benedikt; Sheridan, Robert; Sillitoe, Ian; Procter, James; Lewis, Suzanna E; Rost, Burkhard; Goldberg, Tatyana

    2016-11-15

    The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is 'web ready': written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components. The MSAViewer is released as open source software under the Boost Software License 1.0. Documentation, source code and the viewer are available at http://msa.biojs.net/. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: msa@bio.sh. © The Author 2016. Published by Oxford University Press.

  4. JVM: Java Visual Mapping tool for next generation sequencing read.

    Science.gov (United States)

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed a program, JVM (Java Visual Mapping), for mapping next generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to deal with millions of short reads generated using the Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read sets ranging from 1 MB to 10 GB.
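    Illustrative sketch (not the JVM implementation): the abstract names a seed index strategy and octal encoding but gives no details. The following Python example shows one plausible reading, in which each base is packed into a 3-bit (octal) digit and the encoded seeds of the reference are stored in an index. The code table `CODES`, the seed length `k` and all function names are assumptions.

```python
from collections import defaultdict

# Assumed 3-bit codes, one octal digit per base.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def encode(seq):
    """Pack a sequence into an integer, three bits per base."""
    value = 0
    for base in seq:
        value = (value << 3) | CODES[base]
    return value


def build_seed_index(reference, k):
    """Map every encoded k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[encode(reference[i:i + k])].append(i)
    return index


def candidate_positions(read, index, k):
    """Return reference positions whose seed equals the read's first k bases."""
    return index.get(encode(read[:k]), [])


if __name__ == "__main__":
    idx = build_seed_index("ACGTACGTTN", 4)
    print(oct(encode("ACGT")), candidate_positions("ACGTT", idx, 4))
```

    Each candidate position would still have to be verified by extending the seed into a full alignment, a step this sketch leaves out.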

  5. JavaScript DNA translator: DNA-aligned protein translations.

    Science.gov (United States)

    Perry, William L

    2002-12-01

    There are many instances in molecular biology when it is necessary to identify ORFs in a DNA sequence. While programs exist for displaying protein translations in multiple ORFs in alignment with a DNA sequence, they are often expensive, exist as add-ons to software that must be purchased, or are only compatible with a particular operating system. JavaScript DNA Translator is a shareware application written in JavaScript, a scripting language interpreted by the Netscape Communicator and Internet Explorer Web browsers, which makes it compatible with several different operating systems. While the program uses a familiar Web page interface, it requires no connection to the Internet since calculations are performed on the user's own computer. The program analyzes one or multiple DNA sequences and generates translations in up to six reading frames aligned to a DNA sequence, in addition to displaying translations as separate sequences in FASTA format. ORFs within a reading frame can also be displayed as separate sequences. Flexible formatting options are provided, including the ability to hide ORFs below a minimum size specified by the user. The program is available free of charge at the BioTechniques Software Library (www.Biotechniques.com).
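    Illustrative sketch (not the JavaScript DNA Translator source): translation in up to six reading frames, as described above, can be outlined in Python as follows. The example assumes the standard genetic code, marks stop codons with '*' and unknown codons with 'X'; the frame labels '+1' to '-3' and the function names are choices made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frames(dna):
    """Translate the three forward and three reverse-complement frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

    An ORF finder could then split each translated frame at '*' and drop fragments below a user-defined minimum length.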

  6. Alignment-Annotator web server: rendering and annotating sequence alignments.

    Science.gov (United States)

    Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

    2014-07-01

    Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed at server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, the UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, neither plugins nor Java are required and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences.

    Science.gov (United States)

    Fourment, Mathieu; Gibbs, Mark J

    2008-02-05

    Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.

  8. The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences

    Directory of Open Access Journals (Sweden)

    Gibbs Mark J

    2008-02-01

    Full Text Available. Background: Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results: The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion: VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.

  9. AlignMe—a membrane protein sequence alignment web server

    Science.gov (United States)

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
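    Illustrative sketch (not AlignMe code): the HP mode described above converts each multiple sequence alignment into a hydrophobicity profile averaged over the homologs. The Python example below assumes the Kyte-Doolittle scale and assumes that gaps are simply skipped when averaging a column; the server may use other scales, smoothing windows or gap handling.

```python
# Kyte-Doolittle hydropathy values (the choice of scale is an assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def averaged_profile(alignment):
    """Average hydropathy per alignment column, ignoring gaps.

    Columns that contain no scored residue yield None.
    """
    profile = []
    for column in zip(*alignment):
        values = [KD[res] for res in column if res in KD]
        profile.append(sum(values) / len(values) if values else None)
    return profile


if __name__ == "__main__":
    print(averaged_profile(["AI-", "AL-", "GV-"]))
```

    Two such profiles, one per input alignment, would then be aligned against each other in place of the raw sequences.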

  10. Pairwise Sequence Alignment Library

    Energy Technology Data Exchange (ETDEWEB)

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.
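    Illustrative sketch (not the parasail API): for orientation, a plain scalar Smith-Waterman local alignment score is shown below in Python. It is not vectorized and uses a linear gap penalty; the scoring parameters are assumptions. The row-by-row recurrence makes the data dependencies visible, since each cell needs its upper, left and upper-left neighbours, which is what makes SIMD implementations nontrivial.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```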

  11. Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment

    Directory of Open Access Journals (Sweden)

    Daniels Noah M

    2012-10-01

    Full Text Available. Background: The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly utilize protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. Results: We present Formatt, a multiple structure alignment based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. Conclusions: Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.

  12. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with two well-known alignment algorithms, FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 76% improvement in alignment score compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  13. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with two well-known alignment algorithms, FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 76% improvement in alignment score compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  14. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing this large amount of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with two well-known sequence alignment algorithms, 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 51% improvement in alignment score compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  15. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    ... fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms, the 'GAP' (which is heuristic) and the 'Needleman ...

  16. Ancestral sequence alignment under optimal conditions

    Directory of Open Access Journals (Sweden)

    Brown Daniel G

    2005-11-01

    Full Text Available. Background: Multiple genome alignment is an important problem in bioinformatics. An important subproblem used by many multiple alignment approaches is that of aligning two multiple alignments. Many popular alignment algorithms for DNA use the sum-of-pairs heuristic, where the score of a multiple alignment is the sum of its induced pairwise alignment scores. However, the biological meaning of the sum-of-pairs heuristic is not obvious. Additionally, many algorithms based on the sum-of-pairs heuristic are complicated and slow, compared to pairwise alignment algorithms. An alternative approach to aligning alignments is to first infer ancestral sequences for each alignment, and then align the two ancestral sequences. In addition to being fast, this method has a clear biological basis that takes into account the evolution implied by an underlying phylogenetic tree. In this study we explore the accuracy of aligning alignments by ancestral sequence alignment. We examine the use of both maximum likelihood and parsimony to infer ancestral sequences. Additionally, we investigate the effect on accuracy of allowing ambiguity in our ancestral sequences. Results: We use synthetic sequence data that we generate by simulating evolution on a phylogenetic tree. We use two different types of phylogenetic trees: trees with a period of rapid growth followed by a period of slow growth, and trees with a period of slow growth followed by a period of rapid growth. We examine the alignment accuracy of four ancestral sequence reconstruction and alignment methods: parsimony, maximum likelihood, ambiguous parsimony, and ambiguous maximum likelihood. Additionally, we compare against the alignment accuracy of two sum-of-pairs algorithms: ClustalW and the heuristic of Ma, Zhang, and Wang. Conclusion: We find that allowing ambiguity in ancestral sequences does not lead to better multiple alignments. Regardless of whether we use parsimony or maximum likelihood, the ...
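    Illustrative sketch (not code from the paper): the sum-of-pairs score described in this abstract, where the score of a multiple alignment is the sum of its induced pairwise alignment scores, can be written in Python as follows. The match, mismatch and gap values are assumptions, and columns in which both sequences have a gap are skipped because they vanish from the induced pairwise alignment.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score the pairwise alignment induced by two rows of an alignment."""
    score = 0
    for cx, cy in zip(x, y):
        if cx == "-" and cy == "-":
            continue  # column disappears in the induced pairwise alignment
        if cx == "-" or cy == "-":
            score += gap
        elif cx == cy:
            score += match
        else:
            score += mismatch
    return score


def sum_of_pairs(alignment):
    """Sum the induced pairwise scores over all pairs of rows."""
    return sum(pair_score(x, y) for x, y in combinations(alignment, 2))


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "ACGT", "A--T"]))
```

    The number of pairs grows quadratically with the number of sequences, which is one reason sum-of-pairs methods tend to be slower than aligning two inferred ancestral sequences.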

  17. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
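    Illustrative sketch (not the disclosed method): the split described above, in which the shorter database sequences go to the CPU and the longer ones to the GPU, can be pictured in Python as follows. Here the splitting ratio is assumed to be the fraction of sequences (by count) assigned to the CPU; the disclosure does not say how the ratio is defined, and the function name is hypothetical.

```python
def split_database(sequences, cpu_fraction):
    """Split sequences into (cpu_part, gpu_part) by length.

    `cpu_fraction` is the assumed share of sequences, by count, that
    goes to the CPU; the CPU receives the shortest sequences.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    ordered = sorted(sequences, key=len)
    cut = int(len(ordered) * cpu_fraction)
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    cpu, gpu = split_database(["ACGTACGT", "AC", "ACGTA", "A", "ACG"], 0.4)
    print("CPU:", cpu)
    print("GPU:", gpu)
```

    Each portion would then be scored against the query on its own device, giving one alignment score from the CPU portion and one from the GPU portion.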

  18. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  19. A rank-based sequence aligner with applications in phylogenetic analysis.

    Directory of Open Access Journals (Sweden)

    Liviu P Dinu

    Full Text Available. Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing k-mer positions in a hash table for each read. Another improvement, which produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state of the art alignment tools in several experiments. A set of experiments is conducted to determine the precision and the recall of the proposed aligner, in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered as a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, being supported on UNIX and MS Windows.
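    Illustrative sketch (not the LRD aligner): the two speed-ups mentioned in this abstract, storing the k-mer positions of each read in a hash table and restricting attention to promising reference positions, might look like this in Python. The voting scheme, the `min_votes` threshold and the function names are assumptions, and Local Rank Distance itself is not computed here.

```python
from collections import Counter, defaultdict


def read_kmer_table(read, k):
    """Hash table mapping each k-mer of the read to its offsets in the read."""
    table = defaultdict(list)
    for offset in range(len(read) - k + 1):
        table[read[offset:offset + k]].append(offset)
    return table


def candidate_starts(reference, read, k, min_votes):
    """Reference start positions supported by at least `min_votes` k-mer hits."""
    table = read_kmer_table(read, k)
    votes = Counter()
    for pos in range(len(reference) - k + 1):
        for offset in table.get(reference[pos:pos + k], ()):
            start = pos - offset  # implied start of the read in the reference
            if start >= 0:
                votes[start] += 1
    return sorted(s for s, v in votes.items() if v >= min_votes)


if __name__ == "__main__":
    print(candidate_starts("TTTTACGTGCATTTT", "ACGTGCA", 3, 4))
```

    Only the surviving start positions would then be passed to the more expensive distance computation.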

  20. Interactive software tool to comprehend the calculation of optimal sequence alignments with dynamic programming.

    Science.gov (United States)

    Ibarra, Ignacio L; Melo, Francisco

    2010-07-01

    Dynamic programming (DP) is a general optimization strategy that is successfully used across various disciplines of science. In bioinformatics, it is widely applied in calculating the optimal alignment between pairs of protein or DNA sequences. These alignments form the basis of new, verifiable biological hypotheses. Despite its importance, there are no interactive tools available for training and education on understanding the DP algorithm. Here, we introduce an interactive computer application with a graphical interface, for the purpose of educating students about DP. The program displays the DP scoring matrix and the resulting optimal alignment(s), while allowing the user to modify key parameters such as the values in the similarity matrix, the sequence alignment algorithm version and the gap opening/extension penalties. We hope that this software will be useful to teachers and students of bioinformatics courses, as well as researchers who implement the DP algorithm for diverse applications. The software is freely available at: http://melolab.org/sat. The software is written in the Java computer language, thus it runs on all major platforms and operating systems including Windows, Mac OS X and LINUX. All inquiries or comments about this software should be directed to Francisco Melo at fmelo@bio.puc.cl.
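    Illustrative sketch (not the tool's source code): the DP scoring matrix and traceback that the application displays can be reproduced in a few lines of Python. This is the global (Needleman-Wunsch) variant with a single linear gap penalty, a simplification of the gap opening/extension penalties mentioned above; the scoring values are assumptions, and only one optimal alignment is returned.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the DP scoring matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(row_a)
    print(row_b)
```

    Changing `match`, `mismatch` or `gap` and rerunning shows how the optimal alignment shifts with the parameters.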

  1. Analysis of computational complexity for HT-based fingerprint alignment algorithms on java card environment

    CSIR Research Space (South Africa)

    Mlambo, CS

    2015-01-01

    Full Text Available. In this paper, implementations of three Hough Transform based fingerprint alignment algorithms are analyzed with respect to time complexity on Java Card environment. The three algorithms are: Local Match Based Approach (LMBA), Discretized Rotation Based...

  2. Progressive multiple sequence alignments from triplets

    Directory of Open Access Journals (Sweden)

    Stadler Peter F

    2007-07-01

    Full Text Available. Background: The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research: Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by step-wisely replacing triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion: The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acids sequences to other progressive alignment tools. In the latter case one easily can include scoring terms that consider secondary structure features. Overall, the quality of resulting alignments in general exceeds that of clustalw or other multiple alignments tools even though our software does not include heuristics for context dependent (mis)match scores.

  3. Visualization of protein sequence features using JavaScript and SVG with pViz.js.

    Science.gov (United States)

    Mukhyala, Kiran; Masselot, Alexandre

    2014-12-01

    pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures.

    Science.gov (United States)

    Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2012-02-15

    We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. The program allows, among other things, to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.

  5. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  6. Heuristics for multiobjective multiple sequence alignment.

    Science.gov (United States)

    Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

    2016-07-15

    Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameters setting, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may provide an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives to the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. 
Experimental results show
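
    The two objectives named in this abstract can be illustrated with a minimal Python sketch. The +1/-1 substitution scheme and the convention of counting every residue-against-gap pair as one indel are assumptions made for illustration; the abstract does not state the exact scoring used by the authors.

```python
from itertools import combinations


def substitution(a: str, b: str) -> int:
    # Toy scheme (assumption): +1 for a match, -1 for a mismatch.
    return 1 if a == b else -1


def objectives(alignment: list[str]) -> tuple[int, int]:
    """Return (sum-of-pairs substitution score, indel count) for an alignment.

    Every row must have the same length; '-' marks a gap.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have equal length")
    score = 0
    indels = 0
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            if a == "-" and b == "-":
                continue  # gap against gap contributes to neither objective
            if a == "-" or b == "-":
                indels += 1  # residue aligned against a gap
            else:
                score += substitution(a, b)
    return score, indels
```

    A multiobjective search would try to maximize the first value while minimizing the second, keeping every alignment that is not beaten on both.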

  7. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  8. Spatio-temporal alignment of pedobarographic image sequences.

    Science.gov (United States)

    Oliveira, Francisco P M; Sousa, Andreia; Santos, Rubim; Tavares, João Manuel R S

    2011-07-01

    This article presents a methodology to align plantar pressure image sequences simultaneously in time and space. The spatial position and orientation of a foot in a sequence are changed to match the foot represented in a second sequence. Simultaneously with the spatial alignment, the temporal scale of the first sequence is transformed with the aim of synchronizing the two input footsteps. Consequently, the spatial correspondence of the foot regions along the sequences as well as the temporal synchronizing is automatically attained, making the study easier and more straightforward. In terms of spatial alignment, the methodology can use one of four possible geometric transformation models: rigid, similarity, affine, or projective. In the temporal alignment, a polynomial transformation up to the 4th degree can be adopted in order to model linear and curved time behaviors. Suitable geometric and temporal transformations are found by minimizing the mean squared error (MSE) between the input sequences. The methodology was tested on a set of real image sequences acquired from a common pedobarographic device. When used in experimental cases generated by applying geometric and temporal control transformations, the methodology revealed high accuracy. In addition, the intra-subject alignment tests from real plantar pressure image sequences showed that the curved temporal models produced better MSE results (P alignment of pedobarographic image data, since previous methods can only be applied on static images.

  9. QUASAR--scoring and ranking of sequence-structure alignments.

    Science.gov (United States)

    Birzele, Fabian; Gewehr, Jan E; Zimmer, Ralf

    2005-12-15

    Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.

  10. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Full Text Available Abstract Background Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSR has been used as a molecular marker because it is easy to detect and is used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. It is also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDELs) mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphism (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions, as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. Findings To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after first determining the repeated units and the positions of the last SSR nucleotides. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. Conclusions The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic

  11. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Pedersen, Anders Gorm

    2003-01-01

    The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit...... proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA...
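
    The final step of the procedure described here, building the DNA alignment from the aligned peptides, can be sketched in Python as threading each codon onto its residue. This is a minimal illustration, not the RevTrans code: the function name `thread_codons` is hypothetical, and it assumes that translation and peptide alignment have already been done and that the DNA has no stop codon or frameshift.

```python
def thread_codons(dna: str, aligned_protein: str) -> str:
    """Map an ungapped coding DNA sequence onto its gapped protein row.

    Each residue consumes one codon; each protein gap '-' becomes '---'.
    """
    residues = sum(1 for c in aligned_protein if c != "-")
    if len(dna) != 3 * residues:
        raise ValueError("DNA length must be 3x the number of residues")
    out = []
    pos = 0
    for c in aligned_protein:
        if c == "-":
            out.append("---")  # a peptide gap spans a whole codon
        else:
            out.append(dna[pos:pos + 3])
            pos += 3
    return "".join(out)
```

    Applying the function to every row of the peptide alignment yields DNA rows of equal length whose gaps fall on codon boundaries.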

  12. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Full Text Available Abstract Background Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only

  13. Improving your target-template alignment with MODalign.

    KAUST Repository

    Barbato, Alessandro

    2012-02-04

    SUMMARY: MODalign is an interactive web-based tool aimed at helping protein structure modelers to inspect and manually modify the alignment between the sequences of a target protein and of its template(s). It interactively computes, displays and, upon modification of the target-template alignment, updates the multiple sequence alignments of the two protein families, their conservation score, secondary structure and solvent accessibility values, and local quality scores of the implied three-dimensional model(s). Although it has been designed to simplify the target-template alignment step in modeling, it is suitable for all cases where a sequence alignment needs to be inspected in the context of other biological information. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modalign. Website implemented in HTML and JavaScript with all major browsers supported. CONTACT: jan.kosinski@uniroma1.it.

  14. MANGO: a new approach to multiple sequence alignment.

    Science.gov (United States)

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2007-01-01

    Multiple sequence alignment is a classical and challenging task for biological sequence analysis. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state of the art multiple sequence alignment programs suffer from the 'once a gap, always a gap' phenomenon. Is there a radically new way to do multiple sequence alignment? This paper introduces a novel and orthogonal multiple sequence alignment method, using multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds are provably significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks showing that MANGO compares favorably, in both accuracy and speed, against state-of-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, Prob-ConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0 and Kalign 2.0.
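
    The difference between a spaced seed and a consecutive k-mer can be shown with a short Python sketch. It is an illustration of the general idea only, not MANGO's algorithm: the seed pattern `1101`, the function names and the toy sequences are all assumptions. A `1` in the pattern means the position must match; a `0` means it is ignored.

```python
from collections import defaultdict


def spaced_keys(seq: str, seed: str):
    """Yield (position, key), where key keeps only characters under '1'."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(seq) - len(seed) + 1):
        yield start, "".join(seq[start + i] for i in care)


def seed_hits(a: str, b: str, seed: str) -> list[tuple[int, int]]:
    """Return sorted (pos_in_a, pos_in_b) pairs whose seed keys agree."""
    index = defaultdict(list)
    for pos, key in spaced_keys(a, seed):
        index[key].append(pos)
    return sorted(
        (pa, pb)
        for pb, key in spaced_keys(b, seed)
        for pa in index.get(key, [])
    )
```

    With `a = "ACGTAC"` and `b = "ACCTAC"`, the spaced seed `1101` reports a hit at position 0 of both sequences even though they differ at the third base, while the consecutive seed `1111` over the same four-base window reports none.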

  15. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
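
    The forward scan and traceback mentioned in this abstract are the two phases of global alignment by dynamic programming. The Python sketch below shows the textbook quadratic-space form (Needleman-Wunsch with a linear gap penalty); it is not the space-efficient hardware architecture the paper presents, and the scoring values are assumptions.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Forward scan: fill the (n+1) x (m+1) score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback: walk from the bottom-right cell back to the origin.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

    Storing the full matrix is what makes the traceback simple here; it is also the memory cost that a space-efficient design has to avoid for long sequences.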

  16. Spreadsheet macros for coloring sequence alignments.

    Science.gov (United States)

    Haygood, M G

    1993-12-01

    This article describes a set of Microsoft Excel macros designed to color amino acid and nucleotide sequence alignments for review and preparation of visual aids. The colored alignments can then be modified to emphasize features of interest. Procedures for importing and coloring sequences are described. The macro file adds a new menu to the menu bar containing sequence-related commands to enable users unfamiliar with Excel to use the macros more readily. The macros were designed for use with Macintosh computers but will also run with the DOS version of Excel.

  17. Multiple sequence alignment accuracy and phylogenetic inference.

    Science.gov (United States)

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  18. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    Directory of Open Access Journals (Sweden)

    Claros M Gonzalo

    2010-06-01

    Full Text Available Abstract Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". 
Conclusions AlignMiner can be used
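
    An entropy-based column score of the kind mentioned above can be sketched in Python with per-column Shannon entropy. This is a generic illustration, not AlignMiner's actual Entropy method: the function names, the 1.0-bit threshold and the choice to treat a gap as an ordinary symbol are assumptions.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy in bits of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def divergent_columns(alignment: list[str], threshold: float = 1.0) -> list[int]:
    """Return 0-based indices of columns whose entropy reaches the threshold."""
    columns = zip(*alignment)  # transpose rows into columns
    return [
        i for i, col in enumerate(columns)
        if column_entropy("".join(col)) >= threshold
    ]
```

    A fully conserved column scores 0 bits and a column with four equally frequent symbols scores 2 bits, so runs of high-scoring columns mark candidate divergent regions.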

  19. Using structure to explore the sequence alignment space of remote homologs.

    Science.gov (United States)

    Kuziemko, Andrew; Honig, Barry; Petrey, Donald

    2011-10-01

    Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.

  20. Using structure to explore the sequence alignment space of remote homologs.

    Directory of Open Access Journals (Sweden)

    Andrew Kuziemko

    2011-10-01

    Full Text Available Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.

  1. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    LENUS (Irish Health Repository)

    Blackshields, Gordon

    2010-05-14

    Abstract Background The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N^2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
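
    The embedding idea can be illustrated in Python by describing each sequence through its distances to a small set of reference sequences, which needs N x t distance computations for t references instead of all N^2 pairs. This sketch is an assumption-laden illustration, not the method evaluated in the paper: the k-mer Jaccard distance stands in for a real sequence distance, and how references are chosen is left open.

```python
def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """1 - Jaccard similarity of the k-mer sets (a stand-in distance)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def embed(sequences: list[str], references: list[str]) -> list[list[float]]:
    """Return one vector per sequence: its distance to each reference."""
    return [[kmer_distance(s, r) for r in references] for s in sequences]
```

    The resulting vectors can then be clustered with any standard vector-based method to obtain a guide tree.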

  2. Java programming 24-hour trainer

    CERN Document Server

    Fain, Yakov

    2015-01-01

    Quick and painless Java programming with expert multimedia instruction Java Programming 24-Hour Trainer, 2nd Edition is your complete beginner's guide to the Java programming language, with easy-to-follow lessons and supplemental exercises that help you get up and running quickly. Step-by-step instruction walks you through the basics of object-oriented programming, syntax, interfaces, and more, before building upon your skills to develop games, web apps, networks, and automations. This second edition has been updated to align with Java SE 8 and Java EE 7, and includes new information on GUI b

  3. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
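
    The simpler class of measures referred to here, those based solely on k-mer frequencies, can be sketched in Python as a Euclidean distance between frequency vectors. This is a generic illustration and does not use CAFE or reproduce CVTree, $d_2^*$ or $d_2^S$; the function names and the default `k = 2` are assumptions.

```python
import math
from collections import Counter


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Return the relative frequency of every k-mer occurring in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_dissimilarity(a: str, b: str, k: int = 2) -> float:
    """Euclidean distance between the k-mer frequency vectors of a and b."""
    fa, fb = kmer_frequencies(a, k), kmer_frequencies(b, k)
    words = set(fa) | set(fb)
    return math.sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))
```

    Computing this value for every pair of genomes gives a dissimilarity matrix that can be written out for tree-building tools.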

  4. Image correlation method for DNA sequence alignment.

    Science.gov (United States)

    Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

    2012-01-01

    The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation and sequence alignment is handled as object recognition in a scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, one million base pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%), specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed process of a real experimental optical correlator. By doing this, we expect to fully exploit optical correlation light properties. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods on sequence alignment.

  5. Differential evolution-simulated annealing for multiple sequence alignment

    Science.gov (United States)

    Addawe, R. C.; Addawe, J. M.; Sueño, M. R. K.; Magadia, J. C.

    2017-10-01

    Multiple sequence alignments (MSA) are used in the analysis of molecular evolution and sequence structure relationships. In this paper, a hybrid algorithm, Differential Evolution - Simulated Annealing (DESA) is applied in optimizing multiple sequence alignments (MSAs) based on structural information, non-gaps percentage and totally conserved columns. DESA is a robust algorithm characterized by self-organization, mutation, crossover, and SA-like selection scheme of the strategy parameters. Here, the MSA problem is treated as a multi-objective optimization problem of the hybrid evolutionary algorithm, DESA. Thus, we name the algorithm as DESA-MSA. Simulated sequences and alignments were generated to evaluate the accuracy and efficiency of DESA-MSA using different indel sizes, sequence lengths, deletion rates and insertion rates. The proposed hybrid algorithm obtained acceptable solutions particularly for the MSA problem evaluated based on the three objectives.

  6. FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences.

    Science.gov (United States)

    Waldmann, Jost; Gerken, Jan; Hankeln, Wolfgang; Schweer, Timmy; Glöckner, Frank Oliver

    2014-06-14

    Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy is hampered. FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse quickly and accurately large amounts of sequence data. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.
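
    The kind of checks a FASTA validator performs can be sketched in a few lines of Python. This is not the FastaValidator library or its API (which is written in Java); the function name `validate_fasta`, the default nucleotide alphabet and the specific rules are assumptions chosen for illustration.

```python
def validate_fasta(text: str, alphabet: str = "ACGTN") -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    allowed = set(alphabet)
    seen_header = False
    has_sequence = True  # becomes False while a record awaits its sequence
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are tolerated in this sketch
        if line.startswith(">"):
            if seen_header and not has_sequence:
                problems.append(f"line {number}: previous record has no sequence")
            if len(line) == 1:
                problems.append(f"line {number}: empty header")
            seen_header, has_sequence = True, False
        elif not seen_header:
            problems.append(f"line {number}: sequence data before first header")
        else:
            bad = set(line.upper()) - allowed
            if bad:
                problems.append(f"line {number}: invalid characters {sorted(bad)}")
            has_sequence = True
    if seen_header and not has_sequence:
        problems.append("last record has no sequence")
    if not seen_header:
        problems.append("no header found")
    return problems
```

    Because the function scans the input line by line and keeps only two flags, the same logic could be applied to a file stream without loading all sequences into memory.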

  7. SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment

    Directory of Open Access Journals (Sweden)

    Scott Barlowe

    2017-06-01

    Full Text Available Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know if a given matrix family or any individual matrix within a family is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed with the JavaScript language which provides interactive visualizations for revealing both high-level and low-level alignment

  8. Enhanced spatio-temporal alignment of plantar pressure image sequences using B-splines.

    Science.gov (United States)

    Oliveira, Francisco P M; Tavares, João Manuel R S

    2013-03-01

    This article presents an enhanced methodology to align plantar pressure image sequences simultaneously in time and space. The temporal alignment of the sequences is accomplished using B-splines in the time modeling, and the spatial alignment can be attained using several geometric transformation models. The methodology was tested on a dataset of 156 real plantar pressure image sequences (3 sequences for each foot of the 26 subjects) that was acquired using a common commercial plate during barefoot walking. In the alignment of image sequences that were synthetically deformed both in time and space, an outstanding accuracy was achieved with the cubic B-splines. This accuracy was significantly better (p align real image sequences with unknown transformation involved, the alignment based on cubic B-splines also achieved superior results than our previous methodology (p alignment on the dynamic center of pressure (COP) displacement was also assessed by computing the intraclass correlation coefficients (ICC) before and after the temporal alignment of the three image sequence trials of each foot of the associated subject at six time instants. The results showed that, generally, the ICCs related to the medio-lateral COP displacement were greater when the sequences were temporally aligned than the ICCs of the original sequences. Based on the experimental findings, one can conclude that the cubic B-splines are a remarkable solution for the temporal alignment of plantar pressure image sequences. These findings also show that the temporal alignment can increase the consistency of the COP displacement on related acquired plantar pressure image sequences.

  9. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
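
    The notion of a Pareto optimal frontier used in this abstract can be illustrated with a small sketch. It assumes each candidate alignment has already been scored by several objective functions (higher is better) and simply filters out dominated candidates; the function name `pareto_front` and the sample scores are hypothetical, and this is not the authors' dynamic-programming algorithm.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates (higher is better)."""

    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        # a dominates b if it is at least as good on every objective
        # and strictly better on at least one of them.
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores for five candidate alignments.
candidates = [(10, 2), (8, 5), (7, 4), (3, 9), (3, 3)]
front = pareto_front(candidates)
print(front)
```

    This brute-force filter compares every pair of candidates, so it is only practical once the candidate set is small; the abstract's contribution is generating the frontier efficiently during alignment itself.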

  10. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal; Salama, Khaled N.; Zidan, Mohammed A.

    2012-01-01

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data which may take a long time. Here, we

  11. Beginning programming with Java for dummies

    CERN Document Server

    Burd, Barry

    2014-01-01

    A practical introduction to programming with Java Beginning Programming with Java For Dummies, 4th Edition is a comprehensive guide to learning one of the most popular programming languages worldwide. This book covers basic development concepts and techniques through a Java lens. You'll learn what goes into a program, how to put the pieces together, how to deal with challenges, and how to make it work. The new Fourth Edition has been updated to align with Java 8, and includes new options for the latest tools and techniques. Java is the predominant language used to program Android and cloud app

  12. SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly.

    Science.gov (United States)

    Wala, Jeremiah; Beroukhim, Rameen

    2017-03-01

    We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. jwala@broadinstitue.org ; rameen@broadinstitute.org. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  13. GapMis: a tool for pairwise sequence alignment with a single gap.

    Science.gov (United States)

    Flouri, Tomás; Frousios, Kimon; Iliopoulos, Costas S; Park, Kunsoo; Pissis, Solon P; Tischler, German

    2013-08-01

    Pairwise sequence alignment has received a new motivation due to the advent of recent patents in next-generation sequencing technologies, particularly so for the application of re-sequencing---the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important problem is to find the alignment between a relatively short succeeding factor of the reference sequence and the remaining low-quality part of the read allowing a number of mismatches and the insertion of a single gap in the alignment. We present GapMis, a tool for pairwise sequence alignment with a single gap. It is based on a simple algorithm, which computes a different version of the traditional dynamic programming matrix. The presented experimental results demonstrate that GapMis is more suitable and efficient than most popular tools for this task.

  14. Statistical distributions of optimal global alignment scores of random protein sequences

    Directory of Open Access Journals (Sweden)

    Tang Jiaowei

    2005-10-01

    Full Text Available Abstract Background The inference of homology from statistically significant sequence similarity is a central issue in sequence alignments. So far the statistical distribution function underlying the optimal global alignments has not been completely determined. Results In this study, random and real but unrelated sequences prepared in six different ways were selected as reference datasets to obtain their respective statistical distributions of global alignment scores. All alignments were carried out with the Needleman-Wunsch algorithm and optimal scores were fitted to the Gumbel, normal and gamma distributions respectively. The three-parameter gamma distribution performs the best as the theoretical distribution function of global alignment scores, as it agrees perfectly well with the distribution of alignment scores. The normal distribution also agrees well with the score distribution frequencies when the shape parameter of the gamma distribution is sufficiently large, for this is the scenario when the normal distribution can be viewed as an approximation of the gamma distribution. Conclusion We have shown that the optimal global alignment scores of random protein sequences fit the three-parameter gamma distribution function. This would be useful for the inference of homology between sequences whose relationship is unknown, through the evaluation of gamma distribution significance between sequences.
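
    As a rough illustration of how a sample of optimal global alignment scores for random sequences might be produced, the sketch below computes Needleman-Wunsch scores with a simple match/mismatch scheme and a linear gap penalty. These scoring parameters, the sequence length and the sample size are assumptions chosen for brevity; the study itself presumably used protein substitution matrices, which the abstract does not specify.

```python
import random


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    # prev holds the previous row of the dynamic programming matrix.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Score pairs of random protein-like sequences (illustrative sizes only).
rng = random.Random(0)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def random_seq(n: int) -> str:
    return "".join(rng.choice(ALPHABET) for _ in range(n))


scores = [nw_score(random_seq(50), random_seq(50)) for _ in range(20)]
```

    A list such as `scores` is the kind of sample one would then fit to candidate distributions (Gumbel, normal, gamma) with a statistics package.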

  15. AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework.

    Directory of Open Access Journals (Sweden)

    Qi Zheng

    2016-10-01

    Full Text Available Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.

  16. AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework.

    Science.gov (United States)

    Zheng, Qi; Grice, Elizabeth A

    2016-10-01

    Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.

  17. Mitochondrial D-loop sequence of domesticated waterfowl in Central Java: goose and muscovy duck

    Science.gov (United States)

    Susanti, R.; Iswari, R. S.

    2018-03-01

    This study aims to genetically characterize domesticated waterfowl (goose and Muscovy duck) in Central Java based on the mtDNA D-loop gene. The D-loop gene was amplified by PCR using specific primers and sequenced using the dideoxy termination method. The multiple alignments of the D-loop gene covered 710 nucleotides at positions 74 to 783 from the 5' end (for goose) and 712 nucleotides at positions 48 to 759 from the 5' end (for Muscovy duck). Polymorphism analysis of the Muscovy duck D-loop sequences yielded 3 haplotypes. The D-loop gene of goose showed no polymorphism, with a substitution at G117A. Phylogenetic tree reconstruction of the geese and Muscovy ducks collected during this research, compared with other species from Anser, Chairina and Anas, produced 2 clusters. The first group consists of all Muscovy ducks together with Chairina moschata and Anas, while the second group consists of all geese and Anser cygnoides. The identity of Muscovy ducks and geese can be determined from the genetic marker information. Based on the phylogenetic analysis, it can be concluded that the Muscovy duck is closely related to Chairina moschata, while geese are closely related to Anser cygnoides.

  18. Spreadsheet-based program for alignment of overlapping DNA sequences.

    Science.gov (United States)

    Anbazhagan, R; Gabrielson, E

    1999-06-01

    Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh operating system computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed customized workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.
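
    The idea of finding the overlap between two fragments regardless of their orientation can be sketched as follows. This is a minimal exact-match illustration in Python, not the spreadsheet program described in the abstract; the function names and the example fragments are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def longest_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def best_overlap(a: str, b: str, min_len: int = 3):
    """Try both orientations of b and both fragment orders.

    Returns (overlap_length, order, orientation_of_b).
    """
    options = []
    for label, cand in (("forward", b), ("reverse-complement", reverse_complement(b))):
        options.append((longest_overlap(a, cand, min_len), "a->b", label))
        options.append((longest_overlap(cand, a, min_len), "b->a", label))
    return max(options, key=lambda t: t[0])


# Fragment b is stored on the opposite strand to fragment a.
frag_a = "ACGTTGCA"
frag_b = "GACCTGCA"
result = best_overlap(frag_a, frag_b)
print(result)
```

    Real reads contain sequencing errors, so a practical tool would tolerate mismatches in the overlap instead of requiring exact equality.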

  19. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

    Science.gov (United States)

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-07-15

    In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.

  20. IVisTMSA: Interactive Visual Tools for Multiple Sequence Alignments.

    Science.gov (United States)

    Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Naeem; Naveed, Nasir; Ahmad, Sarfraz; Muhammad, Shah; Qadri, Salman; Shahid, Muhammad; Hussain, Tanveer; Javed, Maryam

    2015-01-01

    IVisTMSA is a software package of seven graphical tools for multiple sequence alignments. MSApad is an editing and analysis tool. It can load 409% more data than Jalview, STRAP, CINEMA, and Base-by-Base. MSA comparator allows the user to visualize consistent and inconsistent regions of reference and test alignments of more than 21-MB size in less than 12 seconds. MSA comparator is 5,200% efficient and more than 40% efficient as compared to BALiBASE c program and FastSP, respectively. MSA reconstruction tool provides graphical user interfaces for four popular aligners and allows the user to load several sequence files at a time. FASTA generator converts seven formats of alignments of unlimited size into FASTA format in a few seconds. MSA ID calculator calculates identity matrix of more than 11,000 sequences with a sequence length of 2,696 base pairs in less than 100 seconds. Tree and Distance Matrix calculation tools generate phylogenetic tree and distance matrix, respectively, using neighbor joining% identity and BLOSUM 62 matrix.

  1. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Directory of Open Access Journals (Sweden)

    Kaufmann Michael

    2004-09-01

    Full Text Available Abstract Background Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.

  2. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    Science.gov (United States)

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to

  3. Bellerophon: a program to detect chimeric sequences in multiple sequence alignments.

    Science.gov (United States)

    Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

    2004-09-22

    Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl

  4. GROUPING WEB ACCESS SEQUENCES USING SEQUENCE ALIGNMENT METHOD

    OpenAIRE

    BHUPENDRA S CHORDIA; KRISHNAKANT P ADHIYA

    2011-01-01

    In web usage mining, grouping of web access sequences can be used to determine the behavior or intent of a set of users. A key problem in grouping web sessions is how to measure the similarity between web sessions. There are many shortcomings in traditional measurement methods. The task of grouping web sessions based on similarity, which consists of maximizing the intra-group similarity while minimizing the inter-group similarity, is done using the sequence alignment method. This paper introduces a new method to group we...

  5. DIALIGN: multiple DNA and protein sequence alignment at BiBiServ.

    OpenAIRE

    Morgenstern, Burkhard

    2004-01-01

    DIALIGN is a widely used software tool for multiple DNA and protein sequence alignment. The program combines local and global alignment features and can therefore be applied to sequence data that cannot be correctly aligned by more traditional approaches. DIALIGN is available online through Bielefeld Bioinformatics Server (BiBiServ). The downloadable version of the program offers several new program features. To compare the output of different alignment programs, we developed the program AltA...

  6. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-05-01

    The purpose of this dissertation is to present a methodology to model the global sequence alignment problem as a directed acyclic graph, which helps to extract all possible optimal alignments. Moreover, a mechanism to sequentially optimize the sequence alignment problem relative to different cost functions is suggested. Sequence alignment is very important in computational biology. It is used to find evolutionary relationships between biological sequences. Many algorithms have been developed to solve this problem. The most famous are Needleman-Wunsch and Smith-Waterman, which are based on dynamic programming. In dynamic programming, a problem is divided into a set of overlapping subproblems and then the solution of each subproblem is found. Finally, the solutions to these subproblems are combined into a final solution. In this thesis it has been proved that for two sequences of length m and n over a fixed alphabet, the suggested optimization procedure requires O(mn) arithmetic operations per cost function on a single processor machine. The algorithm has been simulated using the C#.Net programming language and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, which also depends on the cost function that is used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
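
    The observation that the number of optimal alignments depends on the cost function can be made concrete with a small O(mn) dynamic program that tracks, for each cell, both the minimum cost and the number of optimal paths reaching it. This is a sketch in Python under assumed unit costs, not the dissertation's C#.Net implementation; the function name and default parameters are hypothetical.

```python
from typing import Tuple


def count_optimal_alignments(a: str, b: str, mismatch: int = 1, gap: int = 1) -> Tuple[int, int]:
    """Return (minimum alignment cost, number of optimal global alignments)."""
    m, n = len(a), len(b)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    ways = [[0] * (n + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    # Borders: only gaps are possible, so there is exactly one path.
    for i in range(1, m + 1):
        cost[i][0], ways[i][0] = i * gap, 1
    for j in range(1, n + 1):
        cost[0][j], ways[0][j] = j * gap, 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            moves = [
                (cost[i - 1][j - 1] + sub, ways[i - 1][j - 1]),  # match/mismatch
                (cost[i - 1][j] + gap, ways[i - 1][j]),          # gap in b
                (cost[i][j - 1] + gap, ways[i][j - 1]),          # gap in a
            ]
            best = min(c for c, _ in moves)
            cost[i][j] = best
            # Sum the path counts of every predecessor that achieves the optimum.
            ways[i][j] = sum(w for c, w in moves if c == best)
    return cost[m][n], ways[m][n]


print(count_optimal_alignments("A", "C"))                     # unit costs
print(count_optimal_alignments("A", "C", mismatch=2, gap=1))  # costlier mismatch
```

    With unit costs the single mismatch column is the only optimum, while raising the mismatch cost to 2 makes the two gapped alignments equally good, so the count grows from 1 to 3.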

  7. BarraCUDA - a fast short read sequence aligner using graphics processing units

    Directory of Open Access Journals (Sweden)

    Klus Petr

    2012-01-01

    Full Text Available Abstract Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net

  8. BarraCUDA - a fast short read sequence aligner using graphics processing units

    LENUS (Irish Health Repository)

    Klus, Petr

    2012-01-13

    Abstract Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net

  9. RNA-Pareto: interactive analysis of Pareto-optimal RNA sequence-structure alignments.

    Science.gov (United States)

    Schnattinger, Thomas; Schöning, Uwe; Marchfelder, Anita; Kestler, Hans A

    2013-12-01

    Incorporating secondary structure information into the alignment process improves the quality of RNA sequence alignments. Instead of using fixed weighting parameters, sequence and structure components can be treated as different objectives and optimized simultaneously. The result is not a single, but a Pareto-set of equally optimal solutions, which all represent different possible weighting parameters. We now provide the interactive graphical software tool RNA-Pareto, which allows a direct inspection of all feasible results to the pairwise RNA sequence-structure alignment problem and greatly facilitates the exploration of the optimal solution set.

  10. MUSCLE: multiple sequence alignment with high accuracy and high throughput.

    Science.gov (United States)

    Edgar, Robert C

    2004-01-01

    We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
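
    Distance estimation by kmer counting, mentioned in this abstract, avoids aligning the sequences at all. The sketch below shows one generic way to do it: count shared k-mers and normalize by the number of k-mers in the shorter sequence. The formula, the default `k=3` and the function names are assumptions for illustration and should not be read as MUSCLE's exact distance measure.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """1 minus the fraction of k-mers shared between a and b (multiset intersection)."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k residues")
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


print(kmer_distance("ACDEFG", "ACDEYG"))
```

    Each comparison costs time roughly linear in the sequence lengths, which is why such estimates are attractive for building a guide tree over thousands of sequences before any alignment is computed.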

  11. Complete Whole-Genome Sequence of Salmonella enterica subsp. enterica Serovar Java NCTC5706.

    Science.gov (United States)

    Fazal, Mohammed-Abbas; Alexander, Sarah; Burnett, Edward; Deheer-Graham, Ana; Oliver, Karen; Holroyd, Nancy; Parkhill, Julian; Russell, Julie E

    2016-11-03

    Salmonellae are a significant cause of morbidity and mortality globally. Here, we report the first complete genome sequence for Salmonella enterica subsp. enterica serovar Java strain NCTC5706. This strain is of historical significance, having been isolated in the pre-antibiotic era and was deposited into the National Collection of Type Cultures in 1939. © Crown copyright 2016.

  12. A time warping approach to multiple sequence alignment.

    Science.gov (United States)

    Arribas-Gil, Ana; Matias, Catherine

    2017-04-25

    We propose an approach for multiple sequence alignment (MSA) derived from the dynamic time warping viewpoint and recent techniques of curve synchronization developed in the context of functional data analysis. Starting from pairwise alignments of all the sequences (viewed as paths in a certain space), we construct a median path that represents the MSA we are looking for. We establish a proof of concept that our method could be an interesting ingredient to include into refined MSA techniques. We present a simple synthetic experiment as well as the study of a benchmark dataset, together with comparisons with 2 widely used MSA softwares.
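
    For readers unfamiliar with the dynamic time warping viewpoint this abstract starts from, the following is a sketch of the classic DTW distance between two numeric series. It illustrates only the pairwise warping step under an absolute-difference cost; the median-path construction that yields the MSA in the paper is not reproduced here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x_i - y_j| as local cost."""
    n, m = len(x), len(y)
    # d[i][j] is the cheapest warping cost of x[:i] against y[:j].
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            d[i][j] = local + min(d[i - 1][j],      # x advances alone
                                  d[i][j - 1],      # y advances alone
                                  d[i - 1][j - 1])  # both advance
    return d[n][m]


# A repeated value in one series is absorbed by the warping at no cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

    The recurrence has the same shape as pairwise sequence alignment, with stretching a time point playing the role of a gap.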

  13. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

    Science.gov (United States)

    Daily, Jeff

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach, however the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
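
    The "cell updates" counted by the GCUPS figure are evaluations of the Smith-Waterman recurrence, one per pair of residues. The scalar sketch below shows that recurrence with a linear gap penalty and assumed match/mismatch scores; it is plain Python for illustration and does not use Parasail's API or its SIMD strategies.

```python
def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            # One "cell update": take the best of restarting at zero,
            # extending diagonally, or opening a gap in either sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


query, target = "TTTACGTTT", "GGGACGGGG"
score = sw_score(query, target)
cell_updates = len(query) * len(target)  # work counted by a GCUPS figure
print(score, cell_updates)
```

    Vectorized libraries compute many of these cells per instruction, which is where the reported throughput comes from; the scores they return are the same as this scalar recurrence would give for the same scoring scheme.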

  14. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

    Directory of Open Access Journals (Sweden)

    Sven Warris

    To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.

  15. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

    Science.gov (United States)

    Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

    2015-01-01

    To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
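
    The alignment details this abstract refers to (score, number of gaps, number of mismatches) come from a traceback through the Smith-Waterman matrix. The following is a small CPU-only sketch of that idea, not PaSWAS code; the function name, the linear gap penalty and the returned dictionary keys are assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with traceback; returns score, gaps and mismatches."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    i, j = bi, bj
    top, bottom = [], []
    gaps = mismatches = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            mismatches += a[i - 1] != b[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            gaps += 1
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            gaps += 1
            j -= 1
    return {
        "score": best,
        "gaps": gaps,
        "mismatches": mismatches,
        "alignment": ("".join(reversed(top)), "".join(reversed(bottom))),
    }
```

    Keeping the full matrix is what makes the traceback possible; score-only implementations avoid that memory cost but cannot report these details.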

  16. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-08-01

    RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.

  17. Efficient alignment of pyrosequencing reads for re-sequencing applications

    Directory of Open Access Journals (Sweden)

    Russo Luis MS

    2011-05-01

    Background: Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties mostly due to the huge volume of data produced, but also because of some of their specific characteristics such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects. Results: We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454) system against a reference sequence. Our approach explores the characteristics of the data in these re-sequencing applications and uses state of the art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm which needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on the execution time. Conclusions: The proposed methodology was implemented in a software tool called TAPyR (Tool for the Alignment of Pyrosequencing Reads), which is publicly available from http://www.tapyr.net.

  18. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-01-01

    Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465

  19. Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

    Science.gov (United States)

    Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

    2018-05-03

    Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. Contact: rene.rahn@fu-berlin.de.
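
    The wavefront along the minor diagonal mentioned in this abstract works because every cell with the same i + j depends only on cells from the two previous anti-diagonals. The sketch below illustrates that traversal order for a global alignment score in sequential Python; it is not SeqAn code, and the function name and scoring values are assumptions.

```python
def nw_score_by_antidiagonals(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch score computed one anti-diagonal (i + j = d) at a time."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        H[i][0] = i * gap
    for j in range(m + 1):
        H[0][j] = j * gap
    for d in range(2, n + m + 1):
        # All cells visited by this inner loop are mutually independent,
        # so a parallel implementation could compute them simultaneously.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[n][m]
```

    The result is identical to the usual row-by-row fill; only the visiting order changes, which is what exposes the parallelism.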

  20. Noisy: Identification of problematic columns in multiple sequence alignments

    Directory of Open Access Journals (Sweden)

    Grünewald Stefan

    2008-06-01

    Motivation: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. In highly variable regions, furthermore, alignment errors accumulate, resulting in sometimes misleading signals in phylogenetic reconstruction. Results: We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. Software: The computer program noisy implements this approach. It can be employed to improve phylogenetic reconstruction capability with quite a considerable success rate whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) there are sufficiently many taxa in the data set – at least, say, 12 to 15 taxa. The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.

  1. PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.

    Science.gov (United States)

    Kuznetsov, Igor B; McDuffie, Michael

    2015-05-07

    Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. The selection of the amino acid substitution matrix best suitable for a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities. PR2ALIGN is a stand-alone software program and a web-server that provide the functionality for implementing flexible user-specified alignment scoring functions and aligning pairs of amino acid sequences based on the comparison of the profiles of biochemical properties of these sequences. Unlike the conventional sequence alignment methods that use 20x20 fixed amino acid substitution matrices, PR2ALIGN uses a set of weighted biochemical properties of amino acids to measure the distance between pairs of aligned residues and to find an optimal minimal distance global alignment. The user can provide any number of amino acid properties and specify a weight for each property. The higher the weight for a given property, the more this property affects the final alignment. We show that in many cases the approach implemented in PR2ALIGN produces better quality pair-wise alignments than the conventional matrix-based approach. PR2ALIGN will be helpful for researchers who wish to align amino acid sequences by using flexible user-specified alignment scoring functions based on the biochemical properties of amino acids instead of the amino acid substitution matrix. To the best of the authors' knowledge, there are no existing stand-alone software programs or web-servers analogous to PR2ALIGN. The software is freely available from http://pr2align.rit.albany.edu.
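
    To make the idea of a weighted, property-based residue distance concrete, here is a minimal sketch. It is not PR2ALIGN's scoring function: the weighted Euclidean form, the gap cost, the function names and the tiny two-property table (Kyte-Doolittle hydropathy and side-chain charge for five residues) are all assumptions chosen for illustration.

```python
import math

# Illustrative property table: (Kyte-Doolittle hydropathy, side-chain charge).
PROPS = {
    "A": (1.8, 0.0),
    "L": (3.8, 0.0),
    "G": (-0.4, 0.0),
    "D": (-3.5, -1.0),
    "K": (-3.9, 1.0),
}

def residue_distance(x, y, weights):
    """Weighted Euclidean distance between the property vectors of two residues."""
    return math.sqrt(sum(w * (p - q) ** 2
                         for w, p, q in zip(weights, PROPS[x], PROPS[y])))

def min_distance_alignment(a, b, weights, gap=3.0):
    """Global alignment that minimises total distance (hypothetical gap cost)."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + residue_distance(a[i - 1], b[j - 1], weights),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    return D[n][m]
```

    Raising the weight of the charge property makes the oppositely charged pair D/K look far apart even though their hydropathy values are close, which is the kind of user control the abstract describes.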

  2. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Stormo, Gary D.

    2005-01-01

    detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FOLDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include...... the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy....... The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability...

  3. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in half the time and with a quarter of the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  4. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

    Science.gov (United States)

    Neuwald, Andrew F

    2009-08-01

    The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.

  5. Coval: improving alignment quality and variant calling accuracy for next-generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Shunichi Kosugi

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in 'targeted' alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.

  6. Evolutionary rates at codon sites may be used to align sequences and infer protein domain function

    Directory of Open Access Journals (Sweden)

    Hazelhurst Scott

    2010-03-01

    Background: Sequence alignments form part of many investigations in molecular biology, including the determination of phylogenetic relationships, the prediction of protein structure and function, and the measurement of evolutionary rates. However, to obtain meaningful results, a significant degree of sequence similarity is required to ensure that the alignments are accurate and the inferences correct. Limitations arise when sequence similarity is low, which is particularly problematic when working with fast-evolving genes, evolutionarily distant taxa, genomes with nucleotide biases, and cases of convergent evolution. Results: A novel approach was conceptualized to address the "low sequence similarity" alignment problem. We developed an alignment algorithm termed FIRE (Functional Inference using the Rates of Evolution), which aligns sequences using the evolutionary rate at codon sites, as measured by the dN/dS ratio, rather than nucleotide or amino acid residues. FIRE was used to test the hypotheses that evolutionary rates can be used to align sequences and that the alignments may be used to infer protein domain function. Using a range of test data, we found that aligning domains based on evolutionary rates was possible even when sequence similarity was very low (for example, antibody variable regions). Furthermore, the alignment has the potential to infer protein domain function, indicating that domains with similar functions are subject to similar evolutionary constraints. These data suggest that an evolutionary rate-based approach to sequence analysis (particularly when combined with structural data) may be used to study cases of convergent evolution or when sequences have very low similarity. However, when aligning homologous gene sets with sequence similarity, FIRE did not perform as well as the best traditional alignment algorithms, indicating that the conventional approach of aligning residues as opposed to evolutionary rates remains the

  7. Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences

    Directory of Open Access Journals (Sweden)

    Lee DT

    2007-02-01

    Background: When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of some big gene families, to reconstruct the epidemiological history or their phylogenies, how to analyze and visualize the alignment results of many sequences has become a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences. Results: A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies. Unlike traditional representations of sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences, but also demonstrates their local homologous logos for each clade hierarchically. In addition, Phylo-mLogo allows the user to focus only on the analysis of some important, structurally or functionally constrained sites in the alignment selected by the user or by built-in automatic calculation. Conclusion: With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes of their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information about Phylo-mLogo can be found at http://biocomp.iis.sinica.edu.tw/phylomlogo.

  8. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    OpenAIRE

    Arabi E. keshk

    2014-01-01

    The merging of biology and computer science has created a new field called computational biology, or bioinformatics, that explores the capacity of computers to gain knowledge from biological data. Computational biology is rooted in life sciences as well as computers, information sciences, and technologies. The main problem in computational biology is sequence alignment, which is a way of arranging the sequences of DNA, RNA or protein to identify the region of similarity and relationship between se...

  9. Measuring the distance between multiple sequence alignments.

    Science.gov (United States)

    Blackburne, Benjamin P; Whelan, Simon

    2012-02-15

    Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses. We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them. MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/.
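
    As a rough illustration of what a distance between two alignments of the same sequences can look like, the sketch below compares the sets of residue pairs that each alignment places in the same column. This is a deliberately simplified homology-based measure and is not one of the four metrics implemented in MetAl; the function names and the normalisation are assumptions.

```python
from itertools import combinations

def aligned_pairs(msa):
    """Set of residue pairs ((seq, pos), (seq, pos)) sharing a column in the MSA."""
    counters = [0] * len(msa)      # next ungapped position in each sequence
    pairs = set()
    for column in zip(*msa):
        present = []
        for s, ch in enumerate(column):
            if ch != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs

def msa_distance(x, y):
    """Fraction of aligned residue pairs found in only one of the two MSAs."""
    px, py = aligned_pairs(x), aligned_pairs(y)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0
```

    For example, `msa_distance(["AC-G", "ACTG"], ["ACG-", "ACTG"])` is 0.5: the two alignments agree on two residue pairs and each contains one pair the other lacks.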

  10. pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment.

    Science.gov (United States)

    Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

    2018-01-01

    Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore less widely adopted than it could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension simpler through more and better application of high-level languages commonly used in bioinformatics, such as Python. The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation running on several hardware platforms with multi-core systems and/or GPUs that provides accurate sequence alignments that also can be inspected for alignment details. Additionally, pyPaSWAS supports the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. This way, the Python environment will stimulate further extension and use of pyPaSWAS. pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.
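
    The affine gap penalty mentioned in this abstract charges more for opening a gap than for extending one. A minimal score-only sketch of Smith-Waterman with affine gaps (the Gotoh recurrences) follows; it is sequential Python, not pyPaSWAS code, and the function name, parameter names and the convention that a gap of length k costs gap_open + (k - 1) * gap_extend are assumptions.

```python
def sw_affine_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score under an affine gap penalty."""
    neg = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)       # best score ending at (i-1, j)
    f_prev = [neg] * (m + 1)     # best score ending in a gap within b (vertical)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg] * (m + 1)
        e = neg                  # best score ending in a gap within a (horizontal)
        for j in range(1, m + 1):
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

    With these example parameters, "AAAACCCCGGGG" against "AAAAGGGG" scores 10: eight matches (16) minus one four-residue gap (3 + 1 + 1 + 1 = 6). A linear penalty of 3 per gapped residue would make that gap cost 12, so the best local alignment would stop at the first four matches.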

  11. Fractal MapReduce decomposition of sequence alignment

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2012-05-01

    Background: The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required. Results: In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique that has been found to provide an "alignment-free" solution to sequence analysis and comparison, that is, a solution that does not require dynamic programming, relying instead on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units, with no resort to dynamic programming. Conclusions: The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing. Availability: Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm".
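
    The iterated map behind the Chaos Game Representation is short enough to show directly. The sketch below is plain two-dimensional CGR for DNA and is not the USM library referenced in the abstract; the corner assignment, starting point and function name are the conventional choices but should be read as assumptions here.

```python
# Conventional CGR corner assignment for the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_coordinates(seq):
    """Return the CGR point reached after each base of seq."""
    x, y = 0.5, 0.5                      # start at the centre of the square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

    Because every step halves the distance between any two trajectories, two sequences that share their last k bases end within 2^-k of each other in each coordinate. That property is what allows shared segments to be detected from coordinates alone.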

  12. Aligning protein sequence and analysing substitution pattern using ...

    Indian Academy of Sciences (India)

    Prakash

    Aligning protein sequences using a score matrix has become a routine but valuable method in modern biological ..... the amino acids according to their substitution behaviour ...... which may cause great change (e.g. prolonging the helix) in.

  13. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    Science.gov (United States)

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package that is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  14. Using hidden Markov models to align multiple sequences.

    Science.gov (United States)

    Mount, David W

    2009-07-01

    A hidden Markov model (HMM) is a probabilistic model of a multiple sequence alignment (msa) of proteins. In the model, each column of symbols in the alignment is represented by a frequency distribution of the symbols (called a "state"), and insertions and deletions are represented by other states. One moves through the model along a particular path from state to state in a Markov chain (i.e., random choice of next move), trying to match a given sequence. The next matching symbol is chosen from each state, recording its probability (frequency) and also the probability of going to that state from a previous one (the transition probability). State and transition probabilities are multiplied to obtain a probability of the given sequence. The hidden nature of the HMM is due to the lack of information about the value of a specific state, which is instead represented by a probability distribution over all possible values. This article discusses the advantages and disadvantages of HMMs in msa and presents algorithms for calculating an HMM and the conditions for producing the best HMM.
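
    The multiplication of state and transition probabilities along one path can be sketched as follows. This is a toy example only: the two match states, their probabilities, and the name `path_probability` are hypothetical, and for simplicity every state on the path is assumed to emit exactly one symbol (so silent delete states are not modelled).

```python
def path_probability(sequence, path, emissions, transitions, start="begin"):
    """Probability of emitting `sequence` along one fixed state `path`.

    emissions[state][symbol]   -> probability of the symbol in that state
    transitions[(prev, state)] -> probability of moving from prev to state
    """
    if len(sequence) != len(path):
        raise ValueError("one emitting state is expected per symbol")
    prob = 1.0
    prev = start
    for symbol, state in zip(sequence, path):
        # Multiply the transition into the state by the state's emission.
        prob *= transitions[(prev, state)] * emissions[state][symbol]
        prev = state
    return prob


# Hypothetical two-column model: column 1 prefers A, column 2 prefers G.
emissions = {"M1": {"A": 0.8, "G": 0.2}, "M2": {"A": 0.1, "G": 0.9}}
transitions = {("begin", "M1"): 1.0, ("M1", "M2"): 0.9}

p = path_probability("AG", ["M1", "M2"], emissions, transitions)
# p is the product 1.0 * 0.8 * 0.9 * 0.9
```

    Real profile HMM software typically works with log probabilities rather than raw products, to avoid numerical underflow on long sequences.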

  15. OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy

    Directory of Open Access Journals (Sweden)

    Searle Stephen MJ

    2003-10-01

    Full Text Available Abstract Background The alignment of two or more protein sequences provides a powerful guide in the prediction of the protein structure and in identifying key functional residues; however, the utility of any prediction is completely dependent on the accuracy of the alignment. In this paper we describe a suite of reference alignments derived from the comparison of protein three-dimensional structures together with evaluation measures and software that allow automatically generated alignments to be benchmarked. We test the OXBench benchmark suite on alignments generated by the AMPS multiple alignment method, then apply the suite to compare eight different multiple alignment algorithms. The benchmark shows the current state of the art for alignment accuracy and provides a baseline against which new alignment algorithms may be judged. Results The simple hierarchical multiple alignment algorithm, AMPS, performed as well as or better than more modern methods such as CLUSTALW once the PAM250 pair-score matrix was replaced by a BLOSUM series matrix. AMPS gave an accuracy in Structurally Conserved Regions (SCRs) of 89.9% over a set of 672 alignments. The T-COFFEE method on a data set of families with http://www.compbio.dundee.ac.uk. Conclusions The OXBench suite of reference alignments, evaluation software and results database provide a convenient method to assess progress in sequence alignment techniques. Evaluation measures that were dependent on comparison to a reference alignment were found to give good discrimination between methods. The STAMP Sc Score which is independent of a reference alignment also gave good discrimination. Application of OXBench in this paper shows that with the exception of T-COFFEE, the majority of the improvement in alignment accuracy seen since 1985 stems from improved pair-score matrices rather than algorithmic refinements. The maximum theoretical alignment accuracy obtained by pooling results over all methods was 94

  16. MISTICA: Minimum Spanning Tree-Based Coarse Image Alignment for Microscopy Image Sequences.

    Science.gov (United States)

    Ray, Nilanjan; McArdle, Sara; Ley, Klaus; Acton, Scott T

    2016-11-01

    Registration of an in vivo microscopy image sequence is necessary in many significant studies, including studies of atherosclerosis in large arteries and the heart. Significant cardiac and respiratory motion of the living subject, occasional spells of focal plane changes, drift in the field of view, and long image sequences are the principal roadblocks. The first step in such a registration process is the removal of translational and rotational motion. Next, a deformable registration can be performed. The focus of our study here is to remove the translation and/or rigid body motion that we refer to here as coarse alignment. The existing techniques for coarse alignment are unable to accommodate long sequences often consisting of periods of poor quality images (as quantified by a suitable perceptual measure). Many existing methods require the user to select an anchor image to which other images are registered. We propose a novel method for coarse image sequence alignment based on minimum weighted spanning trees (MISTICA) that overcomes these difficulties. The principal idea behind MISTICA is to reorder the images in shorter sequences, to demote nonconforming or poor quality images in the registration process, and to mitigate the error propagation. The anchor image is selected automatically, making MISTICA completely automated. MISTICA is computationally efficient. It has a single tuning parameter that determines graph width, which can also be eliminated by way of additional computation. MISTICA outperforms existing alignment methods when applied to microscopy image sequences of mouse arteries.

  17. ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

    Directory of Open Access Journals (Sweden)

    Kim Taeho

    2010-09-01

    Full Text Available Abstract Background There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. Results We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computation efficiency of disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions ClustalXeed can now compute a large volume of biological sequence data sets, which were not tractable in any other parallel or single MSA program. The main developments include: (1) the ability to tackle larger sequence alignment problems than possible with previous systems through markedly improved storage-handling capabilities; (2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; (3) support for both single PC and distributed cluster systems.

  18. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-01-01

    The algorithm has been simulated using the C#.NET programming language, and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, and this number also depends on the cost function used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
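
    To make "number of optimal alignments" concrete, here is a minimal Python sketch that computes the minimum global alignment cost together with a count of the alignments that reach it. The unit mismatch and gap costs and the name `count_optimal_alignments` are assumptions for illustration; the abstract does not state which cost functions the thesis uses, and this is not the thesis's sequential optimization procedure.

```python
def count_optimal_alignments(a, b, mismatch=1, gap=1):
    """Return (minimum cost, number of optimal global alignments) of a and b."""
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    count = [[0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1
    # Aligning a prefix against the empty string: only gaps, one way to do it.
    for i in range(1, n + 1):
        cost[i][0], count[i][0] = i * gap, 1
    for j in range(1, m + 1):
        cost[0][j], count[0][j] = j * gap, 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (cost[i - 1][j - 1] + sub, count[i - 1][j - 1]),  # (mis)match
                (cost[i - 1][j] + gap, count[i - 1][j]),          # gap in b
                (cost[i][j - 1] + gap, count[i][j - 1]),          # gap in a
            ]
            best = min(c for c, _ in candidates)
            cost[i][j] = best
            # Sum the counts of every predecessor that achieves the optimum.
            count[i][j] = sum(k for c, k in candidates if c == best)
    return cost[n][m], count[n][m]


result = count_optimal_alignments("AB", "BA")  # -> (2, 3)
```

    Restricting the surviving alignments with a second cost function, as the abstract describes, could only keep or shrink this count.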

  19. Accelerated convergence and robust asymptotic regression of the Gumbel scale parameter for gapped sequence alignment

    International Nuclear Information System (INIS)

    Park, Yonil; Sheetlin, Sergey; Spouge, John L

    2005-01-01

    Searches through biological databases provide the primary motivation for studying sequence alignment statistics. Other motivations include physical models of annealing processes or mathematical similarities to, e.g., first-passage percolation and interacting particle systems. Here, we investigate sequence alignment statistics, partly to explore two general mathematical methods. First, we model the global alignment of random sequences heuristically with Markov additive processes. In sequence alignment, the heuristic suggests a numerical acceleration scheme for simulating an important asymptotic parameter (the Gumbel scale parameter λ). The heuristic might apply to similar mathematical theories. Second, we extract the asymptotic parameter λ from simulation data with the statistical technique of robust regression. Robust regression is admirably suited to 'asymptotic regression' and deserves to be better known for it.

  20. MACSIMS : multiple alignment of complete sequences information management system

    Directory of Open Access Journals (Sweden)

    Plewniak Frédéric

    2006-06-01

    Full Text Available Abstract Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at http://bips.u-strasbg.fr/MACSIMS/.

  1. Alignment of Escherichia coli K12 DNA sequences to a genomic restriction map.

    Science.gov (United States)

    Rudd, K E; Miller, W; Ostell, J; Benson, D A

    1990-01-25

    We use the extensive published information describing the genome of Escherichia coli and new restriction map alignment software to align DNA sequence, genetic, and physical maps. Restriction map alignment software is used which considers restriction maps as strings analogous to DNA or protein sequences except that two values, enzyme name and DNA base address, are associated with each position on the string. The resulting alignments reveal a nearly linear relationship between the physical and genetic maps of the E. coli chromosome. Physical map comparisons with the 1976, 1980, and 1983 genetic maps demonstrate a better fit with the more recent maps. The results of these alignments are genomic kilobase coordinates, orientation and rank of the alignment that best fits the genetic data. A statistical measure based on extreme value distribution is applied to the alignments. Additional computer analyses allow us to estimate the accuracy of the published E. coli genomic restriction map, simulate rearrangements of the bacterial chromosome, and search for repetitive DNA. The procedures we used are general enough to be applicable to other genome mapping projects.

  2. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal

    2012-07-28

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data, which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts to the implementation according to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (with a query length of 511 amino acids). © 2010 IEEE.
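
    For reference, the computation being accelerated can be sketched in a few lines of Python. This is a plain single-threaded Smith-Waterman score with a linear gap penalty; the scoring values and the name `smith_waterman_score` are assumptions for the example, and nothing here reflects the authors' GPU/CPU workload distribution.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell update takes the best of: restart at 0,
            # extend diagonally, or open/extend a gap in either sequence.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


score = smith_waterman_score("TTACGTT", "GGACGGG")  # shared "ACG" -> 6
```

    Each evaluation of the inner `max` is one cell update. GCUPS counts billions of such updates per second, so a query of length 511 against a database of N residues costs about 511 × N cell updates.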

  3. MSuPDA: A Memory Efficient Algorithm for Sequence Alignment.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Kamal, Md Sarwar; Chowdhury, Linkon

    2016-03-01

    Space complexity is a million dollar question in DNA sequence alignments. In this regard, memory saving under pushdown automata can help to reduce the space occupied in computer memory. In our proposed process, an anchor seed (AS) is selected from a given data set of nucleotide base pairs for local sequence alignment. Quick splitting techniques separate the AS from all the DNA genome segments. The selected AS is placed in the input unit of the pushdown automaton (PDA), and the whole DNA genome segments are placed on the PDA's stack. The AS from the input unit is matched against the DNA genome segments from the stack. Matches, mismatches and indels of nucleotides are popped from the stack under the control unit of the pushdown automaton. Each POP operation on the stack frees the memory cell occupied by the nucleotide base pair.

  4. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    Science.gov (United States)

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures with the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in the Python language and is freely available on the web.
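
    The profile-and-distance step can be sketched as below. Note the substitution: this example counts L-words with a simple sliding window rather than a generalized suffix tree, and it normalizes counts to relative frequencies, which is an assumption. The function names are hypothetical and this is not the authors' released code.

```python
import math
from collections import Counter
from itertools import product


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible L-word, in a fixed word order."""
    windows = max(len(seq) - L + 1, 0)
    counts = Counter(seq[i:i + L] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for short sequences
    return [counts["".join(w)] / total for w in product(alphabet, repeat=L)]


def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise distances between L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    return [[euclidean(p, q) for q in profiles] for p in profiles]


d = distance_matrix(["ACGT", "ACGT", "TTTT"], L=2)
```

    The resulting matrix is the kind of input that a neighbor joining or multidimensional scaling routine would then consume.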

  5. SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

    Science.gov (United States)

    Liu, Kevin; Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C Randal

    2012-01-01

    Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of

  6. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    Science.gov (United States)

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.

  7. Heuristic for Solving the Multiple Alignment Sequence Problem

    Directory of Open Access Journals (Sweden)

    Roman Anselmo Mora Gutiérrez

    2011-03-01

    Full Text Available In this paper we developed a new algorithm for solving the problem of multiple sequence alignment (AMS), which is a hybrid metaheuristic based on harmony search and simulated annealing. The hybrid was validated with the methodology of Julie Thompson. This is a basic algorithm and the results obtained during this stage are encouraging.

  8. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    Full Text Available The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures with the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in the Python language and is freely available on the web.

  9. Base-By-Base: single nucleotide-level analysis of whole viral genome alignments.

    Science.gov (United States)

    Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris

    2004-07-14

    With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

  10. Subfamily logos: visualization of sequence deviations at alignment positions with high information content

    Directory of Open Access Journals (Sweden)

    Beitz Eric

    2006-06-01

    Full Text Available Abstract Background Recognition of relevant sequence deviations can be valuable for elucidating functional differences between protein subfamilies. Interesting residues at highly conserved positions can then be mutated and experimentally analyzed. However, identification of such sites is tedious because automated approaches are scarce. Results Subfamily logos visualize subfamily-specific sequence deviations. The display is similar to classical sequence logos but extends into the negative range. Positive, upright characters correspond to residues which are characteristic for the subfamily, negative, upside-down characters to residues typical for the remaining sequences. The symbol height is adjusted to the information content of the alignment position. Residues which are conserved throughout do not appear. Conclusion Subfamily logos provide an intuitive display of relevant sequence deviations. The method has proven to be valid using a set of 135 aligned aquaporin sequences in which established subfamily-specific positions were readily identified by the algorithm.
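
    The information content that sets symbol height in a classical sequence logo can be computed per alignment column as in the Python sketch below. It uses the textbook formula (maximum entropy minus observed entropy, in bits) without a small-sample correction and ignores gap characters; both choices, and the name `column_information`, are assumptions. The subfamily-specific signed heights described in the abstract are not reproduced here.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits of one alignment column.

    column is a string holding one residue per aligned sequence;
    '-' marks a gap and is skipped.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


conserved = column_information("AAAA")                # log2(20) bits
mixed = column_information("AAGG", alphabet_size=4)   # 2 - 1 = 1 bit
```

    A fully conserved protein column reaches the maximum of log2(20), about 4.32 bits, while a column split evenly between two nucleotides carries 1 bit.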

  11. Genotype and Phenotype Characterization of Indonesian Phytophthora infestans Isolates Collected From Java and Outside Java Island

    Directory of Open Access Journals (Sweden)

    Dwinita Wikan Utami

    2017-10-01

    Full Text Available Phytophthora infestans, the cause of late blight disease, is a worldwide problem in potato and tomato production. To understand the biology and ecology of P. infestans and the spatial and temporal factors behind its variation, the population diversity needs to be fully characterized. The objective of this research is to characterize the diversity of P. infestans. Surveys and collection of P. infestans isolates were performed at many potato production centers in Indonesia, both in Java (West Java, Central Java, and East Java) and outside Java island (Medan, Jambi, and Makassar). The collected isolates were then analyzed for their virulence diversity via plant disease bioassays on differential varieties, and for genotype diversity based on fragment analysis profiles from a multiplex of 20 simple sequence repeat markers. The virulence characterization showed that the isolate group from Makassar, South Sulawesi, has a broad-spectrum virulence pathotype on the R1, R2, R3, R4, and R5 differential plants. Simple sequence repeat genotype characterization showed that, in general, the population structure of P. infestans groups in accordance with the origin of the sampling locations. The diversity between populations is lower than the diversity between isolates within a single location. That is, the diversity of P. infestans occurs more among individual isolates within one location than between sampling locations.

  12. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

    Science.gov (United States)

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-11-23

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
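
    The exponential growth in parameters for a fixed-order Markov chain is easy to see in a short Python sketch. For a nucleotide alphabet, an order-k chain has 4^k contexts, each with 3 free transition probabilities. The counting estimator below is the standard maximum-likelihood one; the function names are hypothetical and the VLMC pruning strategy from the paper is not implemented here.

```python
from collections import defaultdict


def fomc_free_parameters(order, alphabet_size=4):
    """Number of free transition probabilities in a fixed-order Markov chain."""
    return (alphabet_size - 1) * alphabet_size ** order


def estimate_transitions(reads, order):
    """Maximum-likelihood transition probabilities estimated from reads."""
    counts = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(order, len(read)):
            context = read[i - order:i]  # the preceding `order` symbols
            counts[context][read[i]] += 1
    probs = {}
    for context, following in counts.items():
        total = sum(following.values())
        probs[context] = {base: c / total for base, c in following.items()}
    return probs


sizes = [fomc_free_parameters(k) for k in range(6)]
# sizes == [3, 12, 48, 192, 768, 3072]
probs = estimate_transitions(["ACAC", "ACAG"], order=1)
```

    With limited sequencing depth, many high-order contexts are observed only a handful of times, which is the estimation problem the abstract attributes to FOMC.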

  13. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

    Directory of Open Access Journals (Sweden)

    von Reumont Björn M

    2010-03-01

    Full Text Available Abstract Background Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold whose choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. Results ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the most decrease in conflict. Conclusions Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment

  14. BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm.

    Science.gov (United States)

    Loving, Joshua; Hernandez, Yozen; Benson, Gary

    2014-11-15

    Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, shift and addition. Bit-parallelism has been successfully applied to the longest common subsequence (LCS) and edit-distance problems, producing fast algorithms in practice. We have developed BitPAl, a bit-parallel algorithm for general, integer-scoring global alignment. Integer-scoring schemes assign integer weights for match, mismatch and insertion/deletion. The BitPAl method uses structural properties in the relationship between adjacent scores in the scoring matrix to construct classes of efficient algorithms, each designed for a particular set of weights. In timed tests, we show that BitPAl runs 7-25 times faster than a standard iterative algorithm. Source code is freely available for download at http://lobstah.bu.edu/BitPAl/BitPAl.html. BitPAl is implemented in C and runs on all major operating systems. Contact: jloving@bu.edu or yhernand@bu.edu or gbenson@bu.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
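
    The bit-parallel idea is easiest to see on the LCS problem the abstract mentions as earlier work. The Python sketch below is a well-known bit-vector LCS-length recurrence, not BitPAl itself, which handles general integer scoring. Python's arbitrary-precision integers stand in for machine words here, and the function name is hypothetical.

```python
def lcs_length_bitparallel(a, b):
    """Length of the longest common subsequence of a and b via bit vectors."""
    m = len(a)
    if m == 0:
        return 0
    mask = (1 << m) - 1
    # match[ch] has bit i set wherever a[i] == ch.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask  # all ones: no LCS increments recorded yet
    for ch in b:
        u = v & match.get(ch, 0)
        # One addition, one subtraction and one OR update a whole row
        # of the dynamic programming matrix at once.
        v = ((v + u) | (v - u)) & mask
    # Every zero bit in v marks a position where the LCS length grew by one.
    return m - bin(v).count("1")


length = lcs_length_bitparallel("ABCBDAB", "BDCABA")  # -> 4
```

    In a C implementation, each of these operations would process a machine word of matrix cells per instruction, which is where the speed-up over cell-by-cell iteration comes from.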

  15. Java EE 7 performance tuning and optimization

    CERN Document Server

    Oransa, Osama

    2014-01-01

    The book adopts a step-by-step approach, starting from building the basics and adding to it gradually by using different tools and examples. The book sequence is easy to follow and all topics are fully illustrated showing you how to make good use of different performance diagnostic tools. If you are an experienced Java developer, architect, team leader, consultant, support engineer, or anyone else who needs performance tuning in your Java applications, and in particular, Java enterprise applications, this book is for you. No prior experience of performance tuning is required.

  16. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    Science.gov (United States)

    Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

    2010-07-01

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.

  17. Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

    Directory of Open Access Journals (Sweden)

    Sadreyev Ruslan I

    2004-08-01

    Background: Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) MSA position and a set of predicted residue frequencies, and (2) between two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. Results: For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. Conclusion: The proposed computational method is of significant potential value for the analysis of protein families.

  18. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS

    NARCIS (Netherlands)

    Warris, S.; Yalcin, F.; Jackson, K.J.; Nap, J.P.H.

    2015-01-01

    Motivation To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate

  19. A direct method for computing extreme value (Gumbel) parameters for gapped biological sequence alignments.

    Science.gov (United States)

    Quinn, Terrance; Sinkala, Zachariah

    2014-01-01

    We develop a general method for computing extreme value distribution (Gumbel, 1958) parameters for gapped alignments. Our approach uses mixture distribution theory to obtain associated BLOSUM matrices for gapped alignments, which in turn are used for determining significance of gapped alignment scores for pairs of biological sequences. We compare our results with parameters already obtained in the literature.
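
    Illustrative note (not from the record): once the two Gumbel parameters, usually written lambda and K, are known, the significance of an alignment score follows from the standard extreme value formula E = K·m·n·exp(-lambda·S) and P = 1 - exp(-E). The sketch below only evaluates that formula; it does not reproduce the authors' method for obtaining the parameters, and the numeric values in the usage lines are placeholders chosen for illustration.

```python
import math


def expected_hits(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance alignments scoring at least `score`
    between sequences of lengths m and n (E-value)."""
    return k * m * n * math.exp(-lam * score)


def gumbel_p_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Probability of at least one chance alignment scoring >= `score`."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny
    return -math.expm1(-expected_hits(score, m, n, lam, k))


# Placeholder parameters (assumptions, not values from the paper)
LAMBDA, K = 0.27, 0.04
for s in (30, 60):
    print(s, gumbel_p_value(s, 250, 300, LAMBDA, K))
```

    For small E the P-value and the E-value are nearly equal, which is why the two are often used interchangeably for highly significant scores.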

  20. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    Directory of Open Access Journals (Sweden)

    Brejnev Muhizi Muhire

    The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
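
    Illustrative note (not from the record): the abstract points out that the way gap characters are handled changes pairwise identity values. The sketch below makes that concrete for a single aligned pair. It is not SDT's implementation; the function name and the `count_gaps` switch are hypothetical, and "-" is assumed to be the gap character.

```python
def pairwise_identity(row_a: str, row_b: str, count_gaps: bool = False) -> float:
    """Fraction of identical positions in one aligned pair.

    count_gaps=False ignores columns where either sequence has a gap;
    count_gaps=True counts such columns as compared-but-different.
    Columns that are gaps in both rows are always skipped.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = compared = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            if count_gaps:
                compared += 1
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


a, b = "ACGT-A", "AC-TTA"
print(pairwise_identity(a, b))                   # gaps ignored: 4 of 4 columns match
print(pairwise_identity(a, b, count_gaps=True))  # gaps counted: 4 of 6 columns match
```

    The same alignment yields 100% identity under one convention and about 67% under the other, which is enough to move a pair across a fixed demarcation threshold.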

  1. MARTA: a suite of Java-based tools for assigning taxonomic status to DNA sequences.

    Science.gov (United States)

    Horton, Matthew; Bodenhausen, Natacha; Bergelson, Joy

    2010-02-15

    We have created a suite of Java-based software to better provide taxonomic assignments to DNA sequences. We anticipate that the program will be useful for protistologists, virologists, mycologists and other microbial ecologists. The program relies on NCBI utilities including the BLAST software and Taxonomy database and is easily manipulated at the command-line to specify a BLAST candidate's query-coverage or percent identity requirements; other options include the ability to set minimal consensus requirements (%) for each of the eight major taxonomic ranks (Domain, Kingdom, Phylum, ...) and whether to consider lower scoring candidates when the top-hit lacks taxonomic classification.

  2. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    Science.gov (United States)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, using a permutation of the four DNA base codes: adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become advanced, high-throughput technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To address this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Using Quaternion representations, we are able to construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better-quality match and mismatch scores between two DNA base codes. In the implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of the Streptococcus pneumoniae family obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representations improve the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
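
    Illustrative note (not from the record): the sketch below shows one way a dot product of two probability vectors (PA, PT, PG, PC) could feed a Needleman-Wunsch score. The abstract does not say how the dot product is turned into match and mismatch scores, so the linear blend in `substitution_score`, the gap penalty, and all names are assumptions. The authors used Octave; Python is used here only for illustration.

```python
# Unambiguous bases as probability vectors in the order (PA, PT, PG, PC)
PURE = {"A": (1, 0, 0, 0), "T": (0, 1, 0, 0), "G": (0, 0, 1, 0), "C": (0, 0, 0, 1)}


def dot(p, q):
    """Dot product; for independent calls, the chance both bases are the same."""
    return sum(x * y for x, y in zip(p, q))


def substitution_score(p, q, match=1.0, mismatch=-1.0):
    """Assumed scoring rule: blend match and mismatch by the dot product."""
    s = dot(p, q)
    return match * s + mismatch * (1.0 - s)


def global_alignment_score(seq_a, seq_b, gap=-2.0):
    """Needleman-Wunsch score (linear gap penalty) over probability vectors."""
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, p in enumerate(seq_a, start=1):
        cur = [i * gap]
        for j, q in enumerate(seq_b, start=1):
            cur.append(max(prev[j - 1] + substitution_score(p, q),
                           prev[j] + gap,
                           cur[j - 1] + gap))
        prev = cur
    return prev[-1]


subject = [PURE[c] for c in "ACGT"]
# Target whose second base is ambiguous: T or C with equal probability
target = [PURE["A"], (0, 0.5, 0, 0.5), PURE["G"], PURE["T"]]
print(global_alignment_score(subject, subject))  # 4.0
print(global_alignment_score(target, subject))   # 3.0
```

    Under this assumed rule an ambiguous call that is half-compatible with the subject base contributes a score of zero instead of being forced to a full match or a full mismatch.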

  3. Introducing difference recurrence relations for faster semi-global alignment of long sequences.

    Science.gov (United States)

    Suzuki, Hajime; Kasahara, Masahiro

    2018-02-19

    The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
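
    Illustrative note (not from the record): the key observation in the abstract is that differences between adjacent DP cells stay small even when the cell values themselves grow with sequence length. The sketch below checks this on a simplified global alignment with a linear gap penalty, where adjacent differences lie between `gap` and `match - gap`. It is not the libgaba algorithm, which handles affine gaps and SIMD vectors; the scoring values and function names are assumptions.

```python
def global_score_matrix(a, b, match=2, mismatch=-3, gap=-4):
    """Full DP matrix for global alignment with a linear gap penalty."""
    rows = [[j * gap for j in range(len(b) + 1)]]
    for i, x in enumerate(a, start=1):
        row = [i * gap]
        for j, y in enumerate(b, start=1):
            diag = rows[i - 1][j - 1] + (match if x == y else mismatch)
            row.append(max(diag, rows[i - 1][j] + gap, row[j - 1] + gap))
        rows.append(row)
    return rows


def adjacent_differences(matrix):
    """All horizontal and vertical differences between neighbouring cells."""
    diffs = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if j > 0:
                diffs.append(value - row[j - 1])
            if i > 0:
                diffs.append(value - matrix[i - 1][j])
    return diffs


matrix = global_score_matrix("GATTACAGATTACA", "GCATGCTTAGGA")
values = [v for row in matrix for v in row]
diffs = adjacent_differences(matrix)
print("cell values span", min(values), "to", max(values))
print("differences span", min(diffs), "to", max(diffs))
```

    The span of cell values widens as the sequences get longer, whereas the span of the differences is fixed by the scoring parameters alone, so the differences fit in a narrow integer type.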

  4. pyPaSWAS : Python-based multi-core CPU and GPU sequence alignment

    NARCIS (Netherlands)

    Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

    2018-01-01

    BACKGROUND: Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore adopted less than could be. The OpenCL language is supported more widely and allows use on a variety of

  5. Base-By-Base: Single nucleotide-level analysis of whole viral genome alignments

    Directory of Open Access Journals (Sweden)

    Tcherepanov Vasily

    2004-07-01

    Background: With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. Results: A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to (1) rapidly identify and correct alignment errors in large, multiple genome alignments; and (2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our MySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Conclusion: Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

  6. Lumba-Lumba Hidung Botol Laut Jawa Adalah Tursiops aduncus Berdasar Sekuen Gen NADH Dehidrogenase Subunit 6 (VERIFICATION BOTTLENOSE DOLPHINS FROM JAVA SEA IS TURSIOPS ADUNCUS BASED ON GENE SEQUENCES OF NADH DEHYDROGENASE SUBUNIT 6)

    Directory of Open Access Journals (Sweden)

    Rini Widayanti

    2014-05-01

    Bottlenose dolphins (Tursiops sp.) are aquatic mammals widely distributed in the seas of the Indonesian archipelago, especially the Java Sea. The taxonomy of the genus Tursiops is still controversial. The purpose of this study was to examine the molecular basis of Tursiops sp. of Java Sea origin on the basis of its NADH dehydrogenase subunit 6 (ND6) gene sequences. Blood samples were collected from five male bottlenose dolphins held in captivity by PT. Wersut Seguni Indonesia. DNA was isolated, amplified by polymerase chain reaction (PCR), and sequenced, and the data were analyzed using the MEGA v. 5.1 program. PCR amplification yielded a product of 868 base pairs (bp); DNA sequencing showed that 528 nucleotides were the ND6 gene, and the nucleotide at position 387 could be used to distinguish the bottlenose dolphins of Java Sea origin from T. aduncus. A phylogram constructed using the neighbor-joining method based on the nucleotide sequence of the ND6 gene showed that the bottlenose dolphins of Java Sea origin belong to the T. aduncus group.

  7. BuddySuite: Command-Line Toolkits for Manipulating Sequences, Alignments, and Phylogenetic Trees.

    Science.gov (United States)

    Bond, Stephen R; Keat, Karl E; Barreira, Sofia N; Baxevanis, Andreas D

    2017-06-01

    The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.

  8. Revisiting the phylogeny of Zoanthidea (Cnidaria: Anthozoa): Staggered alignment of hypervariable sequences improves species tree inference.

    Science.gov (United States)

    Swain, Timothy D

    2018-01-01

    The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    Directory of Open Access Journals (Sweden)

    Hao Ye

    2015-11-01

    Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to make a decision on which alignment algorithm to use in their studies. Selection of the right alignment algorithm determines not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long-read alignment and alignment facilitated with known genetic variants.
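
    Illustrative note (not from the record): the seed-and-extend strategy named in the abstract can be sketched in a few lines. The version below indexes every k-mer of the reference, uses exact k-mer hits from the read as seeds, and "extends" by counting mismatches over the full read without gaps. Real aligners use compressed indexes and gapped extension; all names and the mismatch threshold here are assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_extend(read: str, reference: str, index, k: int, max_mismatches: int = 2):
    """Return sorted (start, mismatches) pairs for ungapped placements of the read."""
    hits = set()
    for offset in range(len(read) - k + 1):
        # Seed: an exact k-mer match proposes where the read might start
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            # Extend: compare the whole read against the proposed window
            window = reference[start:start + len(read)]
            mismatches = sum(1 for x, y in zip(read, window) if x != y)
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


reference = "GATTACAGATTTCCAGGCTA"
read = reference[5:9] + "G" + reference[10:15]  # copy of positions 5..14 with one substitution
index = build_kmer_index(reference, k=4)
print(seed_and_extend(read, reference, index, k=4))  # [(5, 1)]
```

    The seed length k trades sensitivity for speed: shorter seeds tolerate more errors per read but produce more candidate positions to extend.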

  10. High-speed all-optical DNA local sequence alignment based on a three-dimensional artificial neural network.

    Science.gov (United States)

    Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra

    2017-07-01

    This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.

  11. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  12. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

    Directory of Open Access Journals (Sweden)

    Shi Weisong

    2011-06-01

    Background: Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. Results: To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http

  13. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.

    Science.gov (United States)

    Nguyen, Tung; Shi, Weisong; Ruden, Douglas

    2011-06-06

    Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http

  14. Molecule-oriented programming in Java

    NARCIS (Netherlands)

    Bergstra, J.A.

    2002-01-01

    Molecule-oriented programming is introduced as a programming style carrying some perspective for Java. A sequence of examples is provided. Supporting the development of the molecule-oriented programming style several matters are introduced and developed: profile classes allowing the representation

  15. Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

    Science.gov (United States)

    Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

    2013-09-01

    Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was significantly assessed on the BAliBASE benchmark according to the Kruskal-Wallis test (P algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P 0.05) with the advantage of being able to use less structures. Structural information is included within the objective function to evaluate more accurately the obtained alignments. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.
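
    Illustrative note (not from the record): two of the three objectives named in the abstract, non-gaps percentage and totally conserved columns, can be computed directly from the alignment rows. The sketch below uses straightforward definitions that are assumptions about what the authors measure; the STRIKE score needs structural data and is not covered.

```python
def non_gap_percentage(alignment):
    """Percentage of alignment cells that hold a residue and not a gap ('-')."""
    total = sum(len(row) for row in alignment)
    residues = sum(1 for row in alignment for ch in row if ch != "-")
    return 100.0 * residues / total


def totally_conserved_columns(alignment):
    """Number of columns in which every row carries the same residue and no gap."""
    return sum(
        1
        for column in zip(*alignment)
        if "-" not in column and len(set(column)) == 1
    )


msa = ["MK-LV",
       "MKALV",
       "MR-LV"]
print(round(non_gap_percentage(msa), 2))  # 86.67
print(totally_conserved_columns(msa))     # 3
```

    A multiobjective optimizer would treat these values, together with a structure-based score, as separate axes to be maximized jointly and would not collapse them into one weighted sum.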

  16. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    Science.gov (United States)

    Kelly, Steven; Maini, Philip K

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  17. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    Directory of Open Access Journals (Sweden)

    Steven Kelly

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  18. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    Science.gov (United States)

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The sharp increase in next-generation sequencing data has left a shortage of efficient ultra-large biological sequence alignment approaches that cope with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files larger than 1 GB, showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.

  19. Jannovar: a java library for exome annotation.

    Science.gov (United States)

    Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

    2014-05-01

    Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
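
The interval-tree lookup mentioned in this abstract can be illustrated with a minimal Python sketch. It is not Jannovar's Java implementation: the class name `IntervalTree`, the closed-interval coordinates and the transcript labels are all assumptions made for illustration. The sketch builds a centered interval tree and returns every stored interval that contains a queried position.

```python
class IntervalTree:
    """Centered interval tree over closed intervals (start, end, label).

    Illustrative only: names and coordinate conventions are hypothetical.
    """

    def __init__(self, intervals):
        self.center = None
        self.left = self.right = None
        self.by_start = self.by_end = []
        if not intervals:
            return
        # Use the median endpoint as the center, so at least one interval
        # (the one owning that endpoint) is stored at this node.
        points = sorted(p for s, e, _ in intervals for p in (s, e))
        self.center = points[len(points) // 2]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        self.by_start = sorted(here, key=lambda iv: iv[0])
        self.by_end = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalTree(left) if left else None
        self.right = IntervalTree(right) if right else None

    def query(self, pos):
        """Return the labels of all intervals that contain position pos."""
        if self.center is None:
            return []
        hits = []
        if pos <= self.center:
            # Intervals here end at or after center, so only starts matter.
            for start, _, label in self.by_start:
                if start > pos:
                    break
                hits.append(label)
            if self.left and pos < self.center:
                hits += self.left.query(pos)
        else:
            # Intervals here start at or before center, so only ends matter.
            for _, end, label in self.by_end:
                if end < pos:
                    break
                hits.append(label)
            if self.right:
                hits += self.right.query(pos)
        return hits


# Hypothetical transcripts on one chromosome.
transcripts = [(100, 500, "tx1"), (300, 800, "tx2"), (900, 1200, "tx3")]
tree = IntervalTree(transcripts)
print(sorted(tree.query(350)))
```

A variant that spans a range rather than a single position would need an overlap query instead of a point query; that extension is left out here to keep the sketch short.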

  20. Pattern recognition in complex activity travel patterns : comparison of Euclidean distance, signal-processing theoretical, and multidimensional sequence alignment methods

    NARCIS (Netherlands)

    Joh, C.H.; Arentze, T.A.; Timmermans, H.J.P.

    2001-01-01

    The application of a multidimensional sequence alignment method for classifying activity travel patterns is reported. The method was developed as an alternative to the existing classification methods suggested in the transportation literature. The relevance of the multidimensional sequence alignment

  1. Learning Java

    CERN Document Server

    Niemeyer, Patrick

    2005-01-01

    Version 5.0 of the Java 2 Standard Edition SDK is the most important upgrade since Java first appeared a decade ago. With Java 5.0, you'll not only find substantial changes in the platform, but to the language itself-something that developers of Java took five years to complete. The main goal of Java 5.0 is to make it easier for you to develop safe, powerful code, but none of these improvements makes Java any easier to learn, even if you've programmed with Java for years. And that means our bestselling hands-on tutorial takes on even greater significance. Learning Java is the most widely sou

  2. JavaD: Bringing Ownership Domains to Mainstream Java

    National Research Council Canada - National Science Library

    Abi-Antoun, Marwan; Aldrich, Jonathan

    2006-01-01

    .... As a result, none of the tool support for Java programs is available for AliasJava programs, making it harder to justify the case that Java programs are easier to evolve with Alias-Java annotations than without...

  3. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still an intensive, time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, utilizing its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
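
Exact read search of the kind this abstract refers to is commonly done in Burrows-Wheeler-based aligners with a backward search over the transformed reference. The Python sketch below shows that general technique only; it is not the stored procedures from the paper, and the function names `build_index` and `exact_search` are assumptions. The suffix array is built naively, which is acceptable for a toy string but not for a genome.

```python
def build_index(text):
    """Build a suffix array, first-column offsets and occurrence counts."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the sentinel
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in text that sort strictly before c
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ


def exact_search(pattern, sa, first, occ):
    """Return sorted 0-based start positions of pattern via backward search."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_index("GATTACATTA")
print(exact_search("TTA", *index))
```

Each pattern character narrows a half-open range of suffix-array rows, so the search cost depends on the read length rather than on the reference length once the index exists.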

  4. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.

    Science.gov (United States)

    Chang, Jia-Ming; Di Tommaso, Paolo; Notredame, Cedric

    2014-06-01

    Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. For this purpose, we describe a novel lossless alternative to site filtering that involves overweighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. The software is available from www.tcoffee.org/Projects/tcs. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  5. MzJava: An open source library for mass spectrometry data processing.

    Science.gov (United States)

    Horlacher, Oliver; Nikitin, Frederic; Alocci, Davide; Mariethoz, Julien; Müller, Markus; Lisacek, Frederique

    2015-11-03

    Mass spectrometry (MS) is a widely used and evolving technique for the high-throughput identification of molecules in biological samples. The need for sharing and reuse of code among bioinformaticians working with MS data prompted the design and implementation of MzJava, an open-source Java Application Programming Interface (API) for MS related data processing. MzJava provides data structures and algorithms for representing and processing mass spectra and their associated biological molecules, such as metabolites, glycans and peptides. MzJava includes functionality to perform mass calculation, peak processing (e.g. centroiding, filtering, transforming), spectrum alignment and clustering, protein digestion, fragmentation of peptides and glycans as well as scoring functions for spectrum-spectrum and peptide/glycan-spectrum matches. For data import and export MzJava implements readers and writers for commonly used data formats. For many classes support for the Hadoop MapReduce (hadoop.apache.org) and Apache Spark (spark.apache.org) frameworks for cluster computing was implemented. The library has been developed applying best practices of software engineering. To ensure that MzJava contains code that is correct and easy to use the library's API was carefully designed and thoroughly tested. MzJava is an open-source project distributed under the AGPL v3.0 licence. MzJava requires Java 1.7 or higher. Binaries, source code and documentation can be downloaded from http://mzjava.expasy.org and https://bitbucket.org/sib-pig/mzjava. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Java Dust: How Small Can Embedded Java Be?

    DEFF Research Database (Denmark)

    Caska, James; Schoeberl, Martin

    2011-01-01

    Java is slowly being accepted as a language and platform for embedded devices. However, the memory requirements of the Java library and runtime are still troublesome. A Java system is considered small when it requires less than 1 MB, and within the embedded domain small microcontrollers with a few...... KB on-chip Flash memory and even less on-chip RAM are very common. For such small devices Java is clearly challenging. In this paper we present the combination of the Java compiler Muvium for microcontrollers with the tiny soft-core Leros for an FPGA. To the best of our knowledge, the presented...... embedded Java system is the smallest Java system available. The Leros processor consumes less than 5% of the logic cells of the smallest FPGA from Altera and the Muvium compiler produces a JVM, including the Java application, that can execute in a few KB ROM and less than 1 KB RAM. The Leros processor...

  7. Aptaligner: automated software for aligning pseudorandom DNA X-aptamers from next-generation sequencing data.

    Science.gov (United States)

    Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E

    2014-06-10

    Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.

  8. Micropatterning stretched and aligned DNA for sequence-specific nanolithography

    Science.gov (United States)

    Petit, Cecilia Anna Paulette

    Techniques for fabricating nanostructured materials can be categorized as either "top-down" or "bottom-up". Top-down techniques use lithography and contact printing to create patterned surfaces and microfluidic channels that can corral and organize nanoscale structures, such as molecules and nanorods; in contrast, bottom-up techniques use self-assembly or molecular recognition to direct the organization of materials. A central goal in nanotechnology is the integration of bottom-up and top-down assembly strategies for materials development, device design, and process integration. With this goal in mind, we have developed strategies that will allow this integration by using DNA as a template for nanofabrication; two top-down approaches allow the placement of these templates, while the bottom-up technique uses the specific sequence of bases to pattern materials along each strand of DNA. Our first top-down approach, termed combing of molecules in microchannels (COMMIC), produces microscopic patterns of stretched and aligned molecules of DNA on surfaces. This process consists of passing an air-water interface over end-adsorbed molecules inside microfabricated channels. The geometry of the microchannel directs the placement of the DNA molecules, while the geometry of the air-water interface directs the local orientation and curvature of the molecules. We developed another top-down strategy for creating micropatterns of stretched and aligned DNA using surface chemistry. Because DNA stretching occurs on hydrophobic surfaces, this technique uses photolithography to pattern vinyl-terminated silanes on glass. When these surfaces are immersed in DNA solution, molecules adhere preferentially to the silanized areas. This approach has also proven useful in patterning protein for cell adhesion studies. Finally, we describe the use of these stretched and aligned molecules of DNA as templates for the subsequent bottom-up construction of hetero-structures through hybridization

  9. Formalising the Safety of Java, the Java Virtual Machine and Java Card

    OpenAIRE

    Hartel, Pieter H.; Moreau, Luc

    2001-01-01

    We review the existing literature on Java safety, emphasizing formal approaches, and the impact of Java safety on small footprint devices such as smart cards. The conclusion is that while a lot of good work has been done, a more concerted effort is needed to build a coherent set of machine readable formal models of the whole of Java and its implementation. This is a formidable task but we believe it is essential to building trust in Java safety, and thence to achieve ITSEC level 6 or Common C...

  10. Harnessing cross-species alignment to discover SNPs and generate a draft genome sequence of a bighorn sheep (Ovis canadensis).

    Science.gov (United States)

    Miller, Joshua M; Moore, Stephen S; Stothard, Paul; Liao, Xiaoping; Coltman, David W

    2015-05-20

    Whole genome sequences (WGS) have proliferated as sequencing technology continues to improve and costs decline. While many WGS of model or domestic organisms have been produced, a growing number of non-model species are also being sequenced. In the absence of a reference, construction of a genome sequence necessitates de novo assembly which may be beyond the ability of many labs due to the large volumes of raw sequence data and extensive bioinformatics required. In contrast, the presence of a reference WGS allows for alignment which is more tractable than assembly. Recent work has highlighted that the reference need not come from the same species, potentially enabling a wide array of species WGS to be constructed using cross-species alignment. Here we report on the creation of a draft WGS from a single bighorn sheep (Ovis canadensis) using alignment to the closely related domestic sheep (Ovis aries). Two sequencing libraries on SOLiD platforms yielded over 865 million reads, and combined alignment to the domestic sheep reference resulted in a nearly complete sequence (95% coverage of the reference) at an average of 12x read depth (104 SD). From this we discovered over 15 million variants and annotated them relative to the domestic sheep reference. We then conducted an enrichment analysis of those SNPs showing fixed differences between the reference and sequenced individual and found significant differences in a number of gene ontology (GO) terms, including those associated with reproduction, muscle properties, and bone deposition. Our results demonstrate that cross-species alignment enables the creation of novel WGS for non-model organisms. The bighorn sheep WGS will provide a resource for future resequencing studies or comparative genomics.

  11. Beginning JavaFX

    CERN Document Server

    Mohan, Praveen

    2010-01-01

    The open source JavaFX platform offers a Java-based approach to rich Internet application (RIA) development - an alternative to Adobe Flash/Flex and Microsoft Silverlight. At over 100 million downloads, the new JavaFX is poised to be a significant player now. Written by a JavaFX engineer and developer, this book is one of the first on the new JavaFX platform to give you the following: * The fundamentals of JavaFX scripting on desktop and mobile platforms * Examples of RIAs using JavaFX Graphics * Media and animation using JavaFX See how JavaFX gives you dynamic Java effects in your RIA applica

  12. A generalized global alignment algorithm.

    Science.gov (United States)

    Huang, Xiaoqiu; Chao, Kun-Mao

    2003-01-22

    Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
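
For orientation, the time and space bounds quoted in this abstract can be seen in the standard global alignment recurrence. The Python sketch below computes a plain Needleman-Wunsch score with a linear gap penalty while keeping only two rows of the table. It is not the generalized model implemented in GAP3, and the function name and scoring values are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty, keeping two rows.

    Time is proportional to len(a) * len(b); memory is proportional to the
    shorter sequence. Scoring values are illustrative defaults.
    """
    # Put the shorter sequence along the row to minimise memory.
    if len(b) > len(a):
        a, b = b, a
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


print(global_alignment_score("ACGT", "AGT"))
```

Keeping two rows yields only the optimal score. Recovering the alignment itself within linear space requires a divide-and-conquer scheme such as Hirschberg's, which is omitted here.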

  13. CHROMATOGATE: A TOOL FOR DETECTING BASE MIS-CALLS IN MULTIPLE SEQUENCE ALIGNMENTS BY SEMI-AUTOMATIC CHROMATOGRAM INSPECTION

    Directory of Open Access Journals (Sweden)

    Nikolaos Alachiotis

    2013-03-01

    Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined characters). Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.
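
The first step described in this abstract, scanning an alignment for polymorphic sites, can be sketched in a few lines of Python. This is only an illustration of the idea and not ChromatoGate's code: the function name `polymorphic_columns` and the choice to ignore gaps and `N` are assumptions, and the chromatogram-peak analysis and user-defined threshold are not modelled.

```python
def polymorphic_columns(alignment, ignore="-N"):
    """Return 0-based column indices where more than one distinct base occurs.

    Characters listed in `ignore` (gaps and undetermined bases by default)
    do not count as a distinct state.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for col in range(width):
        bases = {row[col].upper() for row in alignment} - set(ignore)
        if len(bases) > 1:
            columns.append(col)
    return columns


msa = ["ACGT", "ACGA", "AC-T"]
print(polymorphic_columns(msa))
```

Each flagged column is only a candidate for inspection; a polymorphic site may equally reflect real variation between the aligned sequences.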

  14. Phylo: a citizen science approach for improving multiple sequence alignment.

    Directory of Open Access Journals (Sweden)

    Alexander Kawrykow

    BACKGROUND: Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. METHODOLOGY/PRINCIPAL FINDINGS: We introduce Phylo, a human-based computing framework applying "crowd sourcing" techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we received more than 350,000 solutions submitted from more than 12,000 registered users. Our results show that solutions submitted contributed to improving the accuracy of up to 70% of the alignment blocks considered. CONCLUSIONS/SIGNIFICANCE: We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of "human-brain peta-flops" of computation that are spent every day playing games.

  15. Java Series: Java Essentials I. what is Java. Basic Language Constructs

    CERN Multimedia

    CERN. Geneva

    2000-01-01

    The tutorial will first give a general introduction to the Java programming language and an overview of what the Java development environment consists of. It will briefly explain its relation to the Internet, Web browsers and Operating Systems and show how to access Java at CERN. Then, the tutorial will be centred on explaining the basic language constructs to create classes, instances, and implement inheritance, destroy objects, etc. It will show the usage of interfaces. The tutorial is open to everyone. Attendees are required to have a basic intuition of what Object Orientation is, or to have followed the previous tutorial in the Java Series. Organiser(s): M.Marquina and R.Ramos /IT-User Support

  16. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    Energy Technology Data Exchange (ETDEWEB)

    Ovacik, Meric A. [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Androulakis, Ioannis P., E-mail: yannis@rci.rutgers.edu [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States)

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  17. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    International Nuclear Information System (INIS)

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-01-01

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy

  18. TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction.

    Science.gov (United States)

    Chang, Jia-Ming; Di Tommaso, Paolo; Lefort, Vincent; Gascuel, Olivier; Notredame, Cedric

    2015-07-01

    This article introduces the Transitive Consistency Score (TCS) web server, a service that makes it possible to estimate the local reliability of protein multiple sequence alignments (MSAs) using the TCS index. The evaluation can be used to identify the aligned positions most likely to contain structurally analogous residues and also most likely to support an accurate phylogenetic reconstruction. The TCS scoring scheme has been shown to be an accurate predictor of structural alignment correctness among commonly used methods. It has also been shown to outperform common filtering schemes like Gblocks or trimAl when doing MSA post-processing prior to phylogenetic tree reconstruction. The web server is available from http://tcoffee.crg.cat/tcs. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. KISS for STRAP: user extensions for a protein alignment editor.

    Science.gov (United States)

    Gille, Christoph; Lorenzen, Stephan; Michalsky, Elke; Frömmel, Cornelius

    2003-12-12

    The Structural Alignment Program STRAP is a comfortable comprehensive editor and analyzing tool for protein alignments. A wide range of functions related to protein sequences and protein structures are accessible with an intuitive graphical interface. Recent features include mapping of mutations and polymorphisms onto structures and production of high quality figures for publication. Here we address the general problem of multi-purpose program packages to keep up with the rapid development of bioinformatical methods and the demand for specific program functions. STRAP was remade implementing a novel design which aims at Keeping Interfaces in STRAP Simple (KISS). KISS renders STRAP extendable to bio-scientists as well as to bio-informaticians. Scientists with basic computer skills are capable of implementing statistical methods or embedding existing bioinformatical tools in STRAP themselves. For bio-informaticians STRAP may serve as an environment for rapid prototyping and testing of complex algorithms such as automatic alignment algorithms or phylogenetic methods. Further, STRAP can be applied as an interactive web applet to present data related to a particular protein family and as a teaching tool. JAVA-1.4 or higher. http://www.charite.de/bioinf/strap/

  20. Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads.

    Science.gov (United States)

    Huson, Daniel H; Tappu, Rewati; Bazinet, Adam L; Xie, Chao; Cummings, Michael P; Nieselt, Kay; Williams, Rohan

    2017-01-25

    Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way.

  1. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Science.gov (United States)

    Shi, Jieming; Li, Xi; Dong, Min; Graham, Mitchell; Yadav, Nehul; Liang, Chun

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.

  2. JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures

    Science.gov (United States)

    Dong, Min; Graham, Mitchell; Yadav, Nehul

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416

  3. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Directory of Open Access Journals (Sweden)

    Jieming Shi

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.

  4. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships.

    KAUST Repository

    Kosinski, Jan; Barbato, Alessandro; Tramontano, Anna

    2013-01-01

    SUMMARY: MODexplorer is an integrated tool aimed at exploring the sequence, structural and functional diversity in protein families useful in homology modeling and in analyzing protein families in general. It takes as input either the sequence or the structure of a protein and provides alignments with its homologs along with a variety of structural and functional annotations through an interactive interface. The annotations include sequence conservation, similarity scores, ligand-, DNA- and RNA-binding sites, secondary structure, disorder, crystallographic structure resolution and quality scores of models implied by the alignments to the homologs of known structure. MODexplorer can be used to analyze sequence and structural conservation among the structures of similar proteins, to find structures of homologs solved in different conformational state or with different ligands and to transfer functional annotations. Furthermore, if the structure of the query is not known, MODexplorer can be used to select the modeling templates taking all this information into account and to build a comparative model. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modexplorer. Website implemented in HTML and JavaScript with all major browsers supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  5. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships.

    KAUST Repository

    Kosinski, Jan

    2013-02-08

    SUMMARY: MODexplorer is an integrated tool aimed at exploring the sequence, structural and functional diversity in protein families useful in homology modeling and in analyzing protein families in general. It takes as input either the sequence or the structure of a protein and provides alignments with its homologs along with a variety of structural and functional annotations through an interactive interface. The annotations include sequence conservation, similarity scores, ligand-, DNA- and RNA-binding sites, secondary structure, disorder, crystallographic structure resolution and quality scores of models implied by the alignments to the homologs of known structure. MODexplorer can be used to analyze sequence and structural conservation among the structures of similar proteins, to find structures of homologs solved in different conformational state or with different ligands and to transfer functional annotations. Furthermore, if the structure of the query is not known, MODexplorer can be used to select the modeling templates taking all this information into account and to build a comparative model. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modexplorer. Website implemented in HTML and JavaScript with all major browsers supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  6. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

    Science.gov (United States)

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

    2016-07-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.

  7. An alignment-free method to find similarity among protein sequences via the general form of Chou's pseudo amino acid composition.

    Science.gov (United States)

    Gupta, M K; Niyogi, R; Misra, M

    2013-01-01

    In this paper, we propose a method to create the 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The construction of the feature vector is based on the contents of amino acids, total distance of each amino acid from the first amino acid in the protein sequence and the distribution of 20 amino acids. The obtained cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour joining method. In order to show the applicability of our approach, we tested it on three proteins: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results are in agreement with known history and the output from the multiple sequence alignment program ClustalW, which is widely used. We have also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our proposed method gives a more consistent biological relationship than the others. In addition, the time complexity is linear and space required is less as compared with other alignment-free methods that use graphical representation. It should be noted that the multiple sequence alignment method has exponential time complexity.
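
    The feature-vector idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give exact formulas, so the three per-residue components used here (frequency, summed position measured from the first residue, and positional variance) and the names `feature_vector` and `cosine_distance` are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def feature_vector(seq):
    """Return 60 values: for each of the 20 amino acids, its frequency,
    its summed position (counted from the first residue, index 0) and
    the variance of its positions. These definitions are assumptions."""
    n = len(seq)
    vec = []
    for aa in AMINO_ACIDS:
        positions = [i for i, c in enumerate(seq) if c == aa]
        count = len(positions)
        content = count / n if n else 0.0
        total = float(sum(positions))
        mean = total / count if count else 0.0
        spread = sum((p - mean) ** 2 for p in positions) / count if count else 0.0
        vec.extend([content, total, spread])
    return vec


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0
```

    A pairwise matrix of such distances could then be passed to any neighbour-joining implementation. The single pass per residue type keeps the cost linear in sequence length, in line with the complexity claim in the abstract.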

  8. Java for dummies

    CERN Document Server

    Burd, Barry

    2011-01-01

    The top-selling beginning Java book is now fully updated for Java 7! Java is the platform-independent, object-oriented programming language used for developing web and mobile applications. The revised version offers new functionality and features that have programmers excited, and this popular guide covers them all. This book helps programmers create basic Java objects and learn when they can reuse existing code. It's just what inexperienced Java developers need to get going quickly with Java 2 Standard Edition 7.0 (J2SE 7.0) and Java Development Kit 7.0 (JDK 7). Explores how the new version o

  9. Local sequence alignments statistics: deviations from Gumbel statistics in the rare-event tail

    Directory of Open Access Journals (Sweden)

    Burghardt Bernd

    2007-07-01

    Background: The optimal score for ungapped local alignments of infinitely long random sequences is known to follow a Gumbel extreme value distribution. Less is known about the important case, where gaps are allowed. For this case, the distribution is only known empirically in the high-probability region, which is biologically less relevant. Results: We provide a method to obtain numerically the biologically relevant rare-event tail of the distribution. The method, which has been outlined in an earlier work, is based on generating the sequences with a parametrized probability distribution, which is biased with respect to the original biological one, in the framework of Metropolis Coupled Markov Chain Monte Carlo. Here, we first present the approach in detail and evaluate the convergence of the algorithm by considering a simple test case. In the earlier work, the method was just applied to one single example case. Therefore, we consider here a large set of parameters: We study the distributions for protein alignment with different substitution matrices (BLOSUM62 and PAM250) and affine gap costs with different parameter values. In the logarithmic phase (large gap costs) it was previously assumed that the Gumbel form still holds, hence the Gumbel distribution is usually used when evaluating p-values in databases. Here we show that for all cases, provided that the sequences are not too long (L > 400), a "modified" Gumbel distribution, i.e. a Gumbel distribution with an additional Gaussian factor is suitable to describe the data. We also provide a "scaling analysis" of the parameters used in the modified Gumbel distribution. Furthermore, via a comparison with BLAST parameters, we show that significance estimations change considerably when using the true distributions as presented here. Finally, we study also the distribution of the sum statistics of the k best alignments. Conclusion: Our results show that the statistics of gapped and ungapped local
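
    For orientation, the standard Gumbel-based p-value that this record says is commonly used in database searches can be sketched as below. It uses the usual Karlin-Altschul form, E = K·m·n·exp(-λS) and P = 1 - exp(-E). It does not include the Gaussian correction factor proposed in the paper, and the function name and the parameter values in the usage note are illustrative assumptions.

```python
import math


def gumbel_p_value(score, m, n, lam, k):
    """P(S >= score) for sequences of lengths m and n under the
    Karlin-Altschul Gumbel approximation (no Gaussian correction)."""
    expected = k * m * n * math.exp(-lam * score)  # expected number of hits
    return -math.expm1(-expected)  # 1 - exp(-expected), stable for small values
```

    For example, `gumbel_p_value(60, 250, 250, lam=0.267, k=0.041)` evaluates the formula with placeholder values for `lam` and `k`; real values depend on the substitution matrix and gap costs.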

  10. Cover song identification by sequence alignment algorithms

    Science.gov (United States)

    Wang, Chih-Li; Zhong, Qian; Wang, Szu-Ying; Roychowdhury, Vwani

    2011-10-01

    Content-based music analysis has drawn much attention due to the rapidly growing digital music market. This paper describes a method that can be used to effectively identify cover songs. A cover song is a song that preserves only the crucial melody of its reference song but differs in some other acoustic properties. Hence, the beat/chroma-synchronous chromagram, which is insensitive to variation in the timbre or rhythm of songs but sensitive to the melody, is chosen. The key transposition is achieved by cyclically shifting the chromatic domain of the chromagram. By using the Hidden Markov Model (HMM) to obtain the time sequences of songs, the system is made even more robust. Because the Smith-Waterman alignment algorithm performs local alignment, the cover song and its reference need not share a similar structure or length.
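
    The two ingredients named in this abstract, Smith-Waterman local alignment and key transposition by cyclic shifting, can be sketched as follows. This is a simplified illustration under stated assumptions: each song is reduced to a list of integer pitch classes (0 to 11) rather than chromagram frames or HMM state sequences, and the scoring values and function names are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b with a linear gap cost."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Scores never drop below zero, so an alignment can start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def best_transposed_score(a, b, n_classes=12):
    """Try every cyclic shift of the pitch classes in b; keep the best score."""
    return max(
        smith_waterman_score(a, [(p + k) % n_classes for p in b])
        for k in range(n_classes)
    )
```

    Because only the best-scoring local region counts, extra material before or after the shared melody does not reduce the score.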

  11. Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.

    Science.gov (United States)

    Newberg, Lee A

    2008-08-15

    A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++-code for optimal backtrace is available in the Supplementary Materials. Supplementary data is available at Bioinformatics online.
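
    The checkpoint-and-recompute idea can be sketched as below. This is not the optimal strategy from the paper: it uses fixed-interval checkpoints, and for brevity it performs global rather than local alignment. The function names and the `every` parameter are assumptions made for illustration.

```python
def next_row(prev, x, b, match, mismatch, gap):
    """Compute one global-alignment DP row from the previous row."""
    curr = [prev[0] + gap]
    for j, y in enumerate(b, start=1):
        diag = prev[j - 1] + (match if x == y else mismatch)
        curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
    return curr


def align_checkpointed(a, b, every=4, match=1, mismatch=-1, gap=-1):
    """Global alignment that stores only every `every`-th DP row."""
    # Forward pass: keep row 0 and each row whose index is a multiple of `every`.
    row = [j * gap for j in range(len(b) + 1)]
    checkpoints = {0: row}
    for i, x in enumerate(a, start=1):
        row = next_row(row, x, b, match, mismatch, gap)
        if i % every == 0:
            checkpoints[i] = row
    score = row[-1]
    cache = {}  # non-checkpoint rows of the block currently being traced

    def get_row(r):
        if r in checkpoints:
            return checkpoints[r]
        if r not in cache:
            # Recompute the rows that follow the nearest earlier checkpoint.
            cache.clear()
            start = (r // every) * every
            cur = checkpoints[start]
            for i in range(start + 1, min(start + every, len(a) + 1)):
                cur = next_row(cur, a[i - 1], b, match, mismatch, gap)
                cache[i] = cur
        return cache[r]

    # Backtrace from the bottom-right cell, fetching rows on demand.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        cur = get_row(i)
        prev = get_row(i - 1) if i > 0 else None
        if i > 0 and j > 0 and cur[j] == prev[j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and cur[j] == prev[j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))
```

    A larger `every` stores fewer rows but recomputes more of them during the backtrace; the paper addresses how to place checkpoints optimally rather than at a fixed interval.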

  12. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    Science.gov (United States)

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

  13. VCFtoTree: a user-friendly tool to construct locus-specific alignments and phylogenies from thousands of anthropologically relevant genome sequences.

    Science.gov (United States)

    Xu, Duo; Jaber, Yousef; Pavlidis, Pavlos; Gokcumen, Omer

    2017-09-26

    Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allows novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. Here, we provide VCFtoTree, a user-friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in Newick format. VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given locus from thousands of available genomes. Our software provides a user-friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0 .

  14. DNAAlignEditor: DNA alignment editor tool

    Directory of Open Access Journals (Sweden)

    Guill Katherine E

    2008-03-01

    Background: With advances in DNA re-sequencing methods and Next-Generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that aids the biologist, often with naïve programming capability, in tracking, editing, displaying, and exporting multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor. Results: We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality scores aid in the manual alignment editing process. DNAAlignEditor works as a client/server tool having two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users concurrently connected. Conclusion: We anticipate that this software will be of general interest to biologists and population geneticists in editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism.

  15. JavaScript bible

    CERN Document Server

    Goodman, Danny; Novitski, Paul; Rayl, Tia Gustaffl

    2009-01-01

    The bestselling JavaScript reference, now updated to reflect changes in technology and best practices. As the most comprehensive book on the market, the JavaScript Bible is a classic bestseller that keeps you up to date on the latest changes in JavaScript, the leading technology for incorporating interactivity into Web pages. Part tutorial, part reference, this book serves as both a learning tool for building new JavaScript skills as well as a detailed reference for the more experienced JavaScript user. You'll get up-to-date coverage on the latest JavaScript practices that have been implemente

  16. Nucleotide sequence alignment of hdcA from Gram-positive bacteria.

    Science.gov (United States)

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; Del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A

    2016-03-01

    The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4].

  17. Beginning Java EE 7

    CERN Document Server

    Goncalves, Antonio

    2013-01-01

    Java Enterprise Edition (Java EE) continues to be one of the leading Java technologies and platforms. Beginning Java EE 7 is the first tutorial book on Java EE 7. Step by step and easy to follow, this book describes many of the Java EE 7 specifications and reference implementations, and shows them in action using practical examples. This definitive book also uses the newest version of GlassFish to deploy and administer the code examples. Written by an expert member of the Java EE specification request and review board in the Java Community Process (JCP), this book contains the best information possible, from an expert’s perspective on enterprise Java technologies.

  18. JavaScript Pocket Reference

    CERN Document Server

    Flanagan, David

    1998-01-01

    JavaScript is a powerful, object-based scripting language that can be embedded directly in HTML pages. It allows you to create dynamic, interactive Web-based applications that run completely within a Web browser -- JavaScript is the language of choice for developing Dynamic HTML (DHTML) content. JavaScript can be integrated effectively with CGI and Java to produce sophisticated Web applications, although, in many cases, JavaScript eliminates the need for complex CGI scripts and Java applets altogether. The JavaScript Pocket Reference is a companion volume to JavaScript: The Definitive Guide

  19. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    Science.gov (United States)

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.
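
    A doublet-based sequence-to-signal mapping of the kind mentioned here can be sketched as below. The abstract does not specify GAFD's actual mapping or its three distance metrics, so the base-4 encoding of overlapping nucleotide pairs and the mean-squared distance are hypothetical stand-ins, shown only to illustrate how doublets widen the amplitude range from 4 values to 16.

```python
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def doublet_signal(seq):
    """Map each overlapping nucleotide pair to an amplitude in 0..15
    (hypothetical base-4 encoding, not the published GAFD mapping)."""
    s = seq.upper()
    return [4 * BASE_INDEX[x] + BASE_INDEX[y] for x, y in zip(s, s[1:])]


def mean_squared_distance(u, v):
    """Mean squared difference between two signals of equal length."""
    if len(u) != len(v):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)
```

    A sequence of length n yields a signal of length n - 1, and any standard DSP distance could replace `mean_squared_distance`.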

  20. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

    Directory of Open Access Journals (Sweden)

    Arthur W Pightling

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: (i) depth of sequencing coverage, (ii) choice of reference-guided short-read sequence assembler, (iii) choice of reference genome, and (iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers

  1. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Background: Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10^-9 change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10^-9 change/site/year) was approximately half of the overall rate (1.9–2.0 × 10^-9 change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.

  2. The Delft-Java Engine

    NARCIS (Netherlands)

    Glossner III, C.J.

    2001-01-01

    In this dissertation, we describe the DELFT-JAVA engine - a 32-bit RISC-based architecture that provides high performance JAVA program execution. More specifically we describe a microarchitecture that accelerates JAVA execution and provide details of the DELFT-JAVA architecture for executing JAVA

  3. Java Series: Java Essentials II Advanced Language Constructs

    CERN Multimedia

    CERN. Geneva

    2000-01-01

    This tutorial will show how Java uses important language constructs, and the set of classes typically used in common tasks. It will briefly show conditional and loop structures and then will introduce the most significant classes included in the java.util package, such as vectors, collections, enumerations, etc. It will finally explain the usage and handling of exceptions in Java. Organiser(s): M.Marquina and R.Ramos /IT-User Support

  4. Java EE 7 handbook

    CERN Document Server

    Pilgrim, Peter A

    2013-01-01

    Java EE 7 Handbook is an example-based tutorial with descriptions and explanations. "Java EE 7 Handbook" is for the developer, designer, and architect aiming to get acquainted with the Java EE platform in its newest edition. This guide will enhance your knowledge about the Java EE 7 platform. Whether you are a long-term Java EE (J2EE) developer or an intermediate level engineer on the JVM with just Java SE behind you, this handbook is for you, the new contemporary Java EE 7 developer!

  5. Safety-critical Java on a Java processor

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Rios Rivas, Juan Ricardo

    2012-01-01

    The safety-critical Java (SCJ) specification is developed within the Java Community Process under specification request number JSR 302. The specification is available as public draft, but details are still discussed by the expert group. In this stage of the specification we need prototype...... implementations of SCJ and first test applications that are written with SCJ, even when the specification is not finalized. The feedback from those prototype implementations is needed for final decisions. To help the SCJ expert group, a prototype implementation of SCJ on top of the Java optimized processor...

  6. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    Science.gov (United States)

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the
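
    The Z-value and the P-value bound mentioned in this abstract can be illustrated with a short Python sketch. It assumes the usual definition of a Z-value as the number of standard deviations by which the real alignment score exceeds the mean score of randomized (for example shuffled) sequences; the function names, the use of the population standard deviation, and the clamping of the bound to 1 are illustrative choices, not taken from the paper.

```python
from statistics import mean, pstdev


def z_value(score, random_scores):
    """Z-value of an alignment score against scores of randomized sequences.

    Assumption: Z = (score - mean) / standard deviation of the random scores.
    The population standard deviation is used here for simplicity.
    """
    mu = mean(random_scores)
    sigma = pstdev(random_scores)
    if sigma == 0:
        raise ValueError("random scores have zero variance; Z is undefined")
    return (score - mu) / sigma


def p_value_upper_bound(z):
    """Upper bound 1/Z^2 on the P-value, clamped to 1 (a probability)."""
    if z <= 0:
        return 1.0
    return min(1.0, 1.0 / (z * z))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    z = z_value(30, [10, 12, 8, 10])
    print(z, p_value_upper_bound(z))
```

    In practice the list of random scores would come from re-aligning shuffled versions of one of the two sequences, which is the Monte-Carlo step the abstract refers to.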

  7. Java for dummies

    CERN Document Server

    Burd

    2014-01-01

    The top-selling beginning Java book is now fully updated! As an unstoppably platform-independent, object-oriented programming language, Java is used for developing web and mobile applications. In this up-to-date bestselling book, veteran author Barry Burd shows you how to create basic Java objects and clearly explains when you should simply reuse existing code. Explores how the new version of Java offers more robust functionality and new features such as closures to keep Java competitive with more syntax-friendly languages like Python and Ruby. Covers object-oriented programming basics with Ja

  8. eMatchSite: sequence order-independent structure alignments of ligand binding pockets in protein models.

    Directory of Open Access Journals (Sweden)

    Michal Brylinski

    2014-09-01

    Detecting similarities between ligand binding sites in the absence of global homology between target proteins has been recognized as one of the critical components of modern drug discovery. Local binding site alignments can be constructed using sequence order-independent techniques; however, to achieve high accuracy, many current algorithms for binding site comparison require high-quality experimental protein structures, preferably in the bound conformational state. This, in turn, complicates proteome scale applications, where only structure models of varying quality are available for the majority of gene products. To improve the state-of-the-art, we developed eMatchSite, a new method for constructing sequence order-independent alignments of ligand binding sites in protein models. Large-scale benchmarking calculations using adenine-binding pockets in crystal structures demonstrate that eMatchSite generates accurate alignments for almost three times more protein pairs than SOIPPA. More importantly, eMatchSite offers a high tolerance to structural distortions in ligand binding regions in protein models. For example, the percentage of correctly aligned pairs of adenine-binding sites in weakly homologous protein models is only 4-9% lower than those aligned using crystal structures. This represents a significant improvement over other algorithms, e.g. the performance of eMatchSite in recognizing similar binding sites is 6% and 13% higher than that of SiteEngine using high- and moderate-quality protein models, respectively. Constructing biologically correct alignments using predicted ligand binding sites in protein models opens up the possibility to investigate drug-protein interaction networks for complete proteomes with prospective systems-level applications in polypharmacology and rational drug repositioning. eMatchSite is freely available to the academic community as a web-server and a stand-alone software distribution at http://www.brylinski.org/ematchsite.

  9. Protected Objects in Java

    DEFF Research Database (Denmark)

    Løvengreen, Hans Henrik; Schwarzer, Jens Christian

    1998-01-01

    We present an implementation of Ada 95's notion of protected objects in Java. The implementation comprises a class library supporting entry queues and a (pre-) compiler translating slightly decorated Java classes to pure Java classes utilizing the library.

  10. Java SOA Cookbook

    CERN Document Server

    Hewitt, Eben

    2009-01-01

    Java SOA Cookbook offers practical solutions and advice to programmers charged with implementing a service-oriented architecture (SOA) in their organization. Instead of providing another conceptual, high-level view of SOA, this cookbook shows you how to make SOA work. It's full of Java and XML code you can insert directly into your applications and recipes you can apply right away. The book focuses primarily on the use of free and open source Java Web Services technologies -- including Java SE 6 and Java EE 5 tools -- but you'll find tips for using commercially available tools as well. Jav

  11. Formalizing the Safety of Java, the Java Virtual Machine and Java Card

    NARCIS (Netherlands)

    Hartel, Pieter H.; Moreau, Luc

    2001-01-01

    We review the existing literature on Java safety, emphasizing formal approaches, and the impact of Java safety on small footprint devices such as smart cards. The conclusion is that while a lot of good work has been done, a more concerted effort is needed to build a coherent set of machine readable

  12. Pro Java ME Apps

    CERN Document Server

    Iliescu, Ovidiu

    2011-01-01

    Pro Java ME Apps gives you, the developer, the know-how required for writing sophisticated Java ME applications and for taking advantage of this huge potential market. Java ME is the largest mobile software platform in the world, supported by over 80% of all phones. You'll cover what Java ME is and how it compares to other mobile software platforms, how to properly design and structure Java ME applications, how to think like an experienced Java ME developer, what common problems and pitfalls you may run into, how to optimize your code, and many other key topics. Unlike other Java ME books out

  13. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    Science.gov (United States)

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in the Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  14. Some measurements of Java-to-bytecode compiler performance in the Java Virtual Machine

    OpenAIRE

    Daly, Charles; Horgan, Jane; Power, James; Waldron, John

    2001-01-01

    In this paper we present a platform independent analysis of the dynamic profiles of Java programs when executing on the Java Virtual Machine. The Java programs selected are taken from the Java Grande Forum benchmark suite, and five different Java-to-bytecode compilers are analysed. The results presented describe the dynamic instruction usage frequencies.

  15. CBESW: sequence alignment on the Playstation 3.

    Science.gov (United States)

    Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil

    2008-09-17

    The exponential growth of available biological data is rapidly moving bioinformatics towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. For large datasets, our implementation on the PlayStation 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. The results from our experiments demonstrate that the PlayStation 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications.
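
    As background for the figures in this abstract: MCUPS stands for million cell updates per second, where a "cell" is one entry of the Smith-Waterman dynamic-programming matrix. The following Python sketch shows the scoring recurrence with a linear gap penalty and how a MCUPS figure would be derived from a timing. It is a plain reference version, not the CBESW implementation; the default scoring parameters and the function names are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    Only two rows of the matrix are kept, since each cell depends on its
    left, upper and upper-left neighbours.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                           # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def mcups(len_a, len_b, seconds):
    """Million cell updates per second for one len_a x len_b comparison."""
    return len_a * len_b / seconds / 1e6


if __name__ == "__main__":
    print(smith_waterman_score("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
    print(mcups(1000, 1000, 1.0))
```

    Every pair of residues costs one cell update, so the work grows with the product of the two sequence lengths. This quadratic cost is what accelerator implementations of the algorithm aim to reduce in wall-clock time.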

  16. Aligning the unalignable: bacteriophage whole genome alignments.

    Science.gov (United States)

    Bérard, Sèverine; Chateau, Annie; Pompidor, Nicolas; Guertin, Paul; Bergeron, Anne; Swenson, Krister M

    2016-01-13

    In recent years, many studies focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot plot diagrams give a mostly qualitative assessment of the similarity/dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. In this paper, we propose a multiple alignment strategy that exploits functional collinearity shared by related strains of bacteriophages, and uses partial orders to capture mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressive Mauve aligner - which implements a partial order strategy, but whose alignments are linearized - shows a greatly improved interactive graphic display, while avoiding misalignments. Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains. A Python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on bitbucket (https://bitbucket.org/thekswenson/alpha).

  17. PriFi - Using a Multiple Alignment of Related Sequences to Find Primers for Amplification of Homologs

    DEFF Research Database (Denmark)

    Fredslund, Jakob; Schauser, Leif; Madsen, Lene Heegaard

    2005-01-01

    Using a comparative approach, the web program PriFi (http://cgi-www.daimi.au.dk/cgi-chili/PriFi/main) designs pairs of primers useful for PCR amplification of genomic DNA in species where prior sequence information is not available. The program works with an alignment of DNA sequences from phylog...

  18. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids.

    Science.gov (United States)

    Li, Yushuang; Song, Tian; Yang, Jiasheng; Zhang, Yi; Yang, Jialiang

    2016-01-01

    In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method to the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanase-containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of 5 other popular alignment-free methods. In addition, we successfully separate the xylanase sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector.
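
    The layout of the 440 dimensional vector (400 + 20 + 20) can be made concrete with a Python sketch. The abstract does not define the three components, so the definitions below are simplified stand-ins and should be read as assumptions: transition probabilities are estimated from adjacent residue pairs only, the content ratio is the relative frequency of each amino acid, and the position ratio is the sum of the (1-based) positions of each amino acid divided by the sum of all positions. The paper's actual Pseudo-Markov definition may differ.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def feature_vector(seq):
    """Encode a protein sequence as 400 + 20 + 20 = 440 numbers.

    Simplified stand-in definitions (assumptions, see text above).
    Non-standard residue letters raise KeyError.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = [0] * 20
    position_sums = [0] * 20
    pair_counts = [[0] * 20 for _ in range(20)]
    for pos, aa in enumerate(seq, start=1):
        counts[INDEX[aa]] += 1
        position_sums[INDEX[aa]] += pos
    for a, b in zip(seq, seq[1:]):
        pair_counts[INDEX[a]][INDEX[b]] += 1

    transitions = []
    for row in pair_counts:
        total = sum(row)
        # Row-normalise; residues that never precede another give zeros.
        transitions.extend(c / total if total else 0.0 for c in row)
    content = [c / n for c in counts]
    all_positions = n * (n + 1) / 2
    position = [s / all_positions for s in position_sums]
    return transitions + content + position


def euclidean_distance(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    u = feature_vector("ACDA")
    v = feature_vector("ACDC")
    print(len(u), euclidean_distance(u, v))
```

    Because every sequence maps to a vector of the same length, sequences of different lengths can be compared directly, without computing an alignment.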

  19. Gapped sequence alignment using artificial neural networks: application to the MHC class I system

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Nielsen, Morten

    2016-01-01

    . On this relatively simple system, we developed a sequence alignment method based on artificial neural networks that allows insertions and deletions in the alignment. Results: We show that prediction methods based on alignments that include insertions and deletions have significantly higher performance than methods trained on peptides of single lengths. Also, we illustrate how the location of deletions can aid the interpretation of the modes of binding of the peptide-MHC, as in the case of long peptides bulging out of the MHC groove or protruding at either terminus. Finally, we demonstrate that the method can learn the length profile of different MHC molecules, and quantify the reduction of the experimental effort required to identify potential epitopes using our prediction algorithm. Availability and implementation: The NetMHC-4.0 method for the prediction of peptide-MHC class I binding affinity using gapped...

  20. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores

    Directory of Open Access Journals (Sweden)

    Maréchal Eric

    2008-08-01

    Background: Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. Results: We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a

  1. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    Science.gov (United States)

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs

  2. Monte Carlo simulation of a statistical mechanical model of multiple protein sequence alignment.

    Science.gov (United States)

    Kinjo, Akira R

    2017-01-01

    A grand canonical Monte Carlo (MC) algorithm is presented for studying the lattice gas model (LGM) of multiple protein sequence alignment, which coherently combines long-range interactions and variable-length insertions. MC simulations are used for both parameter optimization of the model and production runs to explore the sequence subspace around a given protein family. In this Note, I describe the details of the MC algorithm as well as some preliminary results of MC simulations with various temperatures and chemical potentials, and compare them with the mean-field approximation. The existence of a two-state transition in the sequence space is suggested for the SH3 domain family, and inappropriateness of the mean-field approximation for the LGM is demonstrated.

  3. Ivor Horton's Beginning Java

    CERN Document Server

    Horton, Ivor

    2011-01-01

    Find out why thousands have turned to Ivor Horton for learning Java. Ivor Horton's approach to teaching Java is so effective and popular that he is one of the leading authors of introductory programming tutorials, with over 160,000 copies of his Java books sold. In this latest edition, whether you're a beginner or an experienced programmer switching to Java, you'll learn how to build real-world Java applications using Java SE 7. The author thoroughly covers the basics as well as new features such as extensions and classes; extended coverage of the Swing Application Framework; and he does it all

  4. Multiple amino acid sequence alignment nitrogenase component 1: insights into phylogenetics and structure-function relationships.

    Directory of Open Access Journals (Sweden)

    James B Howard

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as "core" for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification

  5. Anatomy of the western Java plate interface from depth-migrated seismic images

    OpenAIRE

    Kopp, Heidrun; Hindle, David; Klaeschen, Dirk; Oncken, O.; Scholl, D.

    2009-01-01

    Newly pre-stack depth-migrated seismic images resolve the structural details of the western Java forearc and plate interface. The structural segmentation of the forearc into discrete mechanical domains correlates with distinct deformation styles. Approximately 2/3 of the trench sediment fill is detached and incorporated into frontal prism imbricates, while the floor sequence is underthrust beneath the décollement. Western Java, however, differs markedly from margins such as Nankai or Barbados...

  6. Java servlet programming

    CERN Document Server

    Hunter, Jason

    2001-01-01

    Servlets are an exciting and important technology that ties Java to the Web, allowing programmers to write Java programs that create dynamic web content. Java Servlet Programming covers everything Java developers need to know to write effective servlets. It explains the servlet lifecycle, showing how to use servlets to maintain state information effortlessly. It also describes how to serve dynamic web content, including both HTML pages and multimedia data, and explores more advanced topics like integrated session tracking, efficient database connectivity using JDBC, applet-servlet communicat

  7. Fine-tuning structural RNA alignments in the twilight zone

    Directory of Open Access Journals (Sweden)

    Schirmer Stefanie

    2010-04-01

    Background: A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results: Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc. This circle may be iterated as long as structural conservation improves, but normally, one step suffices. Conclusions: Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.

  8. Mason: a JavaScript web site widget for visualizing and comparing annotated features in nucleotide or protein sequences.

    Science.gov (United States)

    Jaschob, Daniel; Davis, Trisha N; Riffle, Michael

    2015-03-07

    Sequence feature annotations (e.g., protein domain boundaries, binding sites, and secondary structure predictions) are an essential part of biological research. Annotations are widely used by scientists during research and experimental design, and are frequently the result of biological studies. A generalized and simple means of disseminating and visualizing these data via the web would be of value to the research community. Mason is a web site widget designed to visualize and compare annotated features of one or more nucleotide or protein sequences. Annotated features may be of virtually any type, ranging from annotating transcription binding sites or exons and introns in DNA to secondary structure or domain boundaries in proteins. Mason is simple to use and easy to integrate into web sites. Mason has a highly dynamic and configurable interface supporting multiple sets of annotations per sequence, overlapping regions, customization of interface and user-driven events (e.g., clicks and text to appear for tooltips). It is written purely in JavaScript and SVG, requiring no third-party plugins or browser customization. Mason is a solution for dissemination of sequence annotation data on the web. It is highly flexible, customizable, simple to use, and is designed to be easily integrated into web sites. Mason is open source and freely available at https://github.com/yeastrc/mason.

  9. Java 8 recipes

    CERN Document Server

    Dea, Carl; Guime, Freddy; OConner, John; Juneau, Josh

    2014-01-01

    Java 8 Recipes offers solutions to common programming problems encountered while developing Java-based applications. Fully updated with the newest features and techniques available, Java 8 Recipes provides code examples involving Lambdas, embedded scripting with Nashorn, the new date-time API, stream support, functional interfaces, and much more. Special emphasis is given to features such as lambdas that are newly introduced in Java 8. Content is presented in the popular problem-solution format: Look up the programming problem that you want to solve. Read the solution. Apply the solution dir

  10. JavaScript Patterns

    CERN Document Server

    Stefanov, Stoyan

    2010-01-01

    What's the best approach for developing an application with JavaScript? This book helps you answer that question with numerous JavaScript coding patterns and best practices. If you're an experienced developer looking to solve problems related to objects, functions, inheritance, and other language-specific categories, the abstractions and code templates in this guide are ideal -- whether you're writing a client-side, server-side, or desktop application with JavaScript. Written by JavaScript expert Stoyan Stefanov -- Senior Yahoo! Technical and architect of YSlow 2.0, the web page performance

  11. Java Application Shell: A Framework for Piecing Together Java Applications

    Science.gov (United States)

    Miller, Philip; Powers, Edward I. (Technical Monitor)

    2001-01-01

    This session describes the architecture of Java Application Shell (JAS), a Swing-based framework for developing interactive Java applications. Java Application Shell is being developed by Commerce One, Inc. for NASA Goddard Space Flight Center Code 588. The purpose of JAS is to provide a framework for the development of Java applications, providing features that enable the development process to be more efficient, consistent and flexible. Fundamentally, JAS is based upon an architecture where an application is considered a collection of 'plug-ins'. In turn, a plug-in is a collection of Swing actions defined using XML and packaged in a jar file. Plug-ins may be local to the host platform or remotely accessible through HTTP. Local and remote plug-ins are automatically discovered by JAS upon application startup; plug-ins may also be loaded dynamically without having to restart the application. Using Extensible Markup Language (XML) to define actions, as opposed to hardcoding them in application logic, allows easier customization of application-specific operations by separating application logic from presentation. Through XML, a developer defines an action that may appear on any number of menus, toolbars, and buttons. Actions maintain and propagate enable/disable states and specify icons, tool-tips, titles, etc. Furthermore, JAS allows actions to be implemented using various scripting languages through the use of IBM's Bean Scripting Framework. Scripted action implementation is seamless to the end-user. In addition to action implementation, scripts may be used for application and unit-level testing. In the case of application-level testing, JAS has hooks to assist a script in simulating end-user input. JAS also provides property and user preference management, JavaHelp, Undo/Redo, Multi-Document Interface, Single-Document Interface, printing, and logging. Finally, Jini technology has also been incorporated into the framework by means of a Jini services browser and the

  12. Alignment of whole genomes.

    Science.gov (United States)

    Delcher, A L; Kasif, S; Fleischmann, R D; Peterson, J; White, O; Salzberg, S L

    1999-01-01

    A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications. PMID:10325427
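
    The abstract above credits its speed to a suffix-based index. As a rough illustration only, the sketch below uses a naively built suffix array (a simpler relative of the suffix tree, swapped in here for brevity) to locate every exact occurrence of a pattern by binary search. The function names and the toy sequence are assumptions made for this example and do not come from the described system.

```python
def build_suffix_array(text):
    """Return suffix start positions sorted by suffix (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of pattern in text, in ascending order."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


genome = "ACGTACGGACGT"  # toy sequence, hypothetical
sa = build_suffix_array(genome)
hits = find_occurrences(genome, sa, "ACG")
```

    A pattern that occurs exactly once (for example "GG" in the toy sequence) is the kind of exact match that could serve as an alignment anchor; a production tool would build the index in linear time rather than by sorting whole suffixes.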

  13. Java RMI

    CERN Document Server

    Grosso, William

    2002-01-01

    Java RMI contains a wealth of experience in designing and implementing Java's Remote Method Invocation. If you're a novice reader, you will quickly be brought up to speed on why RMI is such a powerful yet easy to use tool for distributed programming, while experts can gain valuable experience for constructing their own enterprise and distributed systems. With Java RMI, you'll learn tips and tricks for making your RMI code excel. The book also provides strategies for working with serialization, threading, the RMI registry, sockets and socket factories, activation, dynamic class downloading,

  14. The Java Legacy Interface

    DEFF Research Database (Denmark)

    Korsholm, Stephan

    2007-01-01

    The Java Legacy Interface is designed to use Java for encapsulating native legacy code on small embedded platforms. We discuss why existing technologies for encapsulating legacy code (JNI) are not sufficient for an important range of small embedded platforms, and we show how the Java Legacy...... Interface offers this previously missing functionality. We describe an implementation of the Java Legacy Interface for a particular virtual machine, and how we have used this virtual machine to integrate Java with an existing, commercial, soft real-time, C/C++ legacy platform....

  15. Java development in MDSplus

    International Nuclear Information System (INIS)

    Barana, O.; Luchetta, A.; Manduchi, G.; Taliercio, C.

    2002-01-01

    This paper describes the new Java components of MDSplus. These tools represent the evolution of some MDSplus components (MDSplus Current Developments and Future Directions, this conference) previously written in C, taking advantage of the multiplatform interoperability provided by the Java framework. The use of Java in the development of these tools provided an impressive reduction in coding and test time. This is mainly due to the large set of ready-to-use components in the Java framework, and to the effective code re-use that can be achieved in the organization of Java applications.

  16. CBESW: Sequence Alignment on the Playstation 3

    Directory of Open Access Journals (Sweden)

    Hieu Nim

    2008-09-01

    Background: The exponential growth of available biological data has pushed bioinformatics rapidly towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results: For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. Conclusion: The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient low-cost computational platform for high-performance sequence alignment applications.
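
    For readers unfamiliar with the algorithm being accelerated, here is a minimal, unoptimized sketch of Smith-Waterman local alignment scoring with a linear gap penalty, plus the usual definition of MCUPS (million dynamic-programming cell updates per second). The scoring values and function names are assumptions chosen for illustration; they are not taken from the CBESW implementation, which targets the Cell processor.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def mcups(query_length, subject_length, seconds):
    """Million cell updates per second for one pairwise comparison."""
    return query_length * subject_length / seconds / 1e6


score = smith_waterman_score("TTACGTTT", "GGACGTGG")
```

    Only two rows of the matrix are kept because the score alone is needed; recovering the alignment itself would require the full matrix or a traceback structure.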

  17. Alignment efficiency and discomfort of three orthodontic archwire sequences: a randomized clinical trial.

    Science.gov (United States)

    Ong, Emily; Ho, Christopher; Miles, Peter

    2011-03-01

    To compare the efficiency of orthodontic archwire sequences produced by three manufacturers. Prospective, randomized clinical trial with three parallel groups. Private orthodontic practice in Caloundra, QLD, Australia. One hundred and thirty-two consecutive patients were randomized to one of three archwire sequence groups: (i) 3M Unitek, 0·014 inch Nitinol, 0·017 inch × 0·017 inch heat activated Ni-Ti; (ii) GAC international, 0·014 inch Sentalloy, 0·016 × 0·022 inch Bioforce; and (iii) Ormco corporation, 0·014 inch Damon Copper Ni-Ti, 0·014 × 0·025 inch Damon Copper Ni-Ti. All patients received 0·018 × 0·025 inch slot Victory Series™ brackets. Mandibular impressions were taken before the insertion of each archwire. Patients completed discomfort surveys according to a seven-point Likert Scale at 4 h, 24 h, 3 days and 7 days after the insertion of each archwire. Efficiency was measured by time required to reach the working archwire, mandibular anterior alignment and level of discomfort. No significant differences were found in the reduction of irregularity between the archwire sequences at any time-point (T1: P = 0·12; T2: P = 0·06; T3: P = 0·21) or in the time to reach the working archwire (P = 0·28). No significant differences were found in the overall discomfort scores between the archwire sequences (4 h: P = 0·30; 24 h: P = 0·18; 3 days: P = 0·53; 7 days: P = 0·47). When the time-points were analysed individually, the 3M Unitek archwire sequence induced significantly less discomfort than GAC and Ormco archwires 24 h after the insertion of the third archwire (P = 0·02). This could possibly be attributed to the progression in archwire material and archform. The archwire sequences were similar in alignment efficiency and overall discomfort. Progression in archwire dimension and archform may contribute to discomfort levels. This study provides clinical justification for three common archwire sequences in 0·018 × 0·025 inch slot brackets.

  18. MaxAlign: maximizing usable data in an alignment

    DEFF Research Database (Denmark)

    Oliveira, Rodrigo Gouveia; Sackett, Peter Wad; Pedersen, Anders Gorm

    2007-01-01

    Align. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. CONCLUSION: We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also...

  19. Model Checking Real Time Java Using Java PathFinder

    Science.gov (United States)

    Lindstrom, Gary; Mehlitz, Peter C.; Visser, Willem

    2005-01-01

    The Real Time Specification for Java (RTSJ) is an augmentation of Java for real time applications of various degrees of hardness. The central features of RTSJ are real time threads; user defined schedulers; asynchronous events, handlers, and control transfers; a priority inheritance based default scheduler; non-heap memory areas such as immortal and scoped, and non-heap real time threads whose execution is not impeded by garbage collection. The Robust Software Systems group at NASA Ames Research Center has JAVA PATHFINDER (JPF) under development, a Java model checker. JPF at its core is a state exploring JVM which can examine alternative paths in a Java program (e.g., via backtracking) by trying all nondeterministic choices, including thread scheduling order. This paper describes our implementation of an RTSJ profile (subset) in JPF, including requirements, design decisions, and current implementation status. Two examples are analyzed: jobs on a multiprogramming operating system, and a complex resource contention example involving autonomous vehicles crossing an intersection. The utility of JPF in finding logic and timing errors is illustrated, and the remaining challenges in supporting all of RTSJ are assessed.

  20. An evaluation of safety-critical Java on a Java processor

    OpenAIRE

    Rios Rivas, Juan Ricardo; Schoeberl, Martin

    2014-01-01

    The safety-critical Java (SCJ) specification provides a restricted set of the Java language intended for applications that require certification. In order to test the specification, implementations are emerging and the need to evaluate those implementations in a systematic way is becoming important. In this paper we evaluate our SCJ implementation which is based on the Java Optimized Processor JOP and we measure different performance and timeliness criteria relevant to hard real-time systems....

  1. Java The Good Parts

    CERN Document Server

    Waldo, Jim

    2010-01-01

    What if you could condense Java down to its very best features and build better applications with that simpler version? In this book, veteran Sun Labs engineer Jim Waldo reveals which parts of Java are most useful, and why those features make Java among the best programming languages available. Every language eventually builds up crud, Java included. The core language has become increasingly large and complex, and the libraries associated with it have grown even more. Learn how to take advantage of Java's best features by working with an example application throughout the book. You may not l

  2. Java EE 7 first look

    CERN Document Server

    Fabrice, Armel

    2013-01-01

    An easy-to-follow guide to reveal the new features of Java EE 7 and how to efficiently utilize them. Given the main objectives pursued, this book targets three groups of people with a knowledge of the Java language. They are: Beginners in the Java EE platform who would like to have an idea about the main specifications of Java EE 7. Developers who have experimented with previous versions of Java EE and who would like to explore the new features of Java EE 7. Building architects who want to learn how to put together the various Java EE 7 specifications for building robust and secure enterprise appl

  3. Java Swing

    CERN Document Server

    Loy, Marc; Eckstein, Robert; Elliott, James; Wood, Dave

    2003-01-01

    Swing is a fully-featured user interface development kit for Java applications. Building on the foundations of the Abstract Window Toolkit (AWT), Swing enables cross-platform applications to use any of several pluggable look-and-feels. Swing developers can take advantage of its rich, flexible features and modular components, building elegant user interfaces with very little code. This second edition of Java Swing thoroughly covers all the features available in Java 2 SDK 1.3 and 1.4. More than simply a reference, this new edition takes a practical approach. It is a book by developers for

  4. Java Persistence Dengan JBoss Seam

    OpenAIRE

    Utomo, Wiranto Herry; Istiyanto, Jazi Eko

    2009-01-01

    Seam is based on Java EE, so it satisfies its framework duties in two fundamental ways: 1) Seam simplifies Java EE: Seam provides a number of shortcuts and simplifications to the standard Java EE framework, making it even easier to effectively use Java EE web and business components, 2) Seam extends Java EE: Seam integrates a number of new concepts and tools into the Java EE framework. These extensions b...

  5. JavaScript programmer's reference

    CERN Document Server

    Valentine, Thomas

    2013-01-01

    JavaScript Programmer's Reference is an invaluable resource that won't stray far from your desktop (or your tablet!). It contains detailed information on every JavaScript object and command, and combines that reference with practical examples showcasing how you can use those commands in the real world. Whether you're just checking the syntax of a method or you're starting out on the road to JavaScript mastery, the JavaScript Programmer's Reference will be an essential aid.  With a detailed and informative tutorial section giving you the ins and outs of programming with JavaScript and the DOM f

  6. Learning JavaScript

    CERN Document Server

    Powers, Shelley

    2008-01-01

    Packed with best practices and examples of JavaScript use, Learning JavaScript provides complete, no-nonsense coverage of this quirky yet essential language for web development. You'll learn everything from primitive data types to complex features, including JavaScript elements involved with Ajax and dynamic page effects. By the end of the book, you'll be able to work with even the most sophisticated libraries and web applications.

  7. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    Science.gov (United States)

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.
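
    The abstract above mentions dividing both the database and the query set so that work can be spread across nodes. The sketch below shows only that generic divide-and-conquer idea: it cuts each list into fixed-size chunks and pairs every query chunk with every database chunk. It is not HBlast's "virtual partitioning", whose details the abstract does not give, and all names and chunk sizes are hypothetical.

```python
from itertools import product


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def make_work_units(queries, database, query_chunk, db_chunk):
    """Pair every query chunk with every database chunk."""
    return list(product(chunk(queries, query_chunk), chunk(database, db_chunk)))


units = make_work_units(["q1", "q2", "q3"], ["d1", "d2", "d3", "d4"], 2, 2)
```

    Each work unit is independent, so units could be handed to separate map tasks; smaller chunks give better load balance at the cost of more scheduling overhead.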

  8. Hardware Support for Embedded Java

    DEFF Research Database (Denmark)

    Schoeberl, Martin

    2012-01-01

    The general Java runtime environment is resource hungry and unfriendly for real-time systems. To reduce the resource consumption of Java in embedded systems, direct hardware support of the language is a valuable option. Furthermore, an implementation of the Java virtual machine in hardware enables...... worst-case execution time analysis of Java programs. This chapter gives an overview of current approaches to hardware support for embedded and real-time Java....

  9. CSA: An efficient algorithm to improve circular DNA multiple alignment

    Directory of Open Access Journals (Sweden)

    Pereira Luísa

    2009-07-01

    Background: The comparison of homologous sequences from different species is an essential approach to reconstruct the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of using multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot directly handle circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process, before they can use multiple sequence alignment tools. Results: In this paper we propose an efficient algorithm that identifies the most interesting region to cut circular genomes in order to improve phylogenetic analysis when using standard multiple sequence alignment algorithms. This algorithm identifies the largest chain of non-repeated longest subsequences common to a set of circular mitochondrial DNA sequences. All the sequences are then rotated and made linear for multiple alignment purposes. To evaluate the effectiveness of this new tool, three different sets of mitochondrial DNA sequences were considered. Other tests considering randomly rotated sequences were also performed. The software package Arlequin was used to evaluate the standard genetic measures of the alignments obtained with and without the use of the CSA algorithm with two well-known multiple alignment algorithms, the CLUSTALW and the MAVID tools, and also the visualization tool SinicView. Conclusion: The results show that a circularization and rotation pre-processing step significantly improves the efficiency of publicly available multiple sequence alignment
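
    The rotation step described above can be sketched in a few lines, assuming the cut point is already known. Here a hypothetical shared anchor substring stands in for the region that CSA would identify; finding that region is the hard part and is not attempted. The function name and toy sequences are illustrative only.

```python
def rotate_to_anchor(circular_seq, anchor):
    """Linearize a circular sequence so that it starts at `anchor`."""
    n = len(circular_seq)
    if len(anchor) > n:
        raise ValueError("anchor is longer than the sequence")
    # Doubling the string lets a match wrap around the origin.
    doubled = circular_seq + circular_seq
    idx = doubled.find(anchor)
    if idx == -1 or idx >= n:
        raise ValueError("anchor not found in circular sequence")
    return doubled[idx:idx + n]


# Two linearizations of the same circular molecule, cut at different points.
first = rotate_to_anchor("GTACGA", "ACG")
second = rotate_to_anchor("CGAGTA", "ACG")
```

    After rotation both inputs begin at the same position, so a standard linear multiple sequence aligner can be applied without introducing large artificial end gaps.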

  10. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

    Science.gov (United States)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R

    2009-07-01

    The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/

  11. JavaScript for Absolute Beginners

    CERN Document Server

    McNavage, T

    2010-01-01

    If you are new to both JavaScript and programming, this hands-on book is for you. Rather than staring blankly at gobbledygook, you'll explore JavaScript by entering and running hundreds of code samples in Firebug, a free JavaScript debugger. Then in the last two chapters, you'll leave the safety of Firebug and hand-code an uber cool JavaScript application in your preferred text editor. Written in a friendly, engaging narrative style, this innovative JavaScript tutorial covers the following essentials: * Core JavaScript syntax, such as value types, operators, expressions, and statements provide

  12. BinAligner: a heuristic method to align biological networks.

    Science.gov (United States)

    Yang, Jialiang; Li, Jun; Grünewald, Stefan; Wan, Xiu-Feng

    2013-01-01

    The advances in high throughput omics technologies have made it possible to characterize molecular interactions within and across various species. Alignments and comparison of molecular networks across species will help detect orthologs and conserved functional modules and provide insights on the evolutionary relationships of the compared species. However, such analyses are not trivial due to the complexity of networks and high computational cost. Here we develop a mixed global and local algorithm, BinAligner, for network alignments. Based on the hypotheses that the similarity between two vertices across networks would be context dependent and that the information from the edges and the structures of subnetworks can be more informative than vertices alone, two scoring schemes, 1-neighborhood subnetwork and graphlet, were introduced to derive the scoring matrices between networks, besides the commonly used scoring scheme from vertices. Then the alignment problem is formulated as an assignment problem, which is solved by a combinatorial optimization algorithm such as the Hungarian method. The proposed algorithm was applied and validated in aligning the protein-protein interaction network of Kaposi's sarcoma associated herpesvirus (KSHV) and that of varicella zoster virus (VZV). Interestingly, we identified several putative functional orthologous proteins with similar functions but very low sequence similarity between the two viruses. For example, KSHV open reading frame 56 (ORF56) and VZV ORF55 are helicase-primase subunits with sequence identity 14.6%, and KSHV ORF75 and VZV ORF44 are tegument proteins with sequence identity 15.3%. These functional pairs cannot be identified if one restricts the alignment to orthologous protein pairs. In addition, BinAligner identified a conserved pathway between the two viruses, which consists of 7 orthologous protein pairs, and these proteins are connected by conserved links. This pathway might be crucial for virus packing and
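
    To make the assignment formulation above concrete, the sketch below pairs the vertices of two tiny networks so that the total similarity score is maximal. It uses exhaustive search over permutations, which is a deliberately simple stand-in for the Hungarian method and is only feasible for a handful of vertices. The score matrix is made up for illustration and is not BinAligner output.

```python
from itertools import permutations


def best_assignment(score):
    """Maximum-score one-to-one pairing for a square score matrix.

    score[i][j] is the similarity of vertex i in network A to vertex j
    in network B. Returns (total score, list of (i, j) pairs).
    """
    n = len(score)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if best_total is None or total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(enumerate(best_perm))


# Hypothetical integer similarity scores between three vertex pairs.
scores = [
    [9, 1, 2],
    [2, 8, 3],
    [1, 4, 7],
]
total, pairs = best_assignment(scores)
```

    The Hungarian method reaches the same optimum in polynomial time, which is what makes the formulation practical for real interaction networks.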

  13. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    Science.gov (United States)

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

    MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. Stochastic sampling of the RNA structural alignment space.

    Science.gov (United States)

    Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H

    2009-07-01

    A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the 'structural alignment' space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The 'best' centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.

  15. Monitoring Java Programs with Java PathExplorer

    Science.gov (United States)

    Havelund, Klaus; Rosu, Grigore; Clancy, Daniel (Technical Monitor)

    2001-01-01

    We present recent work on the development of Java PathExplorer (JPAX), a tool for monitoring the execution of Java programs. JPAX can be used during program testing to gain increased information about program executions, and can potentially also be applied during operation to survey safety-critical systems. The tool facilitates automated instrumentation of a program's bytecode, which will then emit events to an observer during its execution. The observer checks the events against user-provided high-level requirement specifications, for example temporal logic formulae, and against lower-level error detection procedures, for example concurrency-related algorithms such as deadlock and data race detection. High-level requirement specifications together with their underlying logics are defined in the Maude rewriting logic, and then can either be directly checked using the Maude rewriting engine, or be first translated to efficient data structures and then checked in Java.
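
    As a rough picture of the observer idea described above, the sketch below consumes a stream of lock events and flags pairs of locks that different threads acquire in opposite orders, a classic sign of deadlock potential. The event format, class name, and restriction to two-lock cycles are all assumptions made for this example; they do not reflect JPAX's actual interfaces or algorithms.

```python
class LockOrderObserver:
    """Records lock nesting order and reports inconsistent pairs."""

    def __init__(self):
        self.held = {}      # thread name -> list of locks currently held
        self.edges = set()  # (outer lock, inner lock) pairs seen so far

    def on_event(self, thread, action, lock):
        stack = self.held.setdefault(thread, [])
        if action == "acquire":
            # Every lock already held is "outer" to the new one.
            for outer in stack:
                self.edges.add((outer, lock))
            stack.append(lock)
        elif action == "release":
            stack.remove(lock)

    def potential_deadlocks(self):
        """Lock pairs that were nested in both orders."""
        return sorted(
            (a, b) for (a, b) in self.edges if a < b and (b, a) in self.edges
        )


# Hypothetical event trace emitted by an instrumented program.
trace = [
    ("T1", "acquire", "A"), ("T1", "acquire", "B"),
    ("T1", "release", "B"), ("T1", "release", "A"),
    ("T2", "acquire", "B"), ("T2", "acquire", "A"),
    ("T2", "release", "A"), ("T2", "release", "B"),
]
observer = LockOrderObserver()
for event in trace:
    observer.on_event(*event)
warnings = observer.potential_deadlocks()
```

    The trace itself never deadlocks, yet the observer still reports the pair, which shows how monitoring a single successful run can expose an error that only some schedules would trigger.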

  16. Model Checking JAVA Programs Using Java Pathfinder

    Science.gov (United States)

    Havelund, Klaus; Pressburger, Thomas

    2000-01-01

    This paper describes a translator called JAVA PATHFINDER from JAVA to PROMELA, the "programming language" of the SPIN model checker. The purpose is to establish a framework for verification and debugging of JAVA programs based on model checking. This work should be seen in a broader attempt to make formal methods applicable "in the loop" of programming within NASA's areas such as space, aviation, and robotics. Our main goal is to create automated formal methods such that programmers themselves can apply these in their daily work (in the loop) without the need for specialists to manually reformulate a program into a different notation in order to analyze the program. This work is a continuation of an effort to formally verify, using SPIN, a multi-threaded operating system programmed in Lisp for the Deep-Space 1 spacecraft, and of previous work in applying existing model checkers and theorem provers to real applications.

  17. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies.

    Directory of Open Access Journals (Sweden)

    Patrick D Schloss

    Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of beta-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. Taken together, these results
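
    To show why gap treatment matters for pairwise distances, the sketch below computes an uncorrected distance between two aligned sequences under three conventions: ignoring gapped columns, counting every gapped column as a difference, and counting each run of gapped columns once. The mode names and the simplified run handling are assumptions for this example; the study's exact definitions are not given in the abstract.

```python
def pairwise_distance(a, b, gap_mode="eachgap"):
    """Fraction of differing positions between two aligned sequences.

    gap_mode: "nogaps"  - skip columns where either sequence has a gap
              "eachgap" - every gapped column counts as one difference
              "onegap"  - a run of gapped columns counts as one difference
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    compared = 0
    in_gap_run = False
    for x, y in zip(a, b):
        x_gap, y_gap = x == "-", y == "-"
        if x_gap and y_gap:
            continue  # a column that is a gap in both carries no information
        if x_gap or y_gap:
            if gap_mode == "eachgap" or (gap_mode == "onegap" and not in_gap_run):
                diffs += 1
                compared += 1
            in_gap_run = True
            continue
        in_gap_run = False
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


seq1 = "ACGT---ACGT"
seq2 = "ACGTTTTACCT"
distances = {
    mode: pairwise_distance(seq1, seq2, mode)
    for mode in ("nogaps", "eachgap", "onegap")
}
```

    For this toy pair the three conventions give 1/8, 4/11 and 2/9, so a single three-base indel can change the distance almost threefold depending on the rule chosen.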

  18. An evaluation of safety-critical Java on a Java processor

    DEFF Research Database (Denmark)

    Rios Rivas, Juan Ricardo; Schoeberl, Martin

    2014-01-01

    The safety-critical Java (SCJ) specification provides a restricted set of the Java language intended for applications that require certification. In order to test the specification, implementations are emerging and the need to evaluate those implementations in a systematic way is becoming important....... In this paper we evaluate our SCJ implementation which is based on the Java Optimized Processor JOP and we measure different performance and timeliness criteria relevant to hard real-time systems. Our implementation targets Level 0 and Level 1 of the specification and to test it we use a series of micro...

  19. Scala for Java developers

    CERN Document Server

    Alexandre, Thomas

    2014-01-01

    This step-by-step guide is full of easy-to-follow code taken from real-world examples explaining the migration and integration of Scala in a Java project. If you are a Java developer or a Java architect, working in Java EE-based solutions and want to start using Scala in your daily programming, this book is ideal for you. This book will get you up and running quickly by adopting a pragmatic approach with real-world code samples. No prior knowledge of Scala is required.

  20. Alignment methods: strategies, challenges, benchmarking, and comparative overview.

    Science.gov (United States)

    Löytynoja, Ari

    2012-01-01

    Comparative evolutionary analyses of molecular sequences are solely based on the identities and differences detected between homologous characters. Errors in this homology statement, that is errors in the alignment of the sequences, are likely to lead to errors in the downstream analyses. Sequence alignment and phylogenetic inference are tightly connected and many popular alignment programs use the phylogeny to divide the alignment problem into smaller tasks. They then neglect the phylogenetic tree, however, and produce alignments that are not evolutionarily meaningful. The use of phylogeny-aware methods reduces the error but the resulting alignments, with evolutionarily correct representation of homology, can challenge the existing practices and methods for viewing and visualising the sequences. The inter-dependency of alignment and phylogeny can be resolved by joint estimation of the two; methods based on statistical models allow for inferring the alignment parameters from the data and correctly take into account the uncertainty of the solution but remain computationally challenging. Widely used alignment methods are based on heuristic algorithms and unlikely to find globally optimal solutions. The whole concept of one correct alignment for the sequences is questionable, however, as there typically exist vast numbers of alternative, roughly equally good alignments that should also be considered. This uncertainty is hidden by many popular alignment programs and is rarely correctly taken into account in the downstream analyses. The quest for finding and improving the alignment solution is complicated by the lack of suitable measures of alignment goodness. The difficulty of comparing alternative solutions also affects benchmarks of alignment methods and the results strongly depend on the measure used. As the effects of alignment error cannot be predicted, comparing the alignments' performance in downstream analyses is recommended.

  1. Reusable libraries for safety-critical Java

    DEFF Research Database (Denmark)

    Rios Rivas, Juan Ricardo; Schoeberl, Martin

    2014-01-01

    The large collection of Java class libraries is a main factor of the success of Java. However, these libraries assume that a garbage-collected heap is used. Safety-critical Java uses scope-based memory areas instead of a garbage-collected heap. Therefore, the Java class libraries are problematic...... to use in safety-critical Java. We have identified common programming patterns in the Java class libraries that make them unsuitable for safety-critical Java. We propose ways to improve the libraries to avoid the impact of the identified problematic patterns. We illustrate these changes by implementing...

  2. Beginning Java and Flex: Migrating Java, Spring, Hibernate and Maven Developers to Adobe Flex

    CERN Document Server

    di Pisa, F

    2009-01-01

    Over the past few years, the now open source Adobe Flex Framework has been adopted by the Java community as the preferred framework for Java RIAs using Flash for the presentation layer. Flex helps Java developers to build and maintain expressive web/desktop applications that deploy consistently on all major browsers, desktops, and operating systems. Beginning Java and Flex describes new, simpler, and faster ways to develop enterprise RIAs. This book is not only for Java or Flex developers, but also for all web developers who want to increase their productivity and the quality of their developm

  3. Schedulability Analysis for Java Finalizers

    DEFF Research Database (Denmark)

    Bøgholm, Thomas; Hansen, Rene Rydhof; Søndergaard, Hans

    2010-01-01

    Java finalizers perform clean-up and finalisation of objects at garbage collection time. In real-time Java profiles the use of finalizers is either discouraged (RTSJ, Ravenscar Java) or even disallowed (JSR-302), mainly because of the unpredictability of finalizers and in particular their impact...... on the schedulability analysis. In this paper we show that a controlled scoped memory model results in a structured and predictable execution of finalizers, more reminiscent of C++ destructors than Java finalizers. Furthermore, we incorporate finalizers into a (conservative) schedulability analysis for Predictable Java...... programs. Finally, we extend the SARTS tool for automated schedulability analysis of Java bytecode programs to handle finalizers in a fully automated way....

  4. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment.

    Science.gov (United States)

    Baichoo, Shakuntala; Ouzounis, Christos A

    A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. GenNon-h: Generating multiple sequence alignments on nonhomogeneous phylogenetic trees

    Directory of Open Access Journals (Sweden)

    Kedzierska Anna M

    2012-08-01

    Full Text Available Background: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods of simulating the DNA MSA directly from the transition matrices do not exist. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. placing distinct substitution rates at different lineages). Results: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Based on the input model and a phylogenetic tree in the Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of desired length. GenNon-h is publicly available for download. Conclusion: The software presented here is an efficient tool to generate DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that is well suited for testing complex biological hypotheses, exploring the limits of the reconstruction algorithms and their robustness to such models.

  6. STELLAR: fast and exact local alignments

    Directory of Open Access Journals (Sweden)

    Weese David

    2011-10-01

    Full Text Available Background: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. Results: We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions: STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de.
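
    The acceptance criterion named in the abstract (a minimal length and a maximal error rate) can be sketched as a small check on a finished alignment. This is only an illustration, not STELLAR's filtering or verification code: the function name, the use of "-" for gaps, and the choice to measure length in alignment columns are all assumptions made here.

```python
def is_epsilon_alignment(row_a, row_b, min_length, max_error_rate):
    """Check a gapped pairwise alignment against a length and error-rate bound.

    row_a and row_b are the two alignment rows, padded with '-' for gaps.
    Assumption: length is counted in alignment columns, and every column whose
    two characters differ (mismatch or gap) counts as one error.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    columns = len(row_a)
    errors = sum(1 for x, y in zip(row_a, row_b) if x != y)
    return columns >= min_length and errors <= max_error_rate * columns


# One mismatch in ten columns passes a 20% error bound.
print(is_epsilon_alignment("ACGTACGTAC", "ACGTTCGTAC", 10, 0.2))
```

    A heuristic aligner may skip alignments that satisfy such a criterion; the abstract's claim is that STELLAR reports all of them.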

  7. BPEL and Java cookbook

    CERN Document Server

    Laznik, Jurij

    2013-01-01

    The book is written in a Cookbook format with practical recipes aimed at helping you extend BPEL capabilities with Java. This book is aimed at Java developers who use BPEL programming to develop web services in SOA development. It is assumed that the readers are experienced with Java programming and SOA, but knowledge of BPEL is not necessarily required.

  8. Analysis of variables affecting unemployment rate and detecting for cluster in West Java, Central Java, and East Java in 2012

    Science.gov (United States)

    Samuel, Putra A.; Widyaningsih, Yekti; Lestari, Dian

    2016-02-01

    The objective of this study is to model the Unemployment Rate (UR) in West Java, Central Java, and East Java, with rate of disease, infant mortality rate, educational level, population size, proportion of married people, and GDRP as the explanatory variables. Spatial factors are also considered in the modeling, since the closer the distance, the higher the correlation. This study uses secondary data from BPS (Badan Pusat Statistik). The data are analyzed using the Moran's I test, to obtain information about spatial dependence, and using Spatial Autoregressive modeling, to determine which variables significantly affect UR and how great the influence of the spatial factors is. The result is that the variables proportion of married people, rate of disease, and population size are significantly related to UR. Hotspots of unemployment among the districts/cities of all three regions are also detected using the Spatial Scan Statistics method. The results are: 22 districts/cities as a regional group with the highest unemployment (most likely cluster) in the study area; 2 districts/cities as a regional group with the highest unemployment in West Java; 1 district/city as a regional group with the highest unemployment in Central Java; and 15 districts/cities as a regional group with the highest unemployment in East Java.
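
    For readers unfamiliar with the spatial dependence test mentioned above, the global Moran's I statistic can be sketched in a few lines. The function name, the toy values and the chain-shaped neighbour matrix below are illustrative assumptions, not data or code from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for observations x_i and spatial weights w_ij.

    I = (n / sum_ij w_ij) * (sum_ij w_ij * d_i * d_j) / (sum_i d_i^2),
    where d_i is the deviation of x_i from the mean.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / w_sum) * (cross / variance_sum)


# Hypothetical example: four districts in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1.0, 2.0, 3.0, 4.0], chain))  # smooth gradient: positive I
print(morans_i([1.0, 0.0, 1.0, 0.0], chain))  # alternating values: negative I
```

    Positive values indicate that neighbouring districts tend to have similar rates, which is the kind of spatial dependence that motivates a spatial autoregressive model.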

  9. A predictable Java profile

    DEFF Research Database (Denmark)

    Bøgholm, Thomas; Hansen, Rene Rydhof; Ravn, Anders Peter

    2009-01-01

    A Java profile suitable for development of high integrity embedded systems is presented. It is based on event handlers which are grouped in missions and equipped with respectively private handler memory and shared mission memory. This is a result of our previous work on developing a Java profile......, and is directly inspired by interactions with the Open Group on their on-going work on a safety critical Java profile (JSR-302). The main contribution is an arrangement of the class hierarchy such that the proposal is a generalization of Real-Time Specification for Java (RTSJ). A further contribution...

  10. Pro JavaScript with MooTools

    CERN Document Server

    Obcena, Mark Joseph

    2010-01-01

    Pro JavaScript with MooTools is unlike any other JavaScript book on the market today. While similar books focus on either JavaScript as a language of the browser or how to use JavaScript frameworks, Pro JavaScript with MooTools fills the gap between these topics and moves beyond - exploring the advanced features of JavaScript and how the MooTools framework uses these features to further improve the language itself. The book itself takes a unique three-pronged approach. It first walks you through the advanced features of JavaScript and the MooTools framework, including native augmentation and t

  11. Java Programming Language

    Science.gov (United States)

    Shaykhian, Gholam Ali

    2007-01-01

    The Java seminar covers the fundamentals of the Java programming language. No prior programming experience is required for participation in the seminar. The first part of the seminar covers introductory concepts in Java programming, including data types (integer, character, ..), operators, functions and constants, casts, input, output, control flow, scope, conditional statements, and arrays. Furthermore, an introduction to object-oriented programming in Java, relationships between classes, using packages, constructors, private data and methods, final instance fields, static fields and methods, and overloading are explained. The second part of the seminar covers extending classes, inheritance hierarchies, polymorphism, dynamic binding, abstract classes, and protected access. The seminar concludes by introducing interfaces, properties of interfaces, interfaces and abstract classes, interfaces and callbacks, basics of event handling, user interface components with Swing, applet basics, converting applications to applets, the applet HTML tags and attributes, exceptions and debugging.

  12. JavaScript The Definitive Guide

    CERN Document Server

    Flanagan, David

    2011-01-01

    Since 1996, JavaScript: The Definitive Guide has been the bible for JavaScript programmers-a programmer's guide and comprehensive reference to the core language and to the client-side JavaScript APIs defined by web browsers. The 6th edition covers HTML5 and ECMAScript 5. Many chapters have been completely rewritten to bring them in line with today's best web development practices. New chapters in this edition document jQuery and server side JavaScript. It's recommended for experienced programmers who want to learn the programming language of the Web, and for current JavaScript programmers wh

  13. Integrating R and Java for Enhancing Interactivity of Algorithmic Data Analysis Software Solutions

    Directory of Open Access Journals (Sweden)

    Titus Felix FURTUNĂ

    2016-06-01

    Full Text Available Conceiving software solutions for statistical processing and algorithmic data analysis involves handling diverse data, fetched from various sources and in different formats, and presenting the results in a suggestive, tailorable manner. Our ongoing research aims to design programming techniques for integrating the R development environment with the Java programming language for interoperability at the source code level. The goal is to combine the intensive data processing capabilities of the R programming language, along with its multitude of statistical function libraries, with the flexibility offered by the Java programming language and platform in terms of graphical user interface and mathematical function libraries. Both development environments are multiplatform oriented and can complement each other through interoperability. R is a comprehensive and concise programming language, benefiting from a continuously expanding and evolving set of packages for statistical analysis, developed by the open source community. While it is a very efficient environment for statistical data processing, the R platform lacks support for developing user-friendly, interactive graphical user interfaces (GUIs). Java, on the other hand, is a high-level object-oriented programming language, which supports designing and developing performant and interactive frameworks for general purpose software solutions, through Java Foundation Classes, JavaFX and various graphical libraries. In this paper we treat both aspects of integration and interoperability: integrating Java code into R applications, and bringing R processing sequences into Java-driven software solutions. Our research has been conducted focusing on case studies concerning pattern recognition and cluster analysis.

  14. PriFi - Using a Multiple Alignment of Related Sequences to Find Primers for Amplification of Homologs

    DEFF Research Database (Denmark)

    Fredslund, Jakob; Schauser, Leif; Madsen, Lene Heegaard

    2005-01-01

    Using a comparative approach, the web program PriFi (http://cgi-www.daimi.au.dk/cgi-chili/PriFi/main) designs pairs of primers useful for PCR amplification of genomic DNA in species where prior sequence information is not available. The program works with an alignment of DNA sequences from...... of a procedure for developing general markers serving as common anchor loci across species. To accommodate users with special preferences, configuration settings and criteria can be customized....

  15. Functional-Light JavaScript: balanced, pragmatic FP in JavaScript

    CERN Document Server

    Simpson, Kyle

    2017-01-01

    Functional-Light JavaScript is a balanced, pragmatic exploration of Functional Programming in JavaScript. Functional Programming (FP) is an incredibly powerful paradigm for structuring code that yields more robust, verifiable, and readable programs. If you've ever tried to learn FP but struggled with terms like "monad", mathematical concepts like category theory, or symbols like (lambda), you're not alone. Functional-Light programming distills the most vital aspects of FP—function purity, value immutability, composition, and more!—down to approachable JavaScript patterns. Rather than the all-or-nothing dogmatism often encountered in FP, this book teaches you how to improve your programs line by line.

  16. Communicating Java Threads

    NARCIS (Netherlands)

    Hilderink, G.H.; Broenink, Johannes F.; Vervoort, Wiek; Bakkers, André; Bakkers, A.

    The incorporation of multithreading in Java may be considered a significant part of the Java language, because it provides rudimentary facilities for concurrent programming. However, we believe that the use of channels is a fundamental concept for concurrent programming. The channel approach as

  17. Java and Mac OS X

    CERN Document Server

    Davis, T Gene

    2010-01-01

    Learn the guidelines of integrating Java with native Mac OS X applications with this Developer Reference book. Java is used to create nearly every type of application that exists and is one of the most required skills of employers seeking computer programmers. Java code and its libraries can be integrated with Mac OS X features, and this book shows you how to do just that. You'll learn to write Java programs on OS X and you'll even discover how to integrate them with the Cocoa APIs. Shows how Java programs can be integrated with any Mac OS X feature, such as NSView widgets or screen savers; Re

  18. Java performance tuning

    CERN Document Server

    Shirazi, Jack

    2003-01-01

    Performance has been an important issue for Java developers ever since the first version hit the streets. Over the years, Java performance has improved dramatically, but tuning is essential to get the best results, especially for J2EE applications. You can never have code that runs too fast. Java Performance Tuning, 2nd edition provides a comprehensive and indispensable guide to eliminating all types of performance problems. Using many real-life examples to work through the tuning process in detail, JPT shows how tricks such as minimizing object creation and replacing strings with arrays can

  19. Translating Colored Control Flow Nets into Readable Java via Annotated Java Workflow Nets

    DEFF Research Database (Denmark)

    Lassen, Kristian Bisgaard; Tjell, Simon

    2007-01-01

    In this paper, we present a method for developing Java applications from Colored Control Flow Nets (CCFNs), which is a special kind of Colored Petri Nets (CPNs) that we introduce. CCFN makes an explicit distinction between the representation of: The system, the environment of the system......, and the interface between the system and the environment. Our translation maps CCFNs into Annotated Java Workflow Nets (AJWNs) as an intermediate step, and these AJWNs are finally mapped to Java. CCFN is intended to enforce the modeler to describe the system in an imperative manner which makes the subsequent...... translation to Java easier to define. The translation to Java preserves data dependencies and control-flow aspects of the source CCFN. This paper contributes to the model-driven software development paradigm, by showing how to model a system, environment, and their interface, as a CCFN and presenting a fully...

  20. Hardware Objects for Java

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Thalinger, Christian; Korsholm, Stephan

    2008-01-01

    Java, as a safe and platform independent language, avoids access to low-level I/O devices or direct memory access. In standard Java, low-level I/O is not a concern; it is handled by the operating system. However, in the embedded domain resources are scarce and a Java virtual machine (JVM) without...... an underlying middleware is an attractive architecture. When running the JVM on bare metal, we need access to I/O devices from Java; therefore we investigate a safe and efficient mechanism to represent I/O devices as first class Java objects, where device registers are represented by object fields. Access...... to those registers is safe as Java’s type system regulates it. The access is also fast as it is directly performed by the bytecodes getfield and putfield. Hardware objects thus provide an object-oriented abstraction of low-level hardware devices. As a proof of concept, we have implemented hardware objects...

  1. Mastering JavaScript high performance

    CERN Document Server

    Adams, Chad R

    2015-01-01

    If you are a JavaScript developer with some experience in development and want to increase the performance of JavaScript projects by building faster web apps, then this book is for you. You should know the basic concepts of JavaScript.

  2. Implementation of Private Cloud Computing Using Integration of JavaScript and Python

    Directory of Open Access Journals (Sweden)

    2010-09-01

    Full Text Available

    This paper deals with the design and deployment of a novel library class in Python, enabling the use of JavaScript functionalities in Application Programming and the leveraging of this Library into development for third generation technologies such as Private Cloud Computing. The integration of these two prevalent languages provides us with a new level of compliance which helps in developing an understanding between Web Programming and Application Programming. An inter-browser functionality wrapping, which would enable users to have a JavaScript experience in Python interfaces directly, without having to depend on external programs, has been developed. The functionality of this concept is prevalent in the fact that Applications written in JavaScript and accessed on the browser now have the capability of interacting with each other on a common platform with the help of a Python wrapper. The idea is demonstrated by integrating it with the now ubiquitous Cloud Computing concept. With the help of examples, we have showcased the same and explained how the Library XOCOM can be a stepping stone to a flexible cloud computing environment.

  3. Isolation and identification of java race amniotic membrane secretory leukocyte protease inhibitor gene

    Directory of Open Access Journals (Sweden)

    Elly Munadziroh

    2008-09-01

    Full Text Available Background: Secretory leukocyte protease inhibitor (SLPI) has been found to facilitate epithelialization, maintain a normal epithelial phenotype, reduce inflammation, secrete growth factors such as IL-4, IL-6, IL-10, EGF, FGF, TGF, HGF and 2-microglobulin. SLPI is a serine protease inhibitor, which is found in secretions such as whole saliva, seminal fluid, cervical mucus, synovial fluid, breast milk, tears, amniotic fluid and amniotic membrane. Impaired healing states are characterized by excessive proteolysis and often bacterial infection, leading to the hypothesis that SLPI may have a role in the healing process in oral inflammation and contributes to tissue repair in oral mucosa. The oral wound healing response is impaired in SLPI-deficient mice, since matrix synthesis and collagen deposition are delayed. The objective of this research is to isolate and identify the SLPI gene of Java Race amniotic membrane. Methods: SLPI RNA was isolated from Java Race amniotic membrane and the cDNA was amplified by polymerase chain reaction (PCR). Result: Through sequence analyses, SLPI cDNA was 530 nucleotides in length with a predicted molecular mass of about 12 kDa. The nucleotide sequence showed that human SLPI from the sample was 98% identical with human SLPI from the gene bank. PCR analysis revealed that the mRNA of SLPI was highly expressed in the amniotic membrane from the Java Race sample. Conclusion: It is demonstrated that human SLPI is highly conserved in sequence content as compared to the human SLPI from gene.

  4. Mango: multiple alignment with N gapped oligos.

    Science.gov (United States)

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2008-06-01

    Multiple sequence alignment is a classical and challenging task. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art works suffer from the "once a gap, always a gap" phenomenon. Is there a radically new way to do multiple sequence alignment? In this paper, we introduce a novel and orthogonal multiple sequence alignment method, using both multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole and tries to build the alignment vertically, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds have proved significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks, showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0, and Kalign 2.0. We have further demonstrated the scalability of MANGO on very large datasets of repeat elements. MANGO can be downloaded at http://www.bioinfo.org.cn/mango/ and is free for academic usage.
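
    The difference between a spaced seed and a consecutive k-mer, which the abstract relies on, can be illustrated with a short sketch. The seed strings, function names and toy sequences below are assumptions chosen for illustration; MANGO's optimized seeds and its seed-handling algorithms are not reproduced here.

```python
from collections import defaultdict


def masked(word, seed):
    """Keep only the characters of word at the '1' positions of the seed."""
    return "".join(c for c, s in zip(word, seed) if s == "1")


def seed_hits(a, b, seed):
    """Return sorted (i, j) pairs where a[i:] and b[j:] agree on all '1' positions."""
    span = len(seed)
    index = defaultdict(list)
    for i in range(len(a) - span + 1):
        index[masked(a[i:i + span], seed)].append(i)
    hits = []
    for j in range(len(b) - span + 1):
        for i in index.get(masked(b[j:j + span], seed), []):
            hits.append((i, j))
    return sorted(hits)


a = "ACGTACGT"
b = "ACCTACGT"  # differs from a at position 2 only
# Both seeds require three matching positions, but the spaced seed "1101"
# tolerates the mismatch under its '0' and still reports the hit at (0, 0).
print((0, 0) in seed_hits(a, b, "1101"))
print((0, 0) in seed_hits(a, b, "111"))
```

    Here "1" marks a position that must match and "0" a position that is ignored, so a seed such as "1101" has the same weight as the 3-mer "111" while spanning four characters.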

  5. Memory Management for Safety-Critical Java

    DEFF Research Database (Denmark)

    Schoeberl, Martin

    2011-01-01

    Safety-Critical Java (SCJ) is based on the Real-Time Specification for Java. To simplify the certification of Java programs, SCJ supports only a restricted scoped memory model. Individual threads share only immortal memory and the newly introduced mission memory. All other scoped memories...... implementation is evaluated on an embedded Java processor....

  6. Safety-critical Java for embedded systems

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Dalsgaard, Andreas Engelbredt; Hansen, René Rydhof

    2016-01-01

    This paper presents the motivation for and outcomes of an engineering research project on certifiable Java for embedded systems. The project supports the upcoming standard for safety-critical Java, which defines a subset of Java and libraries aiming for development of high criticality systems....... The outcomes of this project include prototype safety-critical Java implementations, a time-predictable Java processor, analysis tools for memory safety, and example applications to explore the usability of safety-critical Java for this application area. The text summarizes developments and key contributions...

  7. Refactoring Real-Time Java Profiles

    DEFF Research Database (Denmark)

    Søndergaard, Hans; Thomsen, Bent; Ravn, Anders Peter

    2011-01-01

    Just like other software, Java profiles benefit from refactoring when they have been used and have evolved for some time. This paper presents a refactoring of the Real-Time Specification for Java (RTSJ) and the Safety Critical Java (SCJ) profile (JSR-302). It highlights core concepts and makes...

  8. JavaFX Special Effects: Taking Java RIA to the Extreme with Animation, Multimedia, and Game Elements

    CERN Document Server

    Jordon, L

    2009-01-01

    Enough about learning the fundamentals of the intriguing JavaFX platform; it's now time to start implementing visually stunning and dynamic Java-based rich Internet applications (RIAs) for your desktop or mobile front end. This book will show you what the JavaFX platform can really do for Java desktop and mobile front ends. It presents a number of excellent visual effects and techniques that will make any JavaFX application stand out-whether it's animation, multimedia, or a game. The techniques shown in this book are invaluable for competing in today's market, and they'll help set your RIAs ap

  9. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Full Text Available Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV specific targeting using small-molecule drugs.
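
    The idea of relaxing a docking-motif regular expression from two required basic residues to one can be sketched as follows. Both patterns and both peptide strings are hypothetical stand-ins written for this illustration (basic residues K/R, a short spacer, then two hydrophobic residues separated by any residue); they are not the expressions or sequences used in the paper.

```python
import re

# Assumed, simplified patterns: the strict form needs two consecutive basic
# residues (K or R), the relaxed form needs only one.
STRICT_DOCKING = re.compile(r"[KR]{2}.{1,6}[LIV].[LIV]")
RELAXED_DOCKING = re.compile(r"[KR].{1,6}[LIV].[LIV]")


def find_motifs(pattern, protein):
    """Return (start, matched_text) for each non-overlapping motif occurrence."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


two_basic = "MAKRSPTLELAG"  # made-up peptide with two basic residues
one_basic = "MAKGSPTLELAG"  # same peptide with a single basic residue

print(find_motifs(STRICT_DOCKING, two_basic))
print(find_motifs(STRICT_DOCKING, one_basic))   # the strict pattern misses it
print(find_motifs(RELAXED_DOCKING, one_basic))  # the relaxed pattern finds it
```

    A relaxed pattern matches more sites, so it also raises the chance of spurious hits; the abstract reports a consistency check against in silico docking.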

  10. Java problem-based learning

    Directory of Open Access Journals (Sweden)

    Goran P, Šimić

    2012-01-01

    Full Text Available The paper describes the self-directed problem-based learning (PBL) system named Java PBL. The expert module is the kernel of Java PBL. It involves a specific domain model, a problem generator and a solution generator. The overall system architecture is represented in the paper. Java PBL can act as a stand-alone system, but it is also designed to provide support to learning management systems (LMSs). This is provided by a modular design of the system. An LMS can offer the declarative knowledge only. Java PBL offers the procedural knowledge and tracks the progress of the learner's programming skills. The free navigation, unlimited numbers of problems and recommendations represent the main pedagogical strategies and tactics implemented in the system.

  11. Long Read Alignment with Parallel MapReduce Cloud Platform

    Science.gov (United States)

    Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing gene sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of the Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms. PMID:26839887
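
    As background for the Smith-Waterman step named in the abstract, the textbook local alignment recurrence can be sketched as below. This is the plain quadratic-time version with a linear gap penalty and arbitrarily chosen scores; it is not BWA-SW, and it includes none of the optimizations or the MapReduce parallelization described in the paper.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    Uses two rows of the dynamic programming matrix; every cell is clamped
    at zero so that an alignment may start anywhere in either sequence.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared substring "ACGT" gives four matches at 2 points each.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8
```

    Each cell depends only on its left, upper and upper-left neighbours, which is the dependency structure that parallel and optimized variants have to work around.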

  12. DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.

    Science.gov (United States)

    Ma, Wenxiu; Yang, Lin; Rohs, Remo; Noble, William Stafford

    2017-10-01

    Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites. We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values. The software is available at https://bitbucket.org/wenxiu/sequence-shape.git. Contact: rohs@usc.edu or william-noble@uw.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
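
    For orientation, the classical k-spectrum kernel that the abstract uses as its baseline can be sketched as the inner product of k-mer count vectors. The sketch below covers only that baseline; it does not include DNA shape features or the di-mismatch kernel, and the function names are hypothetical rather than taken from the authors' software.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (empty if seq is shorter than k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """k-spectrum kernel: dot product of the two k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(cx[m] * cy[m] for m in cx.keys() & cy.keys())
```

    Because the kernel depends only on k-mer counts, it needs no alignment of the binding sites, which is the alignment-free property the abstract emphasizes.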

  13. A Ravenscar-Java profile implementation

    DEFF Research Database (Denmark)

    Thomsen, Bent; Ravn, Anders Peter; Søndergaard, Hans

    2006-01-01

    This paper presents an implementation of the Ravenscar-Java profile. While most implementations of the profile are reference-implementations showing that it is possible to implement the profile, our implementation is aimed at industrial applications. It uses a dedicated real-time Java processor......, since we want to investigate if the Ravenscar-Java profile, implemented on a Java processor, is efficient for real applications. During the implementation some ambiguities and weaknesses of the profile were uncovered. However, test examples indicate that the profile is suitable for development...... of realistic real-time programs....

  14. Professional Java EE design patterns

    CERN Document Server

    Yener, Murat

    2014-01-01

    Master Java EE design pattern implementation to improve your design skills and your application's architecture. Professional Java EE Design Patterns is the perfect companion for anyone who wants to work more effectively with Java EE, and the only resource that covers both the theory and application of design patterns in solving real-world problems. The authors guide readers through both the fundamental and advanced features of Java EE 7, presenting patterns throughout, and demonstrating how they are used in day-to-day problem solving. As the most popular programming language in community-dri

  15. Essential Java for Scientists and Engineers

    CERN Document Server

    Hahn, Brian D; Malan, Katherine M

    2003-01-01

    Essential Java serves as an introduction to the programming language, Java, for scientists and engineers, and can also be used by experienced programmers wishing to learn Java as an additional language. The book focuses on how Java, and object-oriented programming, can be used to solve science and engineering problems. Many examples are included from a number of different scientific and engineering areas, as well as from business and everyday life. Pre-written packages of code are provided to help in such areas as input/output, matrix manipulation and scientific graphing. Java source code and

  16. A cross-species alignment tool (CAT)

    DEFF Research Database (Denmark)

    Li, Heng; Guan, Liang; Liu, Tao

    2007-01-01

    BACKGROUND: The main two sorts of automatic gene annotation frameworks are ab initio and alignment-based, the latter splitting into two sub-groups. The first group is used for intra-species alignments, among which are successful ones with high specificity and speed. The other group contains more...... sensitive methods which are usually applied in aligning inter-species sequences. RESULTS: Here we present a new algorithm called CAT (for Cross-species Alignment Tool). It is designed to align mRNA sequences to mammalian-sized genomes. CAT is implemented using C scripts and is freely available on the web...... at http://xat.sourceforge.net/. CONCLUSIONS: Examined from different angles, CAT outperforms other extant alignment tools. Tested against all available mouse-human and zebrafish-human orthologs, we demonstrate that CAT combines the specificity and speed of the best intra-species algorithms, like BLAT...

  17. Benchmarking JavaScript Frameworks

    OpenAIRE

    Mariano, Carl Lawrence

    2017-01-01

    JavaScript programming language has been in existence for many years already and is one of the most widely known, if not, the most used front-end programming language in web development. However, JavaScript is still evolving and with the emergence of JavaScript Frameworks (JSF), there has been a major change in how developers develop software nowadays. Developers these days often use more than one framework in order to fulfil their job which has given rise to the problem for developers when i...

  18. Beginning Programming with Java For Dummies

    CERN Document Server

    Burd, Barry

    2012-01-01

    One of the most popular beginning programming books, now fully updated. Java is a popular language for beginning programmers, and earlier editions of this fun and friendly guide have helped thousands get started. Now fully revised to cover recent updates for Java 7.0, Beginning Programming with Java For Dummies, 3rd Edition is certain to put more first-time programmers and Java beginners on the road to Java mastery. Explores what goes into creating a program, putting the pieces together, dealing with standard programming challenges, debugging, and making the program work. Offers new options for

  19. JavaScript programming pushing the limits

    CERN Document Server

    Raasch, Jon

    2013-01-01

    Take your JavaScript knowledge as far as it can go. JavaScript has grown up, and it's a hot topic. Newer and faster JavaScript VMs and frameworks built upon them have increased the popularity of JavaScript for server-side web applications, and rich JS applications are being developed for mobile devices. This book delivers a compelling tutorial, showing you how to build a real-world app from the ground up. Experienced developers who want to master the latest techniques and redefine their skills will find this deep dive into JavaScript's hidden functionalities gives them the tools to

  20. Professional JavaScript for Web Developers

    CERN Document Server

    Zakas, Nicholas C

    2011-01-01

    A significant update to a bestselling JavaScript book. As the key scripting language for the web, JavaScript is supported by every modern web browser and allows developers to create client-side scripts that take advantage of features such as animating the canvas tag and enabling client-side storage and application caches. After an in-depth introduction to the JavaScript language, this updated edition of a bestseller progresses to break down how JavaScript is applied for web development using the latest web development technologies. Veteran author and JavaScript guru Nicholas Zakas shows how Jav

  1. Java 7 A Beginner's Tutorial

    CERN Document Server

    Kurniawan, Budi

    2011-01-01

    A Books24x7's TOP 10 title for 4 consecutive years! Java is an easy language to learn. However, you need to master more than the language syntax to be a professional Java programmer. For one, object-oriented programming (OOP) skill is key to developing robust and effective Java applications. In addition, knowing how to use the vast collection of libraries makes development more rapid. This book introduces you to important programming concepts and teaches how to use the Java core libraries. It is a guide to building real-world applications, both desktop and Web-based. The coverage is the

  2. GATA: A graphic alignment tool for comparative sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Nix, David A.; Eisen, Michael B.

    2005-01-01

    Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
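
    The dot-plot limitation described in this abstract can be seen in a minimal sketch. It is an illustration only, not the GATA method; the function name and the exact-match window size are assumptions. The function marks every pair of positions where a short window of one sequence exactly matches a window of the other.

```python
def dot_plot(a, b, window=3):
    """Return the set of (i, j) with a[i:i+window] == b[j:j+window]."""
    # Index every window of b by its content for constant-time lookup.
    index = {}
    for j in range(len(b) - window + 1):
        index.setdefault(b[j:j + window], []).append(j)
    return {
        (i, j)
        for i in range(len(a) - window + 1)
        for j in index.get(a[i:i + window], [])
    }


# Two bases ("AA") are inserted in the second sequence: matches before the
# insertion lie on diagonal j - i = 0, matches after it on diagonal j - i = 2.
dots = dot_plot("ACGTTGCA", "ACGTAATGCA", window=3)
diagonals = sorted({j - i for i, j in dots})
```

    The plot shows that the two matching segments sit on different diagonals, but nothing in it joins them into one alignment or scores the gap, which is the shortcoming the abstract points out.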

  3. Java Processor Optimized for RTSJ

    Directory of Open Access Journals (Sweden)

    Tu Shiliang

    2007-01-01

    Full Text Available Due to the preeminent work of the real-time specification for Java (RTSJ), Java is increasingly expected to become the leading programming language in real-time systems. To provide a Java platform suitable for real-time applications, a Java processor which can execute Java bytecode directly is proposed in this paper. It provides efficient support in hardware for some mechanisms specified in the RTSJ and offers a simpler programming model through ameliorating the scoped memory of the RTSJ. The worst case execution time (WCET) of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all the processing interfering predictability is handled before bytecode execution. Further advantage of this method is to make the implementation of the processor simpler and suited to a low-cost FPGA chip.

  4. Static Analysis for JavaScript

    DEFF Research Database (Denmark)

    Jensen, Simon Holm

    . This dissertation describes the design and implementation of a static analysis for JavaScript that can assist programmers in finding bugs in code during development. We describe the design of a static analysis tool for JavaScript, built using the monotone framework. This analysis infers detailed type information......Web applications present unique challenges to designers of static analysis tools. One of these challenges is the language JavaScript used for client side scripting in the browser. JavaScript is a complex language with many pitfalls and poor tool support compared to other languages...... about programs. This information can be used to detect bugs such as null pointer dereferences and unintended type coercions. The analysis is sound, enabling it to prove the absence of certain program errors. JavaScript is usually run within the context of the browser and the DOM API. The major...

  5. Isolating and Restricting Client-Side JavaScript: Isoleren en beperken van JavaScript aan de cliëntzijde

    OpenAIRE

    Van Acker, Steven

    2015-01-01

    In today’s web applications, no one disputes the important role of JavaScript as a client-side programming language. JavaScript can turn the Web into a lively, dynamic and interactive end-user experience. Unfortunately, JavaScript can also be used to steal sensitive information and abuse powerful functionality. Sloppy input validation can make a web application vulnerable, allowing malicious JavaScript code to leak into a web application’s JavaScript execution environment, where it leads to un...

  6. DIDA: Distributed Indexing Dispatched Alignment.

    Directory of Open Access Journals (Sweden)

    Hamid Mohamadi

    Full Text Available One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use.
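
    The general idea of splitting indexing and alignment into smaller subtasks can be sketched as follows. This is a single-process illustration under assumed design choices (round-robin partitioning, exact k-mer lookup, hypothetical function names); it is not DIDA's actual implementation, which the abstract does not detail.

```python
def partition_targets(targets, n_parts):
    """Split a {name: sequence} dict into n_parts smaller dicts, round-robin."""
    parts = [dict() for _ in range(n_parts)]
    for idx, (name, seq) in enumerate(sorted(targets.items())):
        parts[idx % n_parts][name] = seq
    return parts


def build_index(part, k):
    """Map each k-mer to the names of the targets in this partition containing it."""
    index = {}
    for name, seq in part.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def dispatch(query, indexes, k):
    """Return the partitions sharing at least one k-mer with the query."""
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return [p for p, index in enumerate(indexes) if any(m in index for m in kmers)]


targets = {"t1": "ACGTACGT", "t2": "GGGGCCCC", "t3": "TTTTAAAA"}
parts = partition_targets(targets, 2)
indexes = [build_index(part, 4) for part in parts]  # one small index per partition
```

    Each partition's index is smaller than a single index over all targets, and a query is sent only to the partitions where it could match, so the per-node memory and alignment work shrink as partitions are added.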

  7. Improving your target-template alignment with MODalign

    OpenAIRE

    Barbato, Alessandro; Benkert, Pascal; Schwede, Torsten; Tramontano, Anna; Kosinski, Jan

    2012-01-01

    Summary: MODalign is an interactive web-based tool aimed at helping protein structure modelers to inspect and manually modify the alignment between the sequences of a target protein and of its template(s). It interactively computes, displays and, upon modification of the target-template alignment, updates the multiple sequence alignments of the two protein families, their conservation score, secondary structure and solvent accessibility values, and local quality scores of the implied three-di...

  8. JGromacs: a Java package for analyzing protein simulations.

    Science.gov (United States)

    Münz, Márton; Biggin, Philip C

    2012-01-23

    In this paper, we introduce JGromacs, a Java API (Application Programming Interface) that facilitates the development of cross-platform data analysis applications for Molecular Dynamics (MD) simulations. The API supports parsing and writing file formats applied by GROMACS (GROningen MAchine for Chemical Simulations), one of the most widely used MD simulation packages. JGromacs builds on the strengths of object-oriented programming in Java by providing a multilevel object-oriented representation of simulation data to integrate and interconvert sequence, structure, and dynamics information. The easy-to-learn, easy-to-use, and easy-to-extend framework is intended to simplify and accelerate the implementation and development of complex data analysis algorithms. Furthermore, a basic analysis toolkit is included in the package. The programmer is also provided with simple tools (e.g., XML-based configuration) to create applications with a user interface resembling the command-line interface of GROMACS applications. JGromacs and detailed documentation are freely available from http://sbcb.bioch.ox.ac.uk/jgromacs under a GPLv3 license.

  9. Wedge geometry, frictional properties and interseismic coupling of the Java megathrust

    Science.gov (United States)

    Koulali, Achraf; McClusky, Simon; Cummins, Phil; Tregoning, Paul

    2018-06-01

    The mechanical interaction between rocks at fault zones is a key element for understanding how earthquakes nucleate and propagate. Therefore, estimating frictional properties along fault planes allows us to infer the degree of elastic strain accumulation throughout the seismic cycle. The Java subduction zone is an active plate boundary where high seismic activity has long been documented. However, very little is known about the seismogenic processes of the megathrust, especially its shallowest portion where onshore geodetic networks are insensitive to recover the pattern of elastic strain. Here, we use the geometry of the offshore accretionary prism to infer frictional properties along the Java subduction zone, using Coulomb critical taper theory. We show that large portions of the inner wedge in the eastern part of the Java subduction megathrust are in a critical state, where the wedge is on the verge of failure everywhere. We identify four clusters with an internal coefficient of friction μint of ∼ 0.8 and hydrostatic pore pressure within the wedge. The average effective coefficient of friction ranges between 0.3 and 0.4, reflecting a strong décollement. Our results also show that the aftershock sequence of the 1994 Mw 7.9 earthquake halted adjacent to a critical segment of the wedge, suggesting that critical taper wedge areas in the eastern Java subduction interface may behave as a permanent barrier to large earthquake rupture. In contrast, in western Java topographic slope and slab dip profiles suggest that the wedge is mechanically stable, i.e., deformation is restricted to sliding along the décollement, and likely to coincide with a seismogenic portion of the megathrust. We discuss the seismic hazard implications and highlight the importance of considering the segmentation of the Java subduction zone when assessing the seismic hazard of this region.

  10. Checking Java Programs

    CERN Document Server

    Darwin, Ian

    2007-01-01

    This Short Cut tells you about tools that will improve the quality of your Java code, using checking above and beyond what the standard tools do, including: using javac options, JUnit and assertions; making your IDE work harder; checking your source code with PMD; checking your compiled code (.class files) with FindBugs; and checking your program's run-time behavior with Java PathFinder.

  11. High Performance JavaScript

    CERN Document Server

    Zakas, Nicholas

    2010-01-01

    If you're like most developers, you rely heavily on JavaScript to build interactive and quick-responding web applications. The problem is that all of those lines of JavaScript code can slow down your apps. This book reveals techniques and strategies to help you eliminate performance bottlenecks during development. You'll learn how to improve execution time, downloading, interaction with the DOM, page life cycle, and more. Yahoo! frontend engineer Nicholas C. Zakas and five other JavaScript experts -- Ross Harmes, Julien Lecomte, Steven Levithan, Stoyan Stefanov, and Matt Sweeney -- demonstra

  12. JavaScript Web Applications

    CERN Document Server

    MacCaw, Alex

    2011-01-01

    Building rich JavaScript applications that bring a desktop experience to the Web requires moving state from the server to the client side, which is not a simple task. This hands-on book takes proficient JavaScript developers through all the steps necessary to create state-of-the-art applications, including structure, templating, frameworks, communicating with the server, and many other issues. Throughout the book, you'll work with real-world example applications to help you grasp the concepts involved. Learn how to create JavaScript applications that offer a more responsive and improved experience. U

  13. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

    Science.gov (United States)

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-06-15

    Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
    Contact: yasu@bio.keio.ac.jp. Supplementary data are available

  14. Beginning Java ME Platform

    CERN Document Server

    Rischpater, Ray

    2008-01-01

    Empowering developers with the flexibility and power to start building Java applications for their Java-enabled mobile device or cell phone, this book covers sound HTTPS support, user interface API enhancements, the Mobile Media API, the Game API, and more.

  15. Big Java late objects

    CERN Document Server

    Horstmann, Cay S

    2012-01-01

    Big Java: Late Objects is a comprehensive introduction to Java and computer programming, which focuses on the principles of programming, software engineering, and effective learning. It is designed for a two-semester first course in programming for computer science students.

  16. Pro JavaScript for web apps

    CERN Document Server

    Freeman, Adam

    2012-01-01

    JavaScript is the engine behind every web app, and a solid knowledge of it is essential for all modern web developers. Pro JavaScript for Web Apps gives you all of the information that you need to create professional, optimized, and efficient JavaScript applications that will run across all devices. It takes you through all aspects of modern JavaScript application creation, showing you how to combine JavaScript with the new features of HTML5 and CSS3 to make the most of the new web technologies. The focus of the book is on creating professional web applications, ensuring that your app provides

  17. Instant web scraping with Java

    CERN Document Server

    Mitchell, Ryan

    2013-01-01

    This book is full of short, concise recipes to learn a variety of useful web scraping techniques using Java. You will start with a simple basic recipe of setting up your Java environment and gradually learn some more advanced recipes such as using complex Scrapers. Instant Web Scraping with Java is aimed at developers who, while not necessarily familiar with Java, are at least ready to dive into the complexities of this language with simple, step-by-step instructions leading the way. It is assumed that you have at least an intermediate knowledge of HTML, some knowledge of MySQL, and access to a

  18. SFESA: a web server for pairwise alignment refinement by secondary structure shifts.

    Science.gov (United States)

    Tong, Jing; Pei, Jimin; Grishin, Nick V

    2015-09-03

    Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.

  19. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  20. Some researches on converting a C++ software to java

    International Nuclear Information System (INIS)

    Ding Yuzheng; Wang Taijie; Dai Guiliang

    1997-01-01

    Because of its flexibility, portability, and relative simplicity, the Java programming language has sparked considerable interest among software developers. The authors present their experience of converting a C++ off-line software prototype to Java. Some benefits of Java observed while converting the C++ prototype, and also some limitations of Java, are described. Some of these limitations arise from the differences between Java and C++; others are due to weaknesses of Java itself. The article also introduces some methods to work around Java's limitations.

  1. NINJA: Java for High Performance Numerical Computing

    Directory of Open Access Journals (Sweden)

    José E. Moreira

    2002-01-01

    Full Text Available When Java was first introduced, there was a perception that its many benefits came at a significant performance cost. In the particularly performance-sensitive field of numerical computing, initial measurements indicated a hundred-fold performance disadvantage between Java and more established languages such as Fortran and C. Although much progress has been made, and Java now can be competitive with C/C++ in many important situations, significant performance challenges remain. Existing Java virtual machines are not yet capable of performing the advanced loop transformations and automatic parallelization that are now common in state-of-the-art Fortran compilers. Java also has difficulties in implementing complex arithmetic efficiently. These performance deficiencies can be attacked with a combination of class libraries (packages, in Java) that implement truly multidimensional arrays and complex numbers, and new compiler techniques that exploit the properties of these class libraries to enable other, more conventional, optimizations. Two compiler techniques, versioning and semantic expansion, can be leveraged to allow fully automatic optimization and parallelization of Java code. Our measurements with the NINJA prototype Java environment show that Java can be competitive in performance with highly optimized and tuned Fortran code.

  2. APINetworks Java. A Java approach to the efficient treatment of large-scale complex networks

    Science.gov (United States)

    Muñoz-Caro, Camelia; Niño, Alfonso; Reyes, Sebastián; Castillo, Miriam

    2016-10-01

    We present a new version of the core structural package of our Application Programming Interface, APINetworks, for the treatment of complex networks in arbitrary computational environments. The new version is written in Java and presents several advantages over the previous C++ version: the portability of the Java code, the easiness of object-oriented design implementations, and the simplicity of memory management. In addition, some additional data structures are introduced for storing the sets of nodes and edges. Also, by resorting to the different garbage collectors currently available in the JVM the Java version is much more efficient than the C++ one with respect to memory management. In particular, the G1 collector is the most efficient one because of the parallel execution of G1 and the Java application. Using G1, APINetworks Java outperforms the C++ version and the well-known NetworkX and JGraphT packages in the building and BFS traversal of linear and complete networks. The better memory management of the present version allows for the modeling of much larger networks.
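
    The benchmark task mentioned in this abstract, building linear and complete networks and traversing them breadth-first, can be sketched in a few lines. The adjacency-list representation and function names below are assumptions for illustration and do not reflect the APINetworks API.

```python
from collections import deque


def linear_network(n):
    """Path graph on nodes 0..n-1: each node is linked to its neighbours."""
    return {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}


def complete_network(n):
    """Complete graph on nodes 0..n-1: every node is linked to every other."""
    return {v: [u for u in range(n) if u != v] for v in range(n)}


def bfs_distances(adj, source):
    """Breadth-first search returning the hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:  # first visit is along a shortest path
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist
```

    The two topologies are useful extremes: a linear network has about n edges and BFS depth n - 1, while a complete network has n(n - 1)/2 edges and depth 1, so together they stress both traversal depth and edge storage.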

  3. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    Science.gov (United States)

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
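
    The last step described in this abstract, estimating a null distribution from randomly selected library peptides, amounts to an empirical p-value. The sketch below assumes a deliberately simple placeholder score (the best ungapped identity count over all windows of the protein); GuiTope's actual scoring and sampling are not specified in the abstract, and all names here are hypothetical.

```python
import random


def best_ungapped_score(peptide, protein):
    """Placeholder score: most identical positions over all ungapped placements."""
    n = len(peptide)
    return max(
        (sum(p == q for p, q in zip(peptide, protein[i:i + n]))
         for i in range(len(protein) - n + 1)),
        default=0,  # protein shorter than peptide: no placement possible
    )


def empirical_p_value(observed, library, protein, n_samples=1000, seed=0):
    """Fraction of randomly drawn library peptides scoring at least `observed`."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    null = [best_ungapped_score(rng.choice(library), protein) for _ in range(n_samples)]
    exceed = sum(s >= observed for s in null)
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_samples + 1)
```

    Drawing the null from the library itself, rather than from a generic background model, accounts for the library's own amino acid frequencies, which the abstract notes often differ from those assumed by popular alignment programs.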

  4. GuiTope: an application for mapping random-sequence peptides to protein sequences

    Directory of Open Access Journals (Sweden)

    Halperin Rebecca F

    2012-01-01

    Background: Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. Results: GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. Conclusions: GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  5. A Model for Java with Wildcards

    DEFF Research Database (Denmark)

    Cameron, Nicholas R.; Drossopoulou, Sophia; Ernst, Erik

    2008-01-01

    Wildcards are a complex and subtle part of the Java type system, present since version 5.0. Although there have been various formalisations and partial type soundness results concerning wildcards, to the best of our knowledge, no system that includes all the key aspects of Java wildcards has been...... proven type sound. This paper establishes that Java wildcards are type sound. We describe a new formal model based on explicit existential types whose pack and unpack operations are handled implicitly, and prove it type sound. Moreover, we specify a translation from a subset of Java to our formal model......, and discuss how several interesting aspects of the Java type system are handled....

  6. Model Checker for Java Programs

    Science.gov (United States)

    Visser, Willem

    2007-01-01

    Java Pathfinder (JPF) is a verification and testing environment for Java that integrates model checking, program analysis, and testing. JPF consists of a custom-made Java Virtual Machine (JVM) that interprets bytecode, combined with a search interface to allow the complete behavior of a Java program to be analyzed, including interleavings of concurrent programs. JPF is implemented in Java, and its architecture is highly modular to support rapid prototyping of new features. JPF is an explicit-state model checker, because it enumerates all visited states and, therefore, suffers from the state-explosion problem inherent in analyzing large programs. It is suited to analyzing programs of less than 10 kLOC, but has been successfully applied to finding errors in concurrent programs of up to 100 kLOC. When an error is found, a trace from the initial state to the error is produced to guide the debugging. JPF works at the bytecode level, meaning that all of Java can be model-checked. By default, the software checks for all runtime errors (uncaught exceptions), assertion violations (it supports Java's assert), and deadlocks. JPF uses garbage collection and symmetry reductions of the heap during model checking to reduce state explosion, as well as dynamic partial order reductions to lower the number of interleavings analyzed. JPF is capable of symbolic execution of Java programs, including symbolic execution of complex data such as linked lists and trees. JPF is extensible as it allows for the creation of listeners that can subscribe to events during searches. The creation of dedicated code to be executed in place of regular classes is supported and allows users to easily handle native calls and to improve the efficiency of the analysis.

  7. Java online monitoring framework

    International Nuclear Information System (INIS)

    Ronan, M.; Kirkby, D.; Johnson, A.S.; Groot, D. de

    1997-10-01

    An online monitoring framework has been written in the Java Language Environment to develop applications for monitoring special purpose detectors during commissioning of the PEP-II Interaction Region. PEP-II machine parameters and signals from several of the commissioning detectors are logged through VxWorks/EPICS and displayed by Java display applications. Remote clients are able to monitor the machine and detector performance using graphical displays and analysis histogram packages. In this paper, the design and implementation of the object-oriented Java framework is described. Illustrations of data acquisition, display and histogramming applications are also given.

  8. A troglomorphic spider from Java (Araneae, Ctenidae, Amauropelma)

    Science.gov (United States)

    Miller, Jeremy; Rahmadi, Cahyo

    2012-01-01

    A new troglomorphic spider from caves in Central Java, Indonesia, is described and placed in the ctenid genus Amauropelma Raven, Stumkat & Gray, until now containing only species from Queensland, Australia. Only juveniles and mature females of the new species are known. We give our reasons for placing the new species in Amauropelma, discuss conflicting characters, and make predictions about the morphology of the as yet undiscovered male that will test our taxonomic hypothesis. The description includes DNA barcode sequence data. PMID:22303127

  9. Long Read Alignment with Parallel MapReduce Cloud Platform

    Directory of Open Access Journals (Sweden)

    Ahmed Abdulhakim Al-Absi

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing gene sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce the Burrows-Wheeler Aligner’s Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts the widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of the Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 for BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman reduces the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.
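
    For readers unfamiliar with the Smith-Waterman step named above, the following is a minimal sketch of the textbook local-alignment recurrence with a linear gap penalty. It is not the optimized BWA-SW variant or the BWASW-PMR code; the class name SmithWatermanSketch and the scoring parameters are assumptions chosen for illustration.

```java
// Illustrative only: textbook Smith-Waterman scoring, not BWA-SW or BWASW-PMR.
final class SmithWatermanSketch {

    // Returns the best local alignment score of a against b.
    // match is a positive reward; mismatch and gap are negative penalties.
    static int localScore(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1]; // first row and column stay 0
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = h[i - 1][j] + gap;   // gap in b
                int left = h[i][j - 1] + gap; // gap in a
                // Clamping at zero lets an alignment restart anywhere (local alignment).
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(localScore("ACGT", "ACCGT", 2, -1, -1));
    }
}
```

    The full matrix costs time and memory proportional to the product of the two sequence lengths, which is the cost that heuristics and parallel execution aim to reduce for long reads.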

  10. Graph Transforming Java Data

    NARCIS (Netherlands)

    de Mol, M.J.; Rensink, Arend; Hunt, James J.

    This paper introduces an approach for adding graph transformation-based functionality to existing JAVA programs. The approach relies on a set of annotations to identify the intended graph structure, as well as on user methods to manipulate that structure, within the user’s own JAVA class

  11. Der Weg zum Java-Profi Konzepte und Techniken für die professionelle Java-Entwicklung

    CERN Document Server

    Inden, Michael

    2015-01-01

    Do you already have some experience with Java and now want to professionalize your development work? Or are you already on the way to becoming a professional, but need a reference work that covers the important topics of the Java world compactly and competently? This book offers a comprehensive introduction to professional development and gives you the knowledge needed to build stable and extensible software systems based on Java SE. Practical examples help you to apply quickly what you have learned. Alongside practice, great emphasis is placed on understanding the underlying concepts. The author, Michael Inden, draws on his extensive experience in training and development, which benefits you as the reader as well. This new edition has been thoroughly revised, updated, and expanded. Naturally, the current Java 8 is not left out: several chapters are devoted to Java 8 and its new features. There, a solid introduction to the extensive exten...

  12. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  13. Java EE 7 the big picture

    CERN Document Server

    Coward, Danny

    2015-01-01

    Java EE 7: The Big Picture uniquely explores the entire Java EE 7 platform in an all-encompassing style while examining each tier of the platform in enough detail so that you can select the right technologies for specific project needs. In this authoritative guide, Java expert Danny Coward walks you through the code, applications, and frameworks that power the platform. Take full advantage of the robust capabilities of Java EE 7, increase your productivity, and meet enterprise demands with help from this Oracle Press resource.

  14. Java EE 7 development with WildFly

    CERN Document Server

    Ćmil, Michał; Marchioni, Francesco

    2014-01-01

    If you are a Java developer who wants to learn about Java EE, this is the book for you. It's also ideal for developers who already have experience with the Java EE platform but would like to learn more about the new Java EE 7 features by analyzing fully functional sample applications using the new application server WildFly.

  15. MUMmer4: A fast and versatile genome alignment system.

    Directory of Open Access Journals (Sweden)

    Guillaume Marçais

    2018-01-01

    The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.
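
    To show what a suffix array is, the sketch below builds one naively by sorting suffix start positions and then finds an exact match by binary search. It is a teaching example only: MUMmer4's 48-bit suffix array and its construction algorithm are far more memory- and time-efficient, and the names SuffixArraySketch, build, and find are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: naive construction, unrelated to MUMmer4's actual implementation.
final class SuffixArraySketch {

    // Returns the start positions of all suffixes of text in lexicographic order.
    static Integer[] build(String text) {
        Integer[] sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = i;
        }
        // Naive comparison of whole suffixes; fine for short strings only.
        Arrays.sort(sa, (x, y) -> text.substring(x).compareTo(text.substring(y)));
        return sa;
    }

    // Returns the start of one occurrence of pattern in text, or -1 if there is none.
    static int find(String text, Integer[] sa, String pattern) {
        int lo = 0;
        int hi = sa.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            String suffix = text.substring(sa[mid]);
            if (suffix.startsWith(pattern)) {
                return sa[mid];
            }
            if (suffix.compareTo(pattern) < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "banana";
        Integer[] sa = build(text);
        System.out.println(Arrays.toString(sa));
        System.out.println(find(text, sa, "nan"));
    }
}
```

    Because a suffix array stores one integer per text position, the width of that integer bounds the addressable input, which is why moving from 32-bit to 48-bit entries raises the size limit.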

  16. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  17. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  18. Object oriented JavaScript

    CERN Document Server

    Stefanov, Stoyan

    2013-01-01

    You will first be introduced to object-oriented programming, then to the basics of objects in JavaScript. This book takes a do-it-yourself approach when it comes to writing code, because the best way to really learn a programming language is by writing code. You are encouraged to type code into Firebug's console, see how it works and then tweak it and play around with it. There are practice questions at the end of each chapter to help you review what you have learned. For new to intermediate JavaScript developers who want to prepare themselves for web development problems solved by smart JavaSc

  19. JavaFX2 a Swing

    OpenAIRE

    Čapek, Ondřej

    2011-01-01

    The bachelor thesis deals mainly with the JavaFX2 library and its innovations. The aim is to acquaint readers with its wider possibilities and to help them with the transition from Swing to JavaFX2. The thesis examines the benefits of JavaFX2, illustrated with source code examples. Part of the thesis is also an application that shows the differences between Swing and JavaFX2, both in their graphical appearance and in their source code. Most of the information containe...

  20. libgapmis: extending short-read alignments.

    Science.gov (United States)

    Alachiotis, Nikolaos; Berger, Simon; Flouri, Tomáš; Pissis, Solon P; Stamatakis, Alexandros

    2013-01-01

    A wide variety of short-read alignment programmes have been published recently to tackle the problem of mapping millions of short reads to a reference genome, focusing on different aspects of the procedure such as time and memory efficiency, sensitivity, and accuracy. These tools allow for a small number of mismatches in the alignment; however, their ability to allow for gaps varies greatly, with many performing poorly or not allowing them at all. The seed-and-extend strategy is applied in most short-read alignment programmes. After aligning a substring of the reference sequence against the high-quality prefix of a short read--the seed--an important problem is to find the best possible alignment between a substring of the reference sequence succeeding and the remaining suffix of low quality of the read--extend. The fact that the reads are rather short and that the gap occurrence frequency observed in various studies is rather low suggest that aligning (parts of) those reads with a single gap is in fact desirable. In this article, we present libgapmis, a library for extending pairwise short-read alignments. Apart from the standard CPU version, it includes ultrafast SSE- and GPU-based implementations. libgapmis is based on an algorithm computing a modified version of the traditional dynamic-programming matrix for sequence alignment. Extensive experimental results demonstrate that the functions of the CPU version provided in this library accelerate the computations by a factor of 20 compared to other programmes. The analogous SSE- and GPU-based implementations accelerate the computations by a factor of 6 and 11, respectively, compared to the CPU version. The library also provides the user the flexibility to split the read into fragments, based on the observed gap occurrence frequency and the length of the read, thereby allowing for a variable, but bounded, number of gaps in the alignment. 
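
    The idea of extending an alignment with exactly one gap can be illustrated by a brute-force sketch: place a single gap of the required length at every possible position in the shorter sequence and keep the placement with the fewest mismatches. This is not the libgapmis algorithm, which computes a modified dynamic-programming matrix; the class and method names here are hypothetical.

```java
// Illustrative only: brute-force single-gap placement, not the libgapmis algorithm.
final class SingleGapSketch {

    // Aligns shorter against longer end to end with one gap of length
    // longer.length() - shorter.length() inserted into shorter, and returns
    // the minimum number of mismatches over all gap positions.
    static int minMismatchesWithOneGap(String shorter, String longer) {
        int gapLen = longer.length() - shorter.length();
        if (gapLen < 0) {
            throw new IllegalArgumentException("first argument must not be longer than the second");
        }
        int best = Integer.MAX_VALUE;
        for (int gapPos = 0; gapPos <= shorter.length(); gapPos++) {
            int mismatches = 0;
            for (int i = 0; i < shorter.length(); i++) {
                // Characters before the gap align directly; the rest shift by gapLen.
                int j = (i < gapPos) ? i : i + gapLen;
                if (shorter.charAt(i) != longer.charAt(j)) {
                    mismatches++;
                }
            }
            best = Math.min(best, mismatches);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(minMismatchesWithOneGap("ACGT", "ACTTGT"));
    }
}
```

    Restricting the search to one gap keeps the number of candidate alignments linear in the read length, which is the kind of saving that makes a single-gap model attractive for short reads.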

  1. In silico site-directed mutagenesis informs species-specific predictions of chemical susceptibility derived from the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool

    Science.gov (United States)

    The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to address needs for rapid, cost effective methods of species extrapolation of chemical susceptibility. Specifically, the SeqAPASS tool compares the primary sequence (Level 1), functiona...

  2. Predicting Consensus Structures for RNA Alignments Via Pseudo-Energy Minimization

    Directory of Open Access Journals (Sweden)

    Junilda Spirollari

    2009-01-01

    Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict.

  3. The definitive guide to Java Swing

    CERN Document Server

    Zukowski, John

    2005-01-01

    Updated for the 1.5 edition of the Java 2 Platform, this third edition is a one-stop resource for serious Java developers. It covers the parts of the Java Swing API used to create graphical user interfaces (GUIs), the Model-View-Controller architecture that lies behind all Swing components, and the customization of components for specific environments.

  4. Java simulations of embedded control systems.

    Science.gov (United States)

    Farias, Gonzalo; Cervin, Anton; Arzén, Karl-Erik; Dormido, Sebastián; Esquembre, Francisco

    2010-01-01

    This paper introduces a new Open Source Java library suited for the simulation of embedded control systems. The library is based on the ideas and architecture of TrueTime, a Matlab toolbox devoted to this topic, and allows Java programmers to simulate the performance of control processes which run in a real time environment. Such simulations can considerably improve the learning and design of multitasking real-time systems. The choice of Java considerably increases the usability of our library, both because many educators already program in this language and because the library can be easily used by Easy Java Simulations (EJS), a popular modeling and authoring tool that is increasingly used in the field of Control Education. EJS allows instructors, students, and researchers with less programming experience to create advanced interactive simulations in Java. The paper describes the ideas, implementation, and sample use of the new library both for pure Java programmers and for EJS users. The JTT library and some examples are available online at http://lab.dia.uned.es/jtt.

  5. JAVA based LCD Reconstruction and Analysis Tools

    International Nuclear Information System (INIS)

    Bower, G.

    2004-01-01

    We summarize the current status and future developments of the North American Group's Java-based system for studying physics and detector design issues at a linear collider. The system is built around Java Analysis Studio (JAS), an experiment-independent Java-based utility for data analysis. Although the system is an integrated package running in JAS, many parts of it are also standalone Java utilities.

  6. Java based LCD reconstruction and analysis tools

    International Nuclear Information System (INIS)

    Bower, Gary; Cassell, Ron; Graf, Norman; Johnson, Tony; Ronan, Mike

    2001-01-01

    We summarize the current status and future developments of the North American Group's Java-based system for studying physics and detector design issues at a linear collider. The system is built around Java Analysis Studio (JAS), an experiment-independent Java-based utility for data analysis. Although the system is an integrated package running in JAS, many parts of it are also standalone Java utilities.

  7. A new prosthetic alignment device to read and record prosthesis alignment data.

    Science.gov (United States)

    Pirouzi, Gholamhossein; Abu Osman, Noor Azuan; Ali, Sadeeq; Davoodi Makinejad, Majid

    2017-12-01

    Prosthetic alignment is an essential process to rehabilitate patients with amputations. This study presents, for the first time, an invented device to read and record prosthesis alignment data. The digital device consists of seven main parts: the trigger, internal shaft, shell, sensor adjustment button, digital display, sliding shell, and tip. The alignment data were read and recorded by the user or a computer to replicate prosthesis adjustment for future use or examine the sequence of changes in alignment and its effect on the posture of the patient. Alignment data were recorded at the anterior/posterior and medial/lateral positions for five patients. Results show the high level of confidence to record alignment data and replicate adjustments. Therefore, the device helps patients readjust their prosthesis by themselves, or prosthetists to perform adjustment for patients and analyze the effects of malalignment.

  8. Java to C: A Primer

    DEFF Research Database (Denmark)

    McDowell, Charlie; Villadsen, Jørgen

    This book is designed to be used as a quick introduction to C for programmers already familiar with Java. It is not a replacement for a reference book on C but is instead a supplement. For the programmer already familiar with Java, the typical book on C requires the reader to wade through many...... details of already-familiar material. In this book, we quickly present the main concepts needed to begin writing serious programs in C, highlighting the differences between C and Java....

  9. Study on Java Programming Education

    OpenAIRE

    太田, 信宏

    2009-01-01

    The purpose of this study is to consider the content and key points for inclusion in a Java programming course for beginners. The Java programming language has a variety of functions and has the largest application field of all such languages, containing many themes that are appropriate for any such programming course. The multifunctional and wide-ranging functions of Java, however, may actually act as a barrier to study for beginners. The core content of a programming class for beginners sho...

  10. A Profile for Safety Critical Java

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Søndergaard, Hans; Thomsen, Bent

    2007-01-01

    We propose a new, minimal specification for real-time Java for safety critical applications. The intention is to provide a profile that supports programming of applications that can be validated against safety critical standards such as DO-178B [15]. The proposed profile is in line with the Java...... specification request JSR-302: Safety Critical Java Technology, which is still under discussion. In contrast to the current direction of the expert group for the JSR-302 we do not subset the rather complex Real-Time Specification for Java (RTSJ). Nevertheless, our profile can be implemented on top of an RTSJ...

  11. Safety-Critical Java for Embedded Systems

    DEFF Research Database (Denmark)

    Rios Rivas, Juan Ricardo

    for Java aims at providing a reduced set of the Java programming language that can be used for systems that need to be certified at the highest levels of criticality. Safety-critical Java (SCJ) restricts how a developer can structure an application by providing a specific programming model...... and by restricting the set of methods and libraries that can be used. Furthermore, its memory model does not use a garbage-collected heap but scoped memories. In this thesis we examine the use of the SCJ specification through an implementation in a time-predictable, FPGA-based Java processor. The specification is now...

  12. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

    Science.gov (United States)

    Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

    2018-02-01

    Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results show that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN.
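
    As a rough illustration of the N-gram step named above, the sketch below counts overlapping N-grams in a sequence; such counts are one common way to turn a sequence into a numeric input vector. It does not reproduce the authors' probabilistic language model or their feature-sequence extraction, and the names NGramSketch and count are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain N-gram counting, not the authors' model.
final class NGramSketch {

    // Counts every overlapping substring of length n in the sequence.
    static Map<String, Integer> count(String sequence, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give a stable feature order
        for (int i = 0; i + n <= sequence.length(); i++) {
            counts.merge(sequence.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("MKTAYIAKQR", 2));
    }
}
```

    With a 20-letter amino acid alphabet there are at most 20^n distinct N-grams, so small values of n keep the resulting vectors manageable.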

  13. HTML5 programming with JavaScript for dummies

    CERN Document Server

    Mueller, John Paul

    2013-01-01

    Web designers and programmers, add JavaScript to your HTML5 development toolkit without fear. Modern websites are complex, and some of the most exciting features - things like geolocation, canvas, portability to mobile and more - require JavaScript to leverage what HTML5 can create. Don't know JavaScript? That's where HTML5 Programming with JavaScript For Dummies comes in. Rather than walking you through JavaScript as a programming language, it approaches JavaScript as a tool to help you enhance web pages. Helps web designers and programmers tap the full power of HT

  14. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  15. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

    Directory of Open Access Journals (Sweden)

    Kierzynka Michal

    2011-05-01

    Background: Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases they address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and Smith-Waterman algorithms with the backtracking procedure that is needed to construct the alignment. Results: In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Tests show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU and GPU-based solutions. Moreover, support for multiple GPUs with load balancing makes the application very scalable. Conclusions: The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be increased almost linearly when using more than one graphics card.
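
    To make the distinction between computing only the score and also recovering the alignment concrete, here is a minimal CPU sketch of global Needleman-Wunsch alignment with a backtracking pass. It is not the GPU implementation described in this record: it uses a linear gap penalty rather than affine gaps, and the function name and scoring defaults are assumptions chosen for illustration.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Illustrative sketch only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        """Substitution score for a[i-1] against b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill phase: this alone is enough if only the score is wanted.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Backtracking phase: walk from the bottom-right cell to the origin,
    # re-deriving which move produced each cell.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

    With the default scores, aligning "ACGT" against "AGT" yields a score of 1 and the pair ("ACGT", "A-GT"). The backtracking loop needs the whole score matrix, which is the memory cost that score-only implementations avoid.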

  16. Pathotypic characterization of Newcastle disease virus isolated from vaccinated chicken in West Java, Indonesia.

    Science.gov (United States)

    Putri, Dwi Desmiyeni; Handharyani, Ekowati; Soejoedono, Retno Damajanti; Setiyono, Agus; Mayasari, Ni Luh Putu Ika; Poetri, Okti Nadia

    2017-04-01

    This research was conducted to differentiate and characterize eight Newcastle disease virus (NDV) isolates collected from vaccinated chickens in commercial flocks in West Java, Indonesia, in 2011, 2014 and 2015, using pathotype-specific primers. A total of eight NDV isolates collected from clinical outbreaks among commercial vaccinated flocks in West Java, Indonesia, in 2011, 2014, and 2015 were used in this study. Reverse transcription-polymerase chain reaction was used to detect and differentiate the virulence of NDV strains, using three sets of primers targeting their M and F genes. The first set consisted of universal primers targeting the matrix (M) gene to detect NDV. The other two sets were specific for the fusion (F) gene cleavage site sequence of virulent and avirulent NDV strains. Our results showed that three isolates belong to virulent NDV strains and the other five isolates belong to avirulent NDV strains. The nucleotide sequence of the F protein cleavage site showed 112 K/R-R-Q/R-K-R/G-F 117 in virulent NDV strains and 112 G-K/R-Q-G-R-L 117 in avirulent NDV strains. The results of the current study suggest that virulent NDV strains were circulating among vaccinated chickens in West Java, Indonesia; this might pose a risk of ND outbreaks and economic losses within the poultry industry.

  17. Writing Kurdish Alphabetics in Java Programming Language

    OpenAIRE

    Rebwar Mala Nabi; Sardasht M-Raouf Mahmood; Mohammed Qadir Kheder; Shadman Mahmood

    2016-01-01

    Nowadays, Kurdish programmers usually struggle when they need to write Kurdish letters while programming in Java. Moreover, none of the versions of the Java Development Kit have supported Kurdish letters. Therefore, the aim of this study is to develop a Java Kurdish Language Package (JKLP) to solve the problem of writing the Kurdish alphabet in the Java programming language, so that Kurdish programmers and/or students can convert English-alphabet text to Kurdish-alphabet text. Furthermore, adding Kurdish langua...

  18. Using ESTs for phylogenomics: Can one accurately infer a phylogenetic tree from a gappy alignment?

    Directory of Open Access Journals (Sweden)

    Hartmann Stefanie

    2008-03-01

    Background: While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. Results: We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. Conclusion: These results demonstrate that partial gene

  19. Using ESTs for phylogenomics: can one accurately infer a phylogenetic tree from a gappy alignment?

    Science.gov (United States)

    Hartmann, Stefanie; Vision, Todd J

    2008-03-26

    While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. These results demonstrate that partial gene sequences and gappy multiple sequence alignments can pose a

  20. Visualization program development using Java

    International Nuclear Information System (INIS)

    Sasaki, Akira; Suto, Keiko

    2002-03-01

    A method for developing visualization programs in Java for PCs with a graphical user interface (GUI) is discussed and applied to the visualization and analysis of 1D and 2D data from experiments and numerical simulations. Based on an investigation of programming techniques such as drawing graphics and event-driven programming, example codes are provided in which the GUI is implemented using the Abstract Window Toolkit (AWT). The marked advantage of Java comes from the inclusion of library routines for graphics and networking as part of its language specification, which enables ordinary scientific programmers to make interactive visualization a part of their simulation codes. Moreover, Java programs are machine independent at the source level. The object-oriented programming (OOP) methods used in Java programming will be useful for developing large scientific codes that include a number of modules, with better maintainability. (author)

  1. Certifiable Java for Embedded Systems

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Dalsgaard, Andreas Engelbredt; Hansen, Rene Rydhof

    2014-01-01

    The Certifiable Java for Embedded Systems (CJ4ES) project aimed to develop a prototype development environment and platform for safety-critical software for embedded applications. There are three core constituents: A profile of the Java programming language that is tailored for safety-critical applications, a predictable Java processor built with FPGA technology, and an Eclipse based application development environment that binds the profile and the platform together and provides analyses that help to provide evidence that can be used as part of a safety case. This paper summarizes key contributions...

  2. Embedding Java Types in CPN Tools

    DEFF Research Database (Denmark)

    Lassen, Kristian Bisgaard; Westergaard, Michael

    CPN Tools is a well known editor for Colored Petri nets (CPNs) that is capable of doing state space and performance analysis. The BRITNeY Suite has added yet another feature to CPN Tools for integrating CPN models with Java programs, by providing stubs accessible from the models, to allow the modeller to call methods on Java objects. This paper is about how the stub code is generated, i.e., representing Java classes to Standard ML to be able to call Java code in the CPN models, and how the BRITNeY Suite framework handles the invocations of the stub code. The contribution of this paper is give...

  3. Interrupt Handlers in Java

    DEFF Research Database (Denmark)

    Korsholm, Stephan; Schoeberl, Martin; Ravn, Anders Peter

    2008-01-01

    An important part of implementing device drivers is to control the interrupt facilities of the hardware platform and to program interrupt handlers. Current methods for handling interrupts in Java use a server thread waiting for the VM to signal an interrupt occurrence. It means that the interrupt is handled at a later time, which has some disadvantages. We present constructs that allow interrupts to be handled directly and not at a later point decided by a scheduler. A desirable feature of our approach is that we do not require a native middleware layer but can handle interrupts entirely with Java code. We have implemented our approach using an interpreter and a Java processor, and give an example demonstrating its use.

  4. The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Gorodkin, Jan

    2005-01-01

    FOLDALIGN is a Sankoff-based algorithm for making structural alignments of RNA sequences. Here, we present a web server for making pairwise alignments between two RNA sequences, using the recently updated version of FOLDALIGN. The server can be used to scan two sequences for a common structural RNA motif of limited size, or the entire sequences can be aligned locally or globally. The web server offers a graphical interface, which makes it simple to make alignments and manually browse the results. The web server can be accessed at http://foldalign.kvl.dk...

  5. Java Foundation Classes in a Nutshell Desktop Quick Reference

    CERN Document Server

    Flanagan, David

    1999-01-01

    Java Foundation Classes in a Nutshell is an indispensable quick reference for Java programmers who are writing applications that use graphics or graphical user interfaces. The author of the bestselling Java in a Nutshell has written fast-paced introductions to the Java APIs that comprise the Java Foundation Classes (JFC), such as the Swing GUI components and Java 2D, so that you can start using these exciting new technologies right away. This book also includes O'Reilly's classic-style, quick-reference material for all of the classes in the javax.swing and java.awt packages and their numerous

  6. Jess, the Java expert system shell

    Energy Technology Data Exchange (ETDEWEB)

    Friedman-Hill, E.J.

    1997-11-01

    This report describes Jess, a clone of the popular CLIPS expert system shell written entirely in Java. Jess supports the development of rule-based expert systems which can be tightly coupled to code written in the powerful, portable Java language. The syntax of the Jess language is discussed, and a comprehensive list of supported functions is presented. A guide to extending Jess by writing Java code is also included.

  7. Distributed, Embedded and Real-time Java Systems

    CERN Document Server

    Wellings, Andy

    2012-01-01

    Research on real-time Java technology has been prolific over the past decade, leading to a large number of corresponding hardware and software solutions, and frameworks for distributed and embedded real-time Java systems.  This book is aimed primarily at researchers in real-time embedded systems, particularly those who wish to understand the current state of the art in using Java in this domain.  Much of the work in real-time distributed, embedded and real-time Java has focused on the Real-time Specification for Java (RTSJ) as the underlying base technology, and consequently many of the Chapters in this book address issues with, or solve problems using, this framework. Describes innovative techniques in: scheduling, memory management, quality of service and communication systems supporting real-time Java applications; Includes coverage of multiprocessor embedded systems and parallel programming; Discusses state-of-the-art resource management for embedded systems, including Java’s real-time garbage collect...

  8. Bringing Interactivity to the Web: The JAVA Solution.

    Science.gov (United States)

    Knee, Richard H.; Cafolla, Ralph

    Java is an object-oriented programming language of the Internet. Its popularity lies in its ability to create interactive Web sites across platforms. The most common Java programs are applications and applets, which adhere to a set of conventions that lets them run within a Java-compatible browser. Java is becoming an essential subject matter and…

  9. Tool-supported Refactoring for JavaScript

    DEFF Research Database (Denmark)

    Feldthaus, Asger; Millstein, Todd; Møller, Anders

    2011-01-01

    Refactoring is a popular technique for improving the structure of existing programs while maintaining their behavior. For statically typed programming languages such as Java, a wide variety of refactorings have been described, and tool support for performing refactorings and ensuring... ...Script because of its dynamic nature. We propose a framework for specifying and implementing JavaScript refactorings based on pointer analysis. We describe novel refactorings motivated by best practice recommendations for JavaScript programming, and demonstrate how they can be described concisely in terms of queries provided by our framework. Experiments performed with a prototype implementation on a suite of existing applications show that our approach is well-suited for developing practical refactoring tools for JavaScript.

  10. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

    Directory of Open Access Journals (Sweden)

    Charlotte Herzeel

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture, which allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost.
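
    The single-pass idea in this record can be sketched in a few lines: instead of running each preparation step as a separate pass that reads and writes the whole file, per-record steps are chained and applied while the data is traversed once. The sketch below is a hypothetical, single-threaded illustration; the names `single_pass`, `drop_unmapped` and `add_read_group` and the dict-based records are assumptions and are not elPrep's API. Steps that need a global view of the data, such as sorting, do not fit this per-record form and are left out.

```python
from typing import Callable, Iterable, Iterator, Optional

Record = dict
# A step either returns a (possibly modified) record or None to drop it.
Step = Callable[[Record], Optional[Record]]


def single_pass(records: Iterable[Record], steps: list[Step]) -> Iterator[Record]:
    """Apply every step to each record during one traversal of the input."""
    for record in records:
        for step in steps:
            record = step(record)
            if record is None:
                break  # record was filtered out; skip the remaining steps
        else:
            yield record  # reached only when no step dropped the record


def drop_unmapped(record: Record) -> Optional[Record]:
    """Hypothetical filter step: keep only records flagged as mapped."""
    return record if record.get("mapped") else None


def add_read_group(record: Record) -> Optional[Record]:
    """Hypothetical annotation step: attach a fixed read-group label."""
    return {**record, "rg": "sample1"}
```

    Calling `list(single_pass(reads, [drop_unmapped, add_read_group]))` reads `reads` once regardless of how many steps are listed, which is the property the abstract highlights.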

  11. Java for dummies quick reference

    CERN Document Server

    Lowe, Doug

    2012-01-01

    A reference that answers your questions as you move through your coding. The demand for Android programming and web apps continues to grow at an unprecedented pace and Java is the preferred language for both. Java For Dummies Quick Reference keeps you moving through your coding while you solve a problem, look up a command or syntax, or search for a programming tip. Whether you're a Java newbie or a seasoned user, this fast reference offers you quick access to solutions without requiring that you wade through pages of tutorial material. Leverages the true reference format that is organized with

  12. Learn Java for Android Development

    CERN Document Server

    Friesen, J

    2010-01-01

    Android development is hot, and many programmers are interested in joining the fun. However, because this technology is based on Java, you should first obtain a solid grasp of the Java language and its foundational APIs to improve your chances of succeeding as an Android app developer. After all, you will be busy learning the architecture of an Android app, the various Android-specific APIs, and Android-specific tools. If you do not already know Java fundamentals, you will probably end up with a massive headache from also having to quickly cram those fundamentals into your knowledge base. Lear

  13. DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules.

    Science.gov (United States)

    Eernisse, D J

    1992-04-01

    DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator is able to convert documented GenBank or EMBL documented sequences into linearized, rescalable gene maps whose gene sequences are extractable by clicking on the corresponding map button or by selection from a scrolling list. Provided gene maps, complete with extractable sequences, consist of nine metazoan, one yeast, and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 select organism-organelle combinations. Codon usage data is compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous ftp and is free for non-commercial uses.
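
    The translation feature described in this record (nucleic acid to protein, with alternate genetic codes and ambiguous symbols) can be illustrated with a short sketch. It covers only the forward direction, uses the standard genetic code plus the vertebrate mitochondrial code as one alternate, and writes "X" for any codon containing an ambiguous symbol. All names are assumptions for illustration and have no connection to the HyperCard stacks themselves.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}
# Vertebrate mitochondrial code: four codons differ from the standard code.
VERTEBRATE_MITO_CODE = {
    **STANDARD_CODE, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*",
}


def translate(dna: str, code: dict[str, str] = STANDARD_CODE) -> str:
    """Translate a nucleotide string codon by codon.

    '*' marks a stop codon and 'X' marks a codon that is not in the table
    (for example one containing an ambiguous symbol such as N).
    A trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    usable = len(dna) - len(dna) % 3
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, usable, 3))
```

    For example, "ATGGCCTGA" translates to "MA*" under the standard code and to "MAW" under the vertebrate mitochondrial code, because TGA is a stop codon in the former and tryptophan in the latter.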

  14. T3i: A Tool for Generating and Querying Test Suites for Java

    NARCIS (Netherlands)

    Prasetya, I.S.W.B.

    2015-01-01

    T3i is an automated unit-testing tool to test Java classes. To expose interactions T3i generates test-cases in the form of sequences of calls to the methods of the target class. What separates it from other testing tools is that it treats test suites as first class objects and allows users to e.g.

  15. A Type Graph Model for Java Programs

    NARCIS (Netherlands)

    Rensink, Arend; Zambon, Eduardo

    2009-01-01

    In this report we present a type graph that models all executable constructs of the Java programming language. Such a model is useful for any graph-based technique that relies on a representation of Java programs as graphs. The model can be regarded as a common representation to which all Java

  16. Embedded Java security security for mobile devices

    CERN Document Server

    Debbabi, Mourad; Talhi, Chamseddine

    2007-01-01

    Java brings more functionality and versatility to the world of mobile devices, but it also introduces new security threats. This book contains a presentation of embedded Java security and presents the main components of embedded Java. It gives an idea of the platform architecture and is useful for researchers and practitioners.

  17. Infernal 1.0: inference of RNA alignments

    OpenAIRE

    Nawrocki, Eric P.; Kolbe, Diana L.; Eddy, Sean R.

    2009-01-01

    Summary: infernal builds consensus RNA secondary structure profiles called covariance models (CMs), and uses them to search nucleic acid sequence databases for homologous RNAs, or to create new sequence- and structure-based multiple sequence alignments.

  18. Improving your target-template alignment with MODalign.

    KAUST Repository

    Barbato, Alessandro; Benkert, Pascal; Schwede, Torsten; Tramontano, Anna; Kosinski, Jan

    2012-01-01

    , upon modification of the target-template alignment, updates the multiple sequence alignments of the two protein families, their conservation score, secondary structure and solvent accessibility values, and local quality scores of the implied three

  19. A Type Graph Model for Java Programs

    NARCIS (Netherlands)

    Rensink, Arend; Zambon, Eduardo; Lee, D.; Lopes, A.; Poetzsch-Heffter, A.

    2009-01-01

    In this work we present a type graph that models all executable constructs of the Java programming language. Such a model is useful for any graph-based technique that relies on a representation of Java programs as graphs. The model can be regarded as a common representation to which all Java syntax

  20. Efficient Approximate JavaScript Call Graph Construction

    NARCIS (Netherlands)

    S. Benschop

    2014-01-01

    JavaScript has seen an increase in popularity in the last few years, both in the browser as well as on other platforms such as Node.js. However, the tools to help developers reason about JavaScript code remain fairly barebone in comparison with tooling for static languages such as Java.

  1. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

    Directory of Open Access Journals (Sweden)

    Xin Yi Ng

    2015-01-01

    This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs), which are important to the immune system. Recently, researchers have become interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process, as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMP prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and a support vector machine (SVM) LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in the jackknife test and 87.59% in the independent test, while the sensitivity obtained for the jackknife test and the independent test is 88.74% and 78.70%, respectively, when only the sequences that have less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity.
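
    For readers unfamiliar with LZ complexity, the sketch below counts the phrases in the classic Lempel-Ziv (1976) parsing of a string: each new phrase is the shortest substring that has not appeared earlier in the sequence. This is an assumption about which variant is meant; the abstract does not specify how the authors compute LZ complexity or how they turn it into a pairwise measure for the SVM.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of `s`.

    Illustrative sketch; quadratic or worse in time, fine for short peptides.
    """
    i, n, phrases = 0, len(s), 0
    while i < n:
        length = 1
        # Grow the candidate phrase while it can still be copied from the
        # text seen so far (the source may overlap the phrase itself,
        # but must start before position i).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases
```

    On the commonly used example string "0001101001000101" the parsing is 0 | 001 | 10 | 100 | 1000 | 101, giving a complexity of 6, while a constant string such as "aaaa" parses into just two phrases.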

  2. Declarative Programming in Java

    Directory of Open Access Journals (Sweden)

    Razvan DINA

    2014-03-01

    Although their code is rarely self-explanatory, imperative programming languages are the most commonly used by programmers all over the world today, and Java is definitely the leading language in popularity. This paper tries to determine whether there is any chance of using the most popular programming language of the moment in a declarative manner, even though Java itself is an intrinsically imperative language.

  3. Mastering JavaScript design patterns

    CERN Document Server

    Timms, Simon

    2014-01-01

    If you are a developer interested in creating easily maintainable applications that can grow and change with your needs, then this book is for you. Some experience with JavaScript (not necessarily with entire applications written in JavaScript) is required to follow the examples written in the book.

  4. Application of Java technology in radiation image processing

    International Nuclear Information System (INIS)

    Cheng Weifeng; Li Zheng; Chen Zhiqiang; Zhang Li; Gao Wenhuan

    2002-01-01

    The acquisition and processing of radiation images plays an important role in modern applications of civil nuclear technology. The author analyzes the rationale of Java image processing technology, which includes Java AWT, Java 2D and JAI. In order to demonstrate the applicability of Java technology in the field of image processing, examples of the application of JAI technology in processing radiation images of large containers are given.

  5. Java in a Nutshell a Desktop Quick Reference

    CERN Document Server

    Flanagan, David

    2005-01-01

    With more than 700,000 copies sold to date, Java in a Nutshell from O'Reilly is clearly the favorite resource amongst the legion of developers and programmers using Java technology. And now, with the release of the 5.0 version of Java, O'Reilly has given the book that defined the "in a Nutshell" category another impressive tune-up. In this latest revision, readers will find Java in a Nutshell, 5th Edition, does more than just cover the extensive changes implicit in 5.0, the newest version of Java. It's undergone a complete makeover--in scope, size, and type of coverage--in order to more closely meet

  6. Java EE 7 recipes a problem-solution approach

    CERN Document Server

    Juneau, Josh

    2013-01-01

    Java EE 7 Recipes takes an example-based approach in showing how to program Enterprise Java applications in many different scenarios. Be it a small-business web application, or an enterprise database application, Java EE 7 Recipes provides effective and proven solutions to accomplish just about any task that you may encounter. You can feel confident using the reliable solutions that are demonstrated in this book in your personal or corporate environment. The solutions in Java EE 7 Recipes are built using the most current Java Enterprise specifications, including EJB 3.2, JSF 2.2, Expression La

  7. Geothermal and volcanism in west Java

    Science.gov (United States)

    Setiawan, I.; Indarto, S.; Sudarsono; Fauzi I, A.; Yuliyanti, A.; Lintjewas, L.; Alkausar, A.; Jakah

    2018-02-01

    Indonesian active volcanoes extend from Sumatra, Java, Bali, Lombok, Flores, North Sulawesi, and Halmahera. The volcanic arc hosts 276 volcanoes with 29 GWe of geothermal resources. Considering the wide distribution of geothermal potential, geothermal research is very important, especially to tackle high energy demand in Indonesia by providing an alternative energy source aside from fossil fuel. Geothermal potential associated with volcanoes in West Java can be found in the West Java segment of the Sunda Arc, which is parallel with the subduction. The subduction of the Indo-Australian oceanic plate beneath the Eurasian continental plate results in various volcanic products with a wide range of geochemical and mineralogical characteristics. The geochemical and mineralogical characteristics of volcanic and magmatic rocks associated with geothermal systems are ill-defined. Comprehensive study of geochemical signatures, mineralogical properties, and isotope analysis might lead to an understanding of why large geothermal fields are found in West Java compared to those in Central and East Java. The result can also have valuable implications for Java tectonic evolution and can suggest key information for geothermal exploration enhancement.

  8. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

    Science.gov (United States)

    Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

    2008-09-01

    A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities ≥ 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. 2008 Wiley-Liss, Inc.

  9. Java EE 7 development with NetBeans 8

    CERN Document Server

    Heffelfinger, David R

    2015-01-01

    The book is aimed at Java developers who wish to develop Java EE applications while taking advantage of NetBeans functionality to automate repetitive tasks. Familiarity with NetBeans or Java EE is not assumed.

  10. A modification of Java virtual machine for counting bytecode commands

    OpenAIRE

    Nikolaj, Janko

    2014-01-01

    The objective of the thesis was to implement or modify an existing Java virtual machine (JVM) in a way that it will allow insight into statistics of the executed Java instructions of an executed user program. The functionality will allow analysis of the algorithms in Java environment. After studying the theory of Java and Java virtual machine, we decided to modify an existing Java virtual machine. We chose JamVM which is a lightweight, open-source Java virtual machine under GNU license. The i...

  11. Severe Outbreak of a Yellow Mosaic Disease on the Yard Long Bean in Bogor, West Java

    Directory of Open Access Journals (Sweden)

    TRI ASMIRA DAMAYANTI

    2009-06-01

    During the 2008 crop season, an outbreak of severe yellow mosaic disease on yard long bean (Vigna unguiculata subsp. sesquipedalis) occurred in several farmers’ fields in West Java. Yard long bean var. Parade inoculated manually with extracts from symptomatic leaves showed symptoms indicating the presence of a virus. Symptomatic leaf samples tested positive in enzyme-linked immunosorbent assay (ELISA) with antibodies to group-specific Potyvirus and Cucumber mosaic virus (CMV). Total RNA derived from symptomatic leaves was subjected to reverse transcription-polymerase chain reaction (RT-PCR) using primers specific to the cylindrical inclusion (CI) protein of potyviruses and CMV coat protein (CP) specific primers. Pairwise comparison of sequences obtained from cloned RT-PCR products with corresponding nucleotide sequences in GenBank confirmed the presence of Bean common mosaic virus strain Blackeye (BCMV-BlC) and CMV in the symptomatic beans. Sequences of BCMV and CMV isolates from the beans showed maximum nucleotide sequence identities (92-97% and 90%, respectively) with BCMV-BlC and CMV isolates from Taiwan. Each virus isolate also clustered closely with corresponding isolates from Taiwan in a phylogenetic analysis. These results provide the first evidence of multiple infection by BCMV-BlC and CMV in yard long bean from Bogor, West Java.

  12. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

    Science.gov (United States)

    Alcantara, Luiz Carlos Junior; Cassol, Sharon; Libin, Pieter; Deforche, Koen; Pybus, Oliver G; Van Ranst, Marc; Galvão-Castro, Bernardo; Vandamme, Anne-Mieke; de Oliveira, Tulio

    2009-07-01

    Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid high-throughput genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have been incorporated into several HIV drug resistance algorithms including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/) and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/Java web application and are freely accessible on a number of servers including: http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/, http://jose.med.kuleuven.be/genotypetool/html/.
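
    The sliding-window step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names, the window size of 400 columns and the step of 100 are hypothetical, and the per-window support scores are assumed to come from a separate phylogenetic analysis that is not shown. Only the 70% bootstrap cut-off is taken from the abstract.

```python
def sliding_windows(alignment_length, window=400, step=100):
    """Yield (start, end) half-open column ranges that cover the alignment.

    window and step are illustrative defaults, not values from the paper.
    """
    if alignment_length <= 0:
        return
    if alignment_length <= window:
        yield (0, alignment_length)
        return
    start = 0
    while start + window < alignment_length:
        yield (start, start + window)
        start += step
    # Anchor the last window to the end so trailing columns are covered.
    yield (alignment_length - window, alignment_length)


def call_window(support_by_genotype, threshold=70.0):
    """Return the best-supported reference genotype for one window.

    support_by_genotype maps a genotype label to its support (in percent),
    assumed to be computed elsewhere. Returns None when the best support
    does not exceed the threshold.
    """
    if not support_by_genotype:
        return None
    genotype, support = max(support_by_genotype.items(), key=lambda kv: kv[1])
    return genotype if support > threshold else None
```

    A query whose windows return different genotypes along its length would be a candidate recombinant; a query with the same call in every window would be treated as non-recombinant.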

  13. JAVA PathFinder

    Science.gov (United States)

    Mehlitz, Peter

    2005-01-01

    JPF is an explicit-state software model checker for Java bytecode. Today, JPF is a Swiss army knife for all sorts of runtime-based verification purposes. This basically means JPF is a Java virtual machine that executes your program not just once (like a normal VM), but theoretically in all possible ways, checking for property violations like deadlocks or unhandled exceptions along all potential execution paths. If it finds an error, JPF reports the whole execution that leads to it. Unlike a normal debugger, JPF keeps track of every step it took to reach the defect.
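
    The idea of executing a program "in all possible ways" can be illustrated with a toy explorer. The sketch below is written in Python and is not how JPF itself is implemented: it assumes each thread is a list of small step functions, enumerates every interleaving by depth-first search, and records the schedules whose final state violates a property. All names (explore, read, write) are hypothetical.

```python
def explore(programs, check, initial):
    """Run every interleaving of the threads and collect violating schedules.

    programs: one list of steps per thread; a step is f(shared, local).
    check:    predicate on the final shared state.
    initial:  dict holding the initial shared state.
    Returns a list of schedules (tuples of thread indices) that fail check.
    """
    violations = []

    def dfs(pcs, shared, local_states, schedule):
        runnable = [t for t, pc in enumerate(pcs) if pc < len(programs[t])]
        if not runnable:
            if not check(shared):
                violations.append(tuple(schedule))
            return
        for t in runnable:
            # Copy the state so that each branch of the search is independent.
            new_shared = dict(shared)
            new_locals = [dict(l) for l in local_states]
            programs[t][pcs[t]](new_shared, new_locals[t])
            new_pcs = list(pcs)
            new_pcs[t] += 1
            dfs(new_pcs, new_shared, new_locals, schedule + [t])

    dfs([0] * len(programs), dict(initial), [{} for _ in programs], [])
    return violations


# A non-atomic increment: read the counter, then write back counter + 1.
def read(shared, local):
    local["tmp"] = shared["counter"]


def write(shared, local):
    shared["counter"] = local["tmp"] + 1


increment = [read, write]
lost_updates = explore([increment, increment],
                       lambda s: s["counter"] == 2,
                       {"counter": 0})
```

    Each entry in lost_updates is a complete schedule, which mirrors the abstract's point that the whole execution leading to an error is reported rather than only the failing state.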

  14. Java Power Tools

    CERN Document Server

    Smart, John

    2008-01-01

    All true craftsmen need the best tools to do their finest work, and programmers are no different. Java Power Tools delivers 30 open source tools designed to improve the development practices of Java developers in any size team or organization. Each chapter includes a series of short articles about one particular tool -- whether it's for build systems, version control, or other aspects of the development process -- giving you the equivalent of 30 short reference books in one package. No matter which development method your team chooses, whether it's Agile, RUP, XP, SCRUM, or one of many other

  15. Java I/O

    CERN Document Server

    Harold, Elliotte Rusty

    2006-01-01

    All of Java's Input/Output (I/O) facilities are based on streams, which provide simple ways to read and write data of different types. Java provides many different kinds of streams, each with its own application. The universe of streams is divided into four large categories: input streams and output streams, for reading and writing binary data; and readers and writers, for reading and writing textual (character) data. You're almost certainly familiar with the basic kinds of streams--but did you know that there's a CipherInputStream for reading encrypted data? And a ZipOutputStream for automati

  16. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified...

  17. Learn Objective-C for Java Developers

    CERN Document Server

    Bucanek, James

    2009-01-01

    Learn Objective-C for Java Developers will guide experienced Java developers into the world of Objective-C. It will show them how to take their existing language knowledge and design patterns and transfer that experience to Objective-C and the Cocoa runtime library. This is the express train to productivity for every Java developer who dreamt of developing for Mac OS X or iPhone, but felt that Objective-C was too intimidating. So hop on and enjoy the ride!

  18. A safety-critical java technology compatibility kit

    DEFF Research Database (Denmark)

    Søndergaard, Hans; Korsholm, Stephan E.; Ravn, Anders Peter

    2014-01-01

    In order to claim conformance with a given Java Specification Request (JSR), a Java implementation has to pass all tests in an associated Technology Compatibility Kit (TCK). This paper presents development of test cases and tools for the draft Safety-Critical Java (SCJ) specification. In previous...... work we have shown how the Java Modeling Language (JML) is applied to specify conformance constraints for SCJ, and how JML-related tools may assist in generating and executing tests. Here we extend this work with a layout for concrete test cases including checking of results in a simplified version...

  19. JavaScript & jQuery The Missing Manual

    CERN Document Server

    McFarland, David

    2011-01-01

    JavaScript lets you supercharge your HTML with animation, interactivity, and visual effects, but many web designers find the language hard to learn. This jargon-free guide covers JavaScript basics and shows you how to save time and effort with the jQuery library of prewritten JavaScript code. You'll soon be building web pages that feel and act like desktop programs, without having to do much programming. The important stuff you need to know: Make your pages interactive. Create JavaScript events that react to visitor actions. Use animations and effects. Build drop-down navigation menus, pop-ups

  20. A predictable Java profile - rationale and implementations

    DEFF Research Database (Denmark)

    Søndergaard, Hans; Bøgholm, Thomas; Hansen, Rene Rydhof

    A Java profile suitable for development of high integrity embedded systems is presented. It is based on event handlers which are grouped in missions and equipped with respectively private handler memory and shared mission memory. This is a result of our previous work on developing a Java profile......, and is directly inspired by interactions with the Open Group on their on-going work on a safety critical Java profile (JSR-302). The main contribution is an arrangement of the class hierarchy such that the proposal is a generalization of Real-Time Specification for Java (RTSJ). A further contribution...

  1. Analysis and Visualization of ChIP-Seq and RNA-Seq Sequence Alignments Using ngs.plot.

    Science.gov (United States)

    Loh, Yong-Hwee Eddie; Shen, Li

    2016-01-01

    The continual maturation and increasing applications of next-generation sequencing technology in scientific research have yielded ever-increasing amounts of data that need to be effectively and efficiently analyzed and innovatively mined for new biological insights. We have developed ngs.plot, a quick and easy-to-use bioinformatics tool that visualizes the spatial relationships between sequencing alignment enrichment and specific genomic features or regions. More importantly, ngs.plot is customizable beyond the use of standard genomic feature databases to allow the analysis and visualization of user-specified regions of interest generated by the user's own hypotheses. In this protocol, we demonstrate and explain the use of ngs.plot using command line executions, as well as a web-based workflow on the Galaxy framework. We replicate the underlying commands used in the analysis of a true biological dataset that we had reported and published earlier and demonstrate how ngs.plot can easily generate publication-ready figures. With ngs.plot, users can efficiently and innovatively mine their own datasets without having to be involved in the technical aspects of sequence coverage calculations and genomic databases.

  2. Designing Abstractions for JavaScript Program Analysis

    DEFF Research Database (Denmark)

    Andreasen, Esben Sparre

    JavaScript is a widely used dynamic programming language. What started out as a client-side scripting language for browsers, is now used for large applications in many different settings. As for other dynamic languages, JavaScript makes it easy to write programs quickly without being constrained...... by the language, and programmers exploit that power to write highly dynamic programs. Automated tools for helping programmers and optimizing programs are used successfully for many programming languages. Unfortunately, the automated tools for JavaScript are not as good as for other programming languages....... The program analyses, that the automated tools are built upon, are poorly suited to deal with the highly dynamic nature of JavaScript programs. The lack of language restrictions on the programmer are detrimental to the quality of program analyses for JavaScript. The aim of this dissertation is to address...

  3. Formalising Java safety -- An overview

    NARCIS (Netherlands)

    Hartel, Pieter H.; Domingo-Ferrer, J; Chan, D.; Watson, A.

    We review the existing literature on Java safety, emphasizing formal approaches, and the impact of Java safety on small footprint devices such as smart cards. The conclusion is that while a lot of good work has been done, a more concerted effort is needed to build a coherent set of machine readable

  4. Trichosanthes L. (Cucurbitaceae) in Java

    NARCIS (Netherlands)

    Wilde, de Rugayah; Wilde, de W.J.J.O.

    1997-01-01

    As compared with the treatment in the Flora of Java (Backer in Backer & Bakhuizen van den Brink, 1963) with 8 species, a recent review of the genus Trichosanthes in Java resulted in the acceptance of 10 species for this island. Important changes are: the name T. trifolia has to be replaced by a

  5. High-Performance Java Codes for Computational Fluid Dynamics

    Science.gov (United States)

    Riley, Christopher; Chatterjee, Siddhartha; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The computational science community is reluctant to write large-scale computationally intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of Computational Fluid Dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance.

  6. Efficient alignment-free DNA barcode analytics.

    Science.gov (United States)

    Kuksa, Pavel; Pavlovic, Vladimir

    2009-11-10

    In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens the possibility of accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only a few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding.
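
    A fixed-length spectrum representation of the kind the abstract describes can be sketched as a table of overlapping k-mer counts. The example below is a minimal illustration under stated assumptions: the fragment length k = 3 and the use of cosine similarity to compare two spectra are choices made here for clarity, not details confirmed by the abstract, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum(seq, k=3):
    """Count every overlapping length-k fragment (k-mer) of a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two k-mer count vectors (0.0 if either is empty)."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

    Because no alignment is computed, comparing two barcodes costs time linear in their lengths, which is the source of the speed advantage that alignment-free methods aim for.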

  7. SVM-dependent pairwise HMM: an application to protein pairwise alignments.

    Science.gov (United States)

    Orlando, Gabriele; Raimondi, Daniele; Khan, Taushif; Lenaerts, Tom; Vranken, Wim F

    2017-12-15

    Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years many different algorithms have been developed and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent work suggests that different features may have different importance in diverse protein classes and that it would be an advantage to have more customizable approaches, capable of dealing with different alignment definitions. Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets the user decide the optimal features to align their protein class of interest. It outperforms current state-of-the-art methods on two well-known benchmark datasets when aligning highly divergent sequences. A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. Contact: wim.vranken@vub.be. Supplementary data are available at Bioinformatics online.

  8. JPP: A Java Pre-Processor

    OpenAIRE

    Kiniry, Joseph R.; Cheong, Elaine

    1998-01-01

    The Java Pre-Processor, or JPP for short, is a parsing pre-processor for the Java programming language. Unlike its namesake (the C/C++ Pre-Processor, cpp), JPP provides functionality above and beyond simple textual substitution. JPP's capabilities include code beautification, code standard conformance checking, class and interface specification and testing, and documentation generation.

  9. JavaScript domain-driven design

    CERN Document Server

    Fehre, Philipp

    2015-01-01

    If you are an experienced JavaScript developer who wants to improve the design of his or her applications, or find yourself in a situation to implement an application in an unfamiliar domain, this book is for you. Prior knowledge of JavaScript is required and prior experience with Node.js will also be helpful.

  10. SWAMP+: multiple subsequence alignment using associative massive parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Steinfadt, Shannon Irene [Los Alamos National Laboratory; Baker, Johnnie W [KENT STATE UNIV.

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
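
    The O(m+n) bound quoted above follows from a property of the Smith-Waterman recurrence: every cell on one anti-diagonal depends only on the two previous anti-diagonals, so all cells of an anti-diagonal can be computed at once, and there are m + n - 1 anti-diagonals. The sequential sketch below visits the matrix in that wavefront order to make the dependency structure visible. It is not the SWAMP+ or ASC code; the scoring values (match 2, mismatch -1, linear gap -1) are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b with a linear gap penalty."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    # Visit cells one anti-diagonal (i + j == d) at a time. Cells on the same
    # anti-diagonal do not depend on each other, so a parallel machine could
    # fill each inner loop in a single step.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            best = max(best, H[i][j])
    return best
```

    Run sequentially this is still O(mn) work; only the number of dependent steps drops to m + n - 1 when each anti-diagonal is handled in parallel.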

  11. Data for amino acid alignment of Japanese stingray melanocortin receptors with other gnathostome melanocortin receptor sequences, and the ligand selectivity of Japanese stingray melanocortin receptors

    Directory of Open Access Journals (Sweden)

    Akiyoshi Takahashi

    2016-06-01

    This article contains structure and pharmacological characteristics of melanocortin receptors (MCRs) related to research published in “Characterization of melanocortin receptors from stingray Dasyatis akajei, a cartilaginous fish” (Takahashi et al., 2016) [1]. The amino acid sequences of the stingray, D. akajei, MC1R, MC2R, MC3R, MC4R, and MC5R were aligned with the corresponding melanocortin receptor sequences from the elephant shark, Callorhinchus milii, the dogfish, Squalus acanthias, the goldfish, Carassius auratus, and the mouse, Mus musculus. These alignments provide the basis for phylogenetic analysis of these gnathostome melanocortin receptor sequences. In addition, the Japanese stingray melanocortin receptors were separately expressed in Chinese Hamster Ovary cells, and stimulated with stingray ACTH, α-MSH, β-MSH, γ-MSH, δ-MSH, and β-endorphin. The dose response curves reveal the order of ligand selectivity for each stingray MCR.

  12. THE NATURE, THE BEAUTY AND THE DIFFICULTY IN JAVA PROGRAMMING

    Directory of Open Access Journals (Sweden)

    Dror BENAMI

    2016-12-01

    In recent years the Java language has been widely used because it integrates multiple information technologies. Java's benefits are not fully exploited. The article discusses some aspects of the design of data mining algorithms in Java.

  13. The ENSDF Java Package

    International Nuclear Information System (INIS)

    Sonzogni, A.A.

    2005-01-01

    A package of computer codes has been developed to process and display nuclear structure and decay data stored in the ENSDF (Evaluated Nuclear Structure Data File) library. The codes were written in an object-oriented fashion using the Java language. This allows for an easy implementation across multiple platforms as well as deployment on web pages. The structure of the different Java classes that make up the package is discussed as well as several different implementations.

  14. Alignment and prediction of cis-regulatory modules based on a probabilistic model of evolution.

    Directory of Open Access Journals (Sweden)

    Xin He

    2009-03-01

    Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.

  15. An Evaluation of Java for Numerical Computing

    Directory of Open Access Journals (Sweden)

    Brian Blount

    1999-01-01

    This paper describes the design and implementation of high performance numerical software in Java. Our primary goals are to characterize the performance of object‐oriented numerical software written in Java and to investigate whether Java is a suitable language for such endeavors. We have implemented JLAPACK, a subset of the LAPACK library in Java. LAPACK is a high‐performance Fortran 77 library used to solve common linear algebra problems. JLAPACK is an object‐oriented library, using encapsulation, inheritance, and exception handling. It performs within a factor of four of the optimized Fortran version for certain platforms and test cases. When used with the native BLAS library, JLAPACK performs comparably with the Fortran version using the native BLAS library. We conclude that high‐performance numerical software could be written in Java if a handful of concerns about language features and compilation strategies are adequately addressed.

  16. JLAPACK – Compiling LAPACK FORTRAN to Java

    Directory of Open Access Journals (Sweden)

    David M. Doolin

    1999-01-01

    The JLAPACK project provides the LAPACK numerical subroutines translated from their subset Fortran 77 source into class files, executable by the Java Virtual Machine (JVM) and suitable for use by Java programmers. This makes it possible for Java applications or applets, distributed on the World Wide Web (WWW), to use established legacy numerical code that was originally written in Fortran. The translation is accomplished using a special purpose Fortran‐to‐Java (source‐to‐source) compiler. The LAPACK API will be considerably simplified to take advantage of Java’s object‐oriented design. This report describes the research issues involved in the JLAPACK project, and its current implementation and status.

  17. Natural language processing with Java

    CERN Document Server

    Reese, Richard M

    2015-01-01

    If you are a Java programmer who wants to learn about the fundamental tasks underlying natural language processing, this book is for you. You will be able to identify and use NLP tasks for many common problems, and integrate them in your applications to solve more difficult problems. Readers should be familiar/experienced with Java software development.

  18. A Hardware Abstraction Layer in Java

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Korsholm, Stephan; Kalibera, Tomas

    2011-01-01

    Embedded systems use specialized hardware devices to interact with their environment, and since they have to be dependable, it is attractive to use a modern, type-safe programming language like Java to develop programs for them. Standard Java, as a platform-independent language, delegates access...... to devices, direct memory access, and interrupt handling to some underlying operating system or kernel, but in the embedded systems domain resources are scarce and a Java Virtual Machine (JVM) without an underlying middleware is an attractive architecture. The contribution of this article is a proposal...... for Java packages with hardware objects and interrupt handlers that interface to such a JVM. We provide implementations of the proposal directly in hardware, as extensions of standard interpreters, and finally with an operating system middleware. The latter solution is mainly seen as a migration path...

  19. Java Card for PayTv Application

    OpenAIRE

    Dutta, Pallab

    2013-01-01

    Smart cards are widely used along with PayTV receivers to store secret user keys and to perform security functions to prevent any unauthorized viewing of PayTV channels. Java Card technology enables programs written in the Java programming language to run on smart cards. Smart cards represent one of the smallest computing platforms in use today. The memory configuration of a smart card is on the order of 4K of RAM, 72K of EEPROM, and 24K of ROM. Using Java card provides advantages to the ind...

  20. Efficient Incremental Checkpointing of Java Programs

    DEFF Research Database (Denmark)

    Lawall, Julia Laetitia; Muller, Gilles

    2000-01-01

    This paper investigates the optimization of language-level checkpointing of Java programs. First, we describe how to systematically associate incremental checkpoints with Java classes. While being safe, the genericness of this solution induces substantial execution overhead. Second, to solve...
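
    The general idea of an incremental checkpoint, saving only the objects that changed since the previous checkpoint, can be sketched as below. The paper targets Java classes; this Python analogy is only an illustration of the concept under assumptions made here: a per-object dirty flag set on every attribute write, and a checkpoint routine that records and then clears the flag. The class and function names are hypothetical and do not come from the paper.

```python
class Checkpointable:
    """Base class that tracks whether an object changed since the last checkpoint."""

    _next_id = 0

    def __init__(self):
        # Bypass our own __setattr__ while installing the bookkeeping fields.
        object.__setattr__(self, "_id", Checkpointable._next_id)
        Checkpointable._next_id += 1
        object.__setattr__(self, "_dirty", True)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        object.__setattr__(self, "_dirty", True)  # any write marks the object

    def record(self):
        """Snapshot of the public fields only."""
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}


def incremental_checkpoint(objects):
    """Return {object id: snapshot} for objects modified since the last call."""
    delta = {}
    for obj in objects:
        if obj._dirty:
            delta[obj._id] = obj.record()
            object.__setattr__(obj, "_dirty", False)
    return delta


class Account(Checkpointable):
    def __init__(self, balance):
        super().__init__()
        self.balance = balance
```

    The generic flag check on every object is the kind of per-object overhead that a specialized, class-specific checkpoint routine would try to avoid.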

  1. Java EE 7 with GlassFish 4 Application Server

    CERN Document Server

    Heffelfinger, David R

    2014-01-01

    This book is a practical guide and follows a very user-friendly approach. The book aims to get the reader up to speed in Java EE 7 development. All major Java EE 7 APIs and the details of the GlassFish 4 server are covered, followed by examples of their use. If you are a Java developer who wants to become proficient with Java EE 7, this book is ideal for you. Readers are expected to have some experience with Java and to have developed and deployed applications in the past, but don't need any previous knowledge of Java EE or J2EE. It teaches the reader how to use GlassFish 4 to develop and deploy

  2. JavaScript Cookbook

    CERN Document Server

    Powers, Shelley

    2010-01-01

    Why reinvent the wheel every time you run into a problem with JavaScript? This cookbook is chock-full of code recipes that address common programming tasks, as well as techniques for building web apps that work in any browser. Just copy and paste the code samples into your project -- you'll get the job done faster and learn more about JavaScript in the process. You'll also learn how to take advantage of the latest features in ECMAScript 5 and HTML5, including the new cross-domain widget communication technique, HTML5's video and audio elements, and the drawing canvas. You'll find recipes for

  3. OpenCV 3.0 computer vision with Java

    CERN Document Server

    Baggio, Daniel Lélis

    2015-01-01

    If you are a Java developer, student, researcher, or hobbyist wanting to create computer vision applications in Java then this book is for you. If you are an experienced C/C++ developer who is used to working with OpenCV, you will also find this book very useful for migrating your applications to Java. All you need is basic knowledge of Java, with no prior understanding of computer vision required, as this book will give you clear explanations and examples of the basics.

  4. BFAST: an alignment tool for large scale genome resequencing.

    Directory of Open Access Journals (Sweden)

    Nils Homer

    2009-11-01

    The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
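
    The idea of using an index to map reads to candidate alignment locations can be illustrated with a small Python sketch. This is not BFAST's actual index or code: the plain k-mer dictionary, the function names `build_kmer_index` and `candidate_locations`, and the toy sequences are all assumptions made for illustration. Each k-mer of the read votes for the reference start position it implies, and the positions with the most votes would then be passed to a local aligner such as Smith-Waterman.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_locations(read, index, k):
    """Let each k-mer of the read vote for the reference start position it implies."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start >= 0:  # ignore hits that would place the read before the reference
                votes[start] += 1
    return votes.most_common()


# Toy data (hypothetical): the read is reference[9:19] with one substituted base.
reference = "ACGTTGCATGCCGATAGCTAGGCTA"
read = "GCCGATTGCT"
index = build_kmer_index(reference, k=4)
candidates = candidate_locations(read, index, k=4)
print(candidates)
```

    Because several k-mers still match exactly despite the substitution, the true location collects the most votes. Using more than one index, as the abstract describes, is a way to keep that property when errors fall in different parts of the read.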

  5. A genome survey sequencing of the Java mouse deer (Tragulus javanicus) adds new aspects to the evolution of lineage specific retrotransposons in Ruminantia (Cetartiodactyla).

    Science.gov (United States)

    Gallus, S; Kumar, V; Bertelsen, M F; Janke, A; Nilsson, M A

    2015-10-25

    Ruminantia, the ruminating, hoofed mammals (cow, deer, giraffe and allies) are an unranked artiodactylan clade. Around 50-60 million years ago the BovB retrotransposon entered the ancestral ruminantian genome through horizontal gene transfer. A survey genome screen using 454-pyrosequencing of the Java mouse deer (Tragulus javanicus) and the lesser kudu (Tragelaphus imberbis) was done to investigate and to compare the landscape of transposable elements within Ruminantia. The family Tragulidae (mouse deer) is the only representative of Tragulina and phylogenetically important, because it represents the earliest divergence in Ruminantia. The data analyses show that, relative to other ruminantian species, the lesser kudu genome has seen an expansion of BovB Long INterspersed Elements (LINEs) and BovB related Short INterspersed Elements (SINEs) like BOVA2. In comparison the genome of Java mouse deer has fewer BovB elements than other ruminants, especially Bovinae, and has in addition a novel CHR-3 SINE most likely propagated by LINE-1. By contrast the other ruminants have low amounts of CHR SINEs but high numbers of actively propagating BovB-derived and BovB-propagated SINEs. The survey sequencing data suggest that the transposable element landscape in mouse deer (Tragulina) is unique among Ruminantia, suggesting a lineage specific evolutionary trajectory that does not involve BovB mediated retrotransposition. This shows that the genomic landscape of mobile genetic elements can rapidly change in any lineage. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. BBMap: A Fast, Accurate, Splice-Aware Aligner

    Energy Technology Data Exchange (ETDEWEB)

    Bushnell, Brian

    2014-03-17

    Alignment of reads is one of the primary computational tasks in bioinformatics. Of paramount importance to resequencing, alignment is also crucial to other areas - quality control, scaffolding, string-graph assembly, homology detection, assembly evaluation, error-correction, expression quantification, and even as a tool to evaluate other tools. An optimal aligner would greatly improve virtually any sequencing process, but optimal alignment is prohibitively expensive for gigabases of data. Here, we will present BBMap [1], a fast splice-aware aligner for short and long reads. We will demonstrate that BBMap has superior speed, sensitivity, and specificity to alternative high-throughput aligners bowtie2 [2], bwa [3], smalt [4], GSNAP [5], and BLASR [6].

  7. A desktop 3D printer in safety-critical Java

    DEFF Research Database (Denmark)

    Strøm, Tórur Biskopstø; Schoeberl, Martin

    2012-01-01

    It is desirable to bring Java technology to safety-critical systems. To this end The Open Group has created the safety-critical Java specification, which will allow Java applications, written according to the specification, to be certifiable in accordance with safety-critical standards. Although there exist several safety-critical Java framework implementations, there is a lack of safety-critical use cases implemented according to the specification. In this paper we present a 3D printer and its safety-critical Java level 1 implementation as a use case. With basis in the implementation we evaluate...

  8. Global alignment algorithms implementations | Fatumo ...

    African Journals Online (AJOL)

    In this paper, we implemented two routes for sequence comparison: the dot plot and the Needleman-Wunsch algorithm for global sequence alignment. Our algorithms were implemented in the Python programming language and were tested on a Linux platform (1.60 GHz, 512 MB of RAM) running SUSE versions 9.2 and 10.1.
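
    As a rough illustration of the Needleman-Wunsch route mentioned above, the following Python sketch fills the dynamic-programming matrix and traces back one optimal global alignment. It is not the authors' code; the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to the origin.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
print(result)
```

    When several alignments share the optimal score, this traceback prefers the diagonal move, so it returns only one of them.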

  9. Development of Remote Inspection Systems with the Java Applet

    International Nuclear Information System (INIS)

    Choi, Yoo Rark; Lee, Jae Cheol; Kim, Jae Hee

    2005-01-01

    The World Wide Web and Java are powerful networking technologies on the Internet. An applet is a program written in the Java programming language that can be included in an HTML page, much in the same way as an image is included. When we use a Java technology-enabled browser to view a page that contains an applet, the applet code is transferred to a client's system and executed by the browser's Java Virtual Machine (JVM). We have developed two remote inspection systems, one for reactor wall inspection and one for guide tube split pin inspection, based on Java and a traditional programming language. Java is used for the GUI (graphical user interface) and the traditional Visual C++ programming language is used to control the inspection equipment.

  10. Towards an Existential Types Model for Java with Wildcards

    DEFF Research Database (Denmark)

    Cameron, Nicholas; Drossopoulou, Sophia; Ernst, Erik

    2007-01-01

    Wildcards extend Java generics by softening the mismatch between subtype and parametric polymorphism. Although they are a key part of the Java 5.0 programming language, a type system including wildcards has never been proven type sound. Wildcards have previously been formalised as existential types. In this paper we extend FGJ, a featherweight formalisation of Java with generics, with existential types. We prove that this calculus, ExistsJ, is type sound, and illustrate how it models wildcards in the Java Programming Language. ExistsJ is not a full model for Java wildcards, because it does not support lower bounds for wildcards. We discuss why ExistsJ cannot be easily extended with lower bounds, and how full Java wildcards could be modelled in a type sound way.

  11. Visualization Software for VisIT Java Client

    Energy Technology Data Exchange (ETDEWEB)

    2017-01-01

    The VisIT Java Client (JVC) library is a lightweight thin client that is designed and written purely in the native language of Java (the Python & JavaScript versions of the library use the same concept) and communicates with any new unmodified standalone version of VisIT, a high performance computing parallel visualization toolkit, over traditional or web sockets and dynamically determines capabilities of the running VisIT instance whether local or remote.

  12. Conformance test development with the Java modeling language

    DEFF Research Database (Denmark)

    Søndergaard, Hans; Korsholm, Stephan E.; Ravn, Anders P.

    2017-01-01

    In order to claim conformance with a Java Specification Request, a Java implementation has to pass all tests in an associated Technology Compatibility Kit (TCK). This paper presents a model-based development of a TCK test suite and a test execution tool for the draft Safety-Critical Java (SCJ) profile specification. The Java Modeling Language (JML) is used to model conformance constraints for the profile. JML annotations define contracts for classes and interfaces. The annotations are translated by a tool into runtime assertion checks. Hereby the design and elaboration of the concrete test cases...

  13. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  14. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...

  15. Interactive Web Services with Java

    DEFF Research Database (Denmark)

    Møller, Anders; Schwartzbach, Michael Ignatieff

    This slide collection about Java Web service programming, JSP, Servlets and JWIG was created by Anders Møller and Michael I. Schwartzbach at the BRICS research center at the University of Aarhus, Denmark.

  16. Functional programming in JavaScript

    CERN Document Server

    Mantyla, Dan

    2015-01-01

    If you are a JavaScript developer interested in learning functional programming, looking for the quantum leap towards mastering the JavaScript language, or just want to become a better programmer in general, then this book is ideal for you. It is aimed at programmers involved in developing reactive frontend apps, server-side apps that wrangle with reliability and concurrency, and everything in between.

  17. Development of Remote Inspection Systems with the Java Applet

    Energy Technology Data Exchange (ETDEWEB)

    Choi, Yoo Rark; Lee, Jae Cheol; Kim, Jae Hee [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2005-07-01

    The World Wide Web and Java are powerful networking technologies on the Internet. An applet is a program written in the Java programming language that can be included in an HTML page, much in the same way as an image is included. When we use a Java technology-enabled browser to view a page that contains an applet, the applet code is transferred to a client's system and executed by the browser's Java Virtual Machine (JVM). We have developed two remote inspection systems, one for reactor wall inspection and one for guide tube split pin inspection, based on Java and a traditional programming language. Java is used for the GUI (graphical user interface) and the traditional Visual C++ programming language is used to control the inspection equipment.

  18. JBoss Weld CDI for Java platform

    CERN Document Server

    Finnegan, Ken

    2013-01-01

    This book is a mini tutorial with plenty of code examples and strategies to give you numerous options when building your own applications. "JBoss Weld CDI for Java Platform" is written for developers who are new to dependency injection. A rudimentary knowledge of Java is required.

  19. jFuzz: A Concolic Whitebox Fuzzer for Java

    Science.gov (United States)

    Jayaraman, Karthick; Harvison, David; Ganesh, Vijay; Kiezun, Adam

    2009-01-01

    We present jFuzz, an automatic testing tool for Java programs. jFuzz is a concolic whitebox fuzzer, built on the NASA Java PathFinder, an explicit-state Java model checker, and a framework for developing reliability and analysis tools for Java. Starting from a seed input, jFuzz automatically and systematically generates inputs that exercise new program paths. jFuzz uses a combination of concrete and symbolic execution, and constraint solving. Time spent on solving constraints can be significant. We implemented several well-known optimizations and name-independent caching, which aggressively normalizes the constraints to reduce the number of calls to the constraint solver. We present preliminary results due to the optimizations, and demonstrate the effectiveness of jFuzz in creating good test inputs. jFuzz is intended to be a research testbed for investigating new testing and analysis techniques based on concrete and symbolic execution. The source code of jFuzz is available as part of the NASA Java PathFinder.

  20. Extensible numerical library in JAVA

    International Nuclear Information System (INIS)

    Aso, T.; Okazawa, H.; Takashimizu, N.

    2001-01-01

    The authors present the current status of a project to develop a numerical library in Java. At a previous conference, the authors showed how object-oriented techniques improve both the use and the development of numerical libraries compared with the conventional approach. Many functions needed for data analysis are not provided by the Java language, for example good random number generators and special functions. The authors' development strategy focuses on ease of implementation and on letting users themselves, not only the developers, add new features. In the HPC field there are other efforts to develop numerical libraries in Java; however, their focus is on execution performance, not ease of extension. Following this strategy, the authors have designed and implemented more classes for random number generators and so on.

  1. Explicit Precedence Constraints in Safety-Critical Java

    DEFF Research Database (Denmark)

    Puffitsch, Wolfgang; Noulard, Eric; Pagetti, Claire

    2013-01-01

    Safety-critical Java (SCJ) aims at making the amenities of Java available for the development of safety-critical applications. The multi-rate synchronous language Prelude facilitates the specification of the communication and timing requirements of complex real-time systems. This paper combines...... to provide explicit support for precedence constraints. We present the considerations behind the design of this extension and discuss our experiences with a first prototype implementation based on the SCJ implementation of the Java Optimized Processor....

  2. COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge.

    Science.gov (United States)

    Lu, Yang Young; Chen, Ting; Fuhrman, Jed A; Sun, Fengzhu

    2017-03-15

    The advent of next-generation sequencing technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, also known as contigs, instead of an entire genome, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and downstream functional analysis. OTU clustering is also referred to as binning. We present COCACOLA, a general framework to automatically bin contigs into OTUs based on sequence composition and coverage across multiple samples. The effectiveness of COCACOLA is demonstrated in both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. One is using L1 distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering by sparsity regularization. In addition, the COCACOLA framework seamlessly embraces customized knowledge to facilitate binning accuracy. In our study, we have investigated two types of additional knowledge, the co-alignment to reference genomes and linkage of contigs provided by paired-end reads, as well as the ensemble of both. We find that both co-alignment and linkage information further improve binning in the majority of cases. COCACOLA is scalable and faster than CONCOCT, GroopM, MaxBin and MetaBAT. The software is available at https://github.com/younglululu/COCACOLA. Contact: fsun@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  3. Learning Java by building Android games

    CERN Document Server

    Horton, John

    2015-01-01

    If you are completely new to Java, Android, or game programming and are aiming to publish Android games, then this book is for you. This book also acts as a refresher for those who already have experience in Java on other platforms or in other object-oriented languages.

  4. D-GENIES: dot plot large genomes in an interactive, efficient and simple way.

    Science.gov (United States)

    Cabanettes, Floréal; Klopp, Christophe

    2018-01-01

    Dot plots are widely used to quickly compare sequence sets. They provide a synthetic similarity overview, highlighting repetitions, breaks and inversions. Different tools have been developed to easily generate genomic alignment dot plots, but they are often limited in the input sequence size. D-GENIES is a standalone and web application performing large genome alignments using the minimap2 software package and generating interactive dot plots. It enables users to sort query sequences along the reference, zoom in the plot and download several image, alignment or sequence files. D-GENIES is an easy-to-install, open-source software package (GPL) developed in Python and JavaScript. The source code is available at https://github.com/genotoul-bioinfo/dgenies and it can be tested at http://dgenies.toulouse.inra.fr/.
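
    For readers unfamiliar with dot plots, the classic dot matrix can be sketched in a few lines of Python. This is only the textbook exact-match form, not what D-GENIES does (the abstract states that it plots minimap2 alignments); the function names `dot_plot` and `render` and the `window` parameter are assumptions for illustration.

```python
def dot_plot(seq_x, seq_y, window=1):
    """Return a matrix whose cell [r][c] is True when the length-`window`
    words starting at seq_y[r] and seq_x[c] are identical."""
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return [[seq_y[r:r + window] == seq_x[c:c + window] for c in range(cols)]
            for r in range(rows)]


def render(matrix):
    """Draw the matrix as text: '*' for a match, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# A sequence compared with itself shows the main diagonal;
# repeated words show up as additional parallel diagonals.
print(render(dot_plot("ACGTACGT", "ACGTACGT", window=2)))
```

    A larger `window` suppresses chance single-base matches, which is why word-based matching is usually preferred on long sequences.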

  5. The Java EE architect's handbook how to be a successful application architect for Java EE applications

    CERN Document Server

    Ashmore, Derek C.

    2014-01-01

    This handbook is a concise guide to assuming the role of application architect for Java EE applications. This handbook will guide the application architect through the entire Java EE project including identifying business requirements, performing use-case analysis, object and data modeling, and guiding a development team during construction. This handbook will provide tips and techniques for communicating with project managers and management. This handbook will provide strategies for making your application easier and less costly to support. Whether you are about to architect your first Java EE application or are looking for ways to keep your projects on-time and on-budget, you will refer to this handbook again and again.

  6. New Web Server - the Java Version of Tempest - Produced

    Science.gov (United States)

    York, David W.; Ponyik, Joseph G.

    2000-01-01

    A new software design and development effort has produced a Java (Sun Microsystems, Inc.) version of the award-winning Tempest software (refs. 1 and 2). In 1999, the Embedded Web Technology (EWT) team received a prestigious R&D 100 Award for Tempest, Java Version. In this article, "Tempest" will refer to the Java version of Tempest, a World Wide Web server for desktop or embedded systems. Tempest was designed at the NASA Glenn Research Center at Lewis Field to run on any platform for which a Java Virtual Machine (JVM, Sun Microsystems, Inc.) exists. The JVM acts as a translator between the native code of the platform and the byte code of Tempest, which is compiled in Java. These byte code files are Java executables with a ".class" extension. Multiple byte code files can be zipped together as a "*.jar" file for more efficient transmission over the Internet. Today's popular browsers, such as Netscape (Netscape Communications Corporation) and Internet Explorer (Microsoft Corporation) have built-in Virtual Machines to display Java applets.

  7. Method and apparatus for biological sequence comparison

    Science.gov (United States)

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  8. Professionell entwickeln mit Java EE 7 das umfassende Handbuch

    CERN Document Server

    Salvanos, Alexander

    2014-01-01

    A practice-oriented textbook on JEE 7 and at the same time a standard reference that accompanies you on your way to becoming an enterprise developer. It includes best practices for lightweight enterprise applications, standards for service-based applications, the Persistence API, dependency injection, HTML5 and performance tuning. From the contents: fundamentals and installation; software architecture with Java EE; web applications with HTML5 and CSS3; Java Standard Tag Library; asynchronous servlets and non-blocking I/O; connecting databases with the JDBC API; Java Server Faces; database organization; Java Persistence API; object-relational mapping; transaction safety; performance tuning; internationalization; Enterprise Java Beans; WebSockets; PrimeFaces; deployment. Galileo Press is now called Rheinwerk Verlag.

  9. Java Performance Mysteries

    Directory of Open Access Journals (Sweden)

    Maldikar Pranita

    2016-01-01

    The contributions of this paper are (1) observing Java performance mysteries in the cloud, (2) identifying the sources of performance mysteries, and (3) obtaining optimal and reproducible performance data.

  10. Introduction to Graphics Programming in Java

    DEFF Research Database (Denmark)

    Rosendahl, Mads

    Writing graphics applications in Java using Swing can be quite a daunting experience which requires understanding of some large libraries, and fairly advanced aspects of Java. In these notes we will show that by using a small subset of the Swing package we can write a wide range of graphics...

  11. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in a laboratory test. This study focuses on the local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was invented by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the greatest possible similarity between a pair of sequences by assigning a negative value to unequal base pairs (mismatches) and a positive value to equal base pairs (matches). The maximum positive value then marks the end of the alignment, and the minimum value marks its start. This study will use sequences of leukemia and 3 sequences of non-leukemia.
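
    The scoring scheme described above can be sketched in Python as follows. This is a minimal illustration, not the study's implementation: the function name `smith_waterman`, the scores (match +2, mismatch -1, linear gap -1) and the toy sequences are assumptions. Cell values are never allowed to drop below zero, the highest-scoring cell marks the end of the local alignment, and the traceback stops at the first zero, which marks its start.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # start a new local alignment
                              score[i - 1][j - 1] + sub,  # match or mismatch
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (hypothetical): only the shared core "ACGT" aligns well.
local = smith_waterman("TTTACGTAAA", "CCCACGTGGG")
print(local)
```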

  12. RNA Structural Alignments, Part I

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Gorodkin, Jan

    2014-01-01

    Simultaneous alignment and secondary structure prediction of RNA sequences is often referred to as "RNA structural alignment." A class of the methods for structural alignment is based on the principles proposed by Sankoff more than 25 years ago. The Sankoff algorithm simultaneously folds and aligns...... is so high that it took more than a decade before the first implementation of a Sankoff style algorithm was published. However, with the faster computers available today and the improved heuristics used in the implementations the Sankoff-based methods have become practical. This chapter describes the methods based on the Sankoff algorithm. All the practical implementations of the algorithm use heuristics to make them run in reasonable time and memory. These heuristics are also described in this chapter.

  13. Accelerator and transport line survey and alignment

    International Nuclear Information System (INIS)

    Ruland, R.E.

    1991-10-01

    This paper summarizes the survey and alignment processes of accelerators and transport lines and discusses the propagation of errors associated with these processes. The major geodetic principles governing the survey and alignment measurement space are introduced and their relationship to a lattice coordinate system shown. The paper continues with a broad overview of the activities involved in the step sequence from initial absolute alignment to final smoothing. Emphasis is given to the relative alignment of components, in particular to the importance of incorporating methods to remove residual systematic effects in surveying and alignment operations. Various approaches to smoothing used at major laboratories are discussed. 47 refs., 19 figs., 1 tab

  14. Long sequence correlation coprocessor

    Science.gov (United States)

    Gage, Douglas W.

    1994-09-01

    A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation scores for a number of adjacent bit alignments (16, for example) between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence, so that the different counters simultaneously calculate the correlation coefficients for different alignments of the two sequences.
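
    The counting scheme described above can be modelled in software with a short Python sketch. It only mimics the arithmetic, not the circuit: the function name `correlation_scores`, the list-of-bits representation and the toy inputs are assumptions, and the inner loop stands in for comparisons that the hardware performs in parallel within one clock cycle.

```python
def correlation_scores(first, second, n_alignments=16):
    """For each shift s in 0..n_alignments-1, count the positions i where
    first[i] equals second[i + s] (one counter per alignment)."""
    counters = [0] * n_alignments
    for i, bit in enumerate(first):        # one "clock cycle" per bit of the first sequence
        for s in range(n_alignments):      # done simultaneously by the hardware counters
            if i + s < len(second) and second[i + s] == bit:
                counters[s] += 1
    return counters


# Toy input (hypothetical): `first` occurs in `second` at shift 1.
first = [1, 0, 1, 1]
second = [0, 1, 0, 1, 1, 0]
scores = correlation_scores(first, second, n_alignments=3)
print(scores)
```

    The shift with the highest counter value is the alignment at which the two sequences agree in the most bit positions.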

  15. Overview of Java application configuration frameworks

    OpenAIRE

    Denisov, Victor

    2013-01-01

    This paper reviews three major application configuration frameworks for Java-based applications: java.util.Properties, Apache Commons Configuration and Preferences API. Basic functionality of each framework is illustrated with code examples. Pros and cons of each framework are described in moderate detail. Suggestions are made about typical use cases for each framework.

  16. A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

    Directory of Open Access Journals (Sweden)

    Robert Lindner

    Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed, including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus essentially making hash-based aligners obsolete.

  17. RxJava essentials

    CERN Document Server

    Morgillo, Ivan

    2015-01-01

    If you are an experienced Java developer, reactive programming will give you a new way to approach scalability and concurrency in your backend systems, without forcing you to switch programming languages.

  18. T-BAS: Tree-Based Alignment Selector toolkit for phylogenetic-based placement, alignment downloads and metadata visualization: an example with the Pezizomycotina tree of life.

    Science.gov (United States)

    Carbone, Ignazio; White, James B; Miadlikowska, Jolanta; Arnold, A Elizabeth; Miller, Mark A; Kauff, Frank; U'Ren, Jana M; May, Georgiana; Lutzoni, François

    2017-04-15

    High-quality phylogenetic placement of sequence data has the potential to greatly accelerate studies of the diversity, systematics, ecology and functional biology of diverse groups. We developed the Tree-Based Alignment Selector (T-BAS) toolkit to allow evolutionary placement and visualization of diverse DNA sequences representing unknown taxa within a robust phylogenetic context, and to permit the downloading of highly curated, single- and multi-locus alignments for specific clades. In its initial form, T-BAS v1.0 uses a core phylogeny of 979 taxa (including 23 outgroup taxa, as well as 61 orders, 175 families and 496 genera) representing all 13 classes of the largest subphylum of Fungi, Pezizomycotina (Ascomycota), based on sequence alignments for six loci (nr5.8S, nrLSU, nrSSU, mtSSU, RPB1, RPB2). T-BAS v1.0 has three main uses: (i) Users may download alignments and voucher tables for members of the Pezizomycotina directly from the reference tree, facilitating systematics studies of focal clades. (ii) Users may upload sequence files with reads representing unknown taxa and place these on the phylogeny using either BLAST or phylogeny-based approaches, and then use the displayed tree to select reference taxa to include when downloading alignments. The placement of unknowns can be performed for large numbers of Sanger sequences obtained from fungal cultures and for alignable, short reads of environmental amplicons. (iii) User-customizable metadata can be visualized on the tree. T-BAS Version 1.0 is available online at http://tbas.hpc.ncsu.edu. Registration is required to access the CIPRES Science Gateway and NSF XSEDE's large computational resources. icarbon@ncsu.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  19. Pointer Analysis for JavaScript Programming Tools

    DEFF Research Database (Denmark)

    Feldthaus, Asger

    Tools that can assist the programmer with tasks, such as, refactoring or code navigation, have proven popular for Java, C#, and other programming languages. JavaScript is a widely used programming language, and its users could likewise benefit from such tools, but the dynamic nature of the language...... is an obstacle for the development of these. Because of this, tools for JavaScript have long remained ineffective compared to those for many other programming languages. Static pointer analysis can provide a foundation for more powerful tools, although the design of this analysis is itself a complicated endeavor....... In this work, we explore techniques for performing pointer analysis of JavaScript programs, and we find novel applications of these techniques. In particular, we demonstrate how these can be used for code navigation, automatic refactoring, semi-automatic refactoring of incomplete programs, and checking of type...

  20. Patterns for Safety-Critical Java Memory Usage

    DEFF Research Database (Denmark)

    Rios Rivas, Juan Ricardo; Nilsen, Kelvin; Schoeberl, Martin

    2012-01-01

    Scoped memories are introduced in real-time Java profiles in order to make object allocation and deallocation time and space predictable. However, explicit scoping requires care from programmers when dealing with temporary objects, passing scope-allocated objects as arguments to methods, and retu...... are illustrated by implementations in the safety-critical Java profile....

  1. Fixing the Sorting Algorithm for Android, Java and Python

    NARCIS (Netherlands)

    C.P.T. de Gouw (Stijn); F.S. de Boer (Frank)

    2015-01-01

    Tim Peters developed the Timsort hybrid sorting algorithm in 2002. TimSort was first developed for Python, a popular programming language, but later ported to Java (where it appears as java.util.Collections.sort and java.util.Arrays.sort). TimSort is today used as the default sorting

  2. A new Java Thread model for concurrent programming of real-time systems

    NARCIS (Netherlands)

    Hilderink, G.H.; Broenink, Johannes F.; Bakkers, André

    1998-01-01

    The Java™ Virtual Machine (JVM) provides a high degree of platform independence, but being an interpreter, Java has poor system performance. New compiler techniques and Java processors will gradually improve the performance of Java, but despite these developments, Java is still far from

  3. ASH structure alignment package: Sensitivity and selectivity in domain classification

    Directory of Open Access Journals (Sweden)

    Toh Hiroyuki

    2007-04-01

    Full Text Available Background: Structure alignment methods offer the possibility of measuring distant evolutionary relationships between proteins that are not visible by sequence-based analysis. However, the question of how structural differences and similarities ought to be quantified in this regard remains open. In this study we construct a training set of sequence-unique CATH and SCOP domains, from which we develop a scoring function that can reliably identify domains with the same CATH topology and SCOP fold classification. The score is implemented in the ASH structure alignment package, for which the source code and a web service are freely available from the PDBj website http://www.pdbj.org/ASH/. Results: The new ASH score shows increased selectivity and sensitivity compared with values reported for several popular programs using the same test set of 4,298,905 structure pairs, yielding an area of 0.96 under the receiver operating characteristic (ROC) curve. In addition, weak sequence homologies between similar domains are revealed that could not be detected by BLAST sequence alignment. Also, a subset of domain pairs is identified that exhibit high similarity, even though their CATH and SCOP classification differs. Finally, we show that the ranking of alignment programs based solely on geometric measures depends on the choice of the quality measure. Conclusion: ASH shows high selectivity and sensitivity with regard to domain classification, an important step in defining distantly related protein sequence families. Moreover, the CPU cost per alignment is competitive with the fastest programs, making ASH a practical option for large-scale structure classification studies.

  4. Practical database programming with Java

    CERN Document Server

    Bai, Ying

    2011-01-01

    "This important resource offers a detailed description about the practical considerations and applications in database programming using Java NetBeans 6.8 with authentic examples and detailed explanations. This book provides readers with a clear picture as to how to handle the database programming issues in the Java NetBeans environment. The book is ideal for classroom and professional training material. It includes a wealth of supplemental material that is available for download including Powerpoint slides, solution manuals, and sample databases"--

  5. W-curve alignments for HIV-1 genomic comparisons.

    Directory of Open Access Journals (Sweden)

    Douglas J Cork

    2010-06-01

    Full Text Available The W-curve was originally developed as a graphical visualization technique for viewing DNA and RNA sequences. Its ability to render features of DNA also makes it suitable for computational studies. Its main advantage in this area is utilizing a single-pass algorithm for comparing the sequences. Avoiding recursion during sequence alignments offers advantages for speed and in-process resources. The graphical technique also allows for multiple models of comparison to be used depending on the nucleotide patterns embedded in similar whole genomic sequences. The W-curve approach allows us to compare large numbers of samples quickly. We are currently tuning the algorithm to accommodate quirks specific to HIV-1 genomic sequences so that it can be used to aid in diagnostic and vaccine efforts. Tracking the molecular evolution of the virus has been greatly hampered by gap associated problems predominantly embedded within the envelope gene of the virus. Gaps and hypermutation of the virus slow conventional string based alignments of the whole genome. This paper describes the W-curve algorithm itself, and how we have adapted it for comparison of similar HIV-1 genomes. A treebuilding method is developed with the W-curve that utilizes a novel Cylindrical Coordinate distance method and gap analysis method. HIV-1 C2-V5 env sequence regions from a Mother/Infant cohort study are used in the comparison. The output distance matrix and neighbor results produced by the W-curve are functionally equivalent to those from Clustal for C2-V5 sequences in the mother/infant pairs infected with CRF01_AE. Significant potential exists for utilizing this method in place of conventional string based alignment of HIV-1 genomes, such as Clustal X. With W-curve heuristic alignment, it may be possible to obtain clinically useful results in a short time, short enough to affect clinical choices for acute treatment. A description of the W-curve generation process, including a comparison

  6. W-curve alignments for HIV-1 genomic comparisons.

    Science.gov (United States)

    Cork, Douglas J; Lembark, Steven; Tovanabutra, Sodsai; Robb, Merlin L; Kim, Jerome H

    2010-06-01

    The W-curve was originally developed as a graphical visualization technique for viewing DNA and RNA sequences. Its ability to render features of DNA also makes it suitable for computational studies. Its main advantage in this area is utilizing a single-pass algorithm for comparing the sequences. Avoiding recursion during sequence alignments offers advantages for speed and in-process resources. The graphical technique also allows for multiple models of comparison to be used depending on the nucleotide patterns embedded in similar whole genomic sequences. The W-curve approach allows us to compare large numbers of samples quickly. We are currently tuning the algorithm to accommodate quirks specific to HIV-1 genomic sequences so that it can be used to aid in diagnostic and vaccine efforts. Tracking the molecular evolution of the virus has been greatly hampered by gap associated problems predominantly embedded within the envelope gene of the virus. Gaps and hypermutation of the virus slow conventional string based alignments of the whole genome. This paper describes the W-curve algorithm itself, and how we have adapted it for comparison of similar HIV-1 genomes. A treebuilding method is developed with the W-curve that utilizes a novel Cylindrical Coordinate distance method and gap analysis method. HIV-1 C2-V5 env sequence regions from a Mother/Infant cohort study are used in the comparison. The output distance matrix and neighbor results produced by the W-curve are functionally equivalent to those from Clustal for C2-V5 sequences in the mother/infant pairs infected with CRF01_AE. Significant potential exists for utilizing this method in place of conventional string based alignment of HIV-1 genomes, such as Clustal X. With W-curve heuristic alignment, it may be possible to obtain clinically useful results in a short time, short enough to affect clinical choices for acute treatment. A description of the W-curve generation process, including a comparison technique of

  7. Design and Implementation of Web Based Supply Centers Material Request and Tracking (SMART) System Using With JAVA and JAVA Servlets

    National Research Council Canada - National Science Library

    Ciftci, Cemalettin

    2001-01-01

    .... The third tier maintains the database management systems. Java servlets and Java provide programmers platform and operating system independent, multi-threaded, object oriented, secure and mobile means to create dynamic content on the web...

  8. Programación Java

    OpenAIRE

    Martínez de Morentin Iribarren, Xabier

    2014-01-01

    This project aims to inform and provide a grounding in Java, as well as in the programs used alongside it, in order to ease programming in the workplace, whether they are data management programs such as Assembla and Tortoise or application development programs such as Eclipse or NetBeans. It seeks to bring the development of a Java application to a more professional level. To that end, the conventions for naming classes, as well as methods, etc., are respected, and the carrying out of...

  9. Learning PHP, MySQL, and JavaScript

    CERN Document Server

    Nixon, Robin

    2009-01-01

    If you know HTML, this guide will have you building interactive websites quickly. You'll learn how to create responsive, data-driven websites with PHP, MySQL, and JavaScript, regardless of whether you already know how to program. Discover how the powerful combination of PHP and MySQL provides an easy way to build modern websites complete with dynamic data and user interaction. You'll also learn how to add JavaScript to create rich Internet applications and websites. Learning PHP, MySQL, and JavaScript explains each technology separately, shows you how to combine them, and introduces valuable

  10. Runtime Support for Type-Safe Dynamic Java Classes

    National Research Council Canada - National Science Library

    Malabarba, Scott; Pandey, Raju; Gragg, Jeff; Barr, Earl; Barnes, J. F

    2000-01-01

    .... In this paper we present an approach for supporting dynamic evolution of Java programs. In this approach, Java programs can evolve by changing their components, namely classes, during their execution...

  11. Learning JavaScript data structures and algorithms

    CERN Document Server

    Groner, Loiane

    2014-01-01

    If you are a JavaScript developer or someone who has basic knowledge of JavaScript, and want to explore its optimum ability, this fast-paced book is definitely for you. Programming logic is the only thing you need to know to start having fun with algorithms.

  12. Java parallel secure stream for grid computing

    International Nuclear Information System (INIS)

    Chen, J.; Akers, W.; Chen, Y.; Watson, W.

    2001-01-01

    The emergence of high speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, because the TCP window size must be tuned to improve bandwidth and reduce latency on a high speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel streams is more effective than tuning the TCP window size. In addition, an X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally, a few applications using this package will be discussed.
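    The partition-and-parallel-stream idea described in this abstract can be sketched as follows. This is a minimal illustration in Python, not the JPARSS implementation or its API: the function names `split` and `send_parallel` are hypothetical, local socket pairs stand in for wide area TCP connections, and the X.509 and SSL parts are left out entirely.

```python
import socket
import threading


def split(data: bytes, n: int) -> list[bytes]:
    """Split data into n contiguous partitions of near-equal size."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]


def send_parallel(data: bytes, n_streams: int = 4) -> bytes:
    """Send each partition over its own stream and reassemble in order.

    Assumption: socket.socketpair() stands in for one TCP connection
    per stream; a real transfer would open n_streams network sockets.
    """
    parts = split(data, n_streams)
    received = [b""] * n_streams
    pairs = [socket.socketpair() for _ in range(n_streams)]

    def sender(i: int) -> None:
        tx = pairs[i][0]
        tx.sendall(parts[i])
        tx.close()  # closing the sending end signals end-of-stream

    def receiver(i: int) -> None:
        rx = pairs[i][1]
        chunks = []
        while True:
            buf = rx.recv(65536)
            if not buf:  # empty read means the sender closed its end
                break
            chunks.append(buf)
        rx.close()
        received[i] = b"".join(chunks)

    threads = [
        threading.Thread(target=fn, args=(i,))
        for i in range(n_streams)
        for fn in (sender, receiver)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Partitions are concatenated by index, so arrival order does not matter.
    return b"".join(received)
```

    Each partition keeps its index, so the receiver can rebuild the payload regardless of which stream finishes first; that ordering step is what any such scheme needs, whatever the transport.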

  13. Java 7 New Features Cookbook

    CERN Document Server

    Reese, Richard M

    2012-01-01

    Each recipe comprises step-by-step instructions followed by an analysis of what was done in each task and other useful information. The book is designed so that you can read it chapter by chapter, or look at the list of recipes and refer to them in no particular order. Each example comes with its expected output to make your learning even easier. This book is designed to bring those who are familiar with Java up-to-speed on the new features found in Java 7.

  14. Potential of Fishing Port Development in the East Java

    Science.gov (United States)

    Rosana, N.; Prasita, V. D.

    2018-03-01

    The development of fishing ports in East Java is one of the efforts to increase provincial revenue, develop the fishing industry and improve the welfare of fishermen. A profile of capture fisheries in East Java should provide information that can be developed based on priorities and types of resources. The purpose of this study was to determine which districts in East Java are priorities for the development of fishing ports based on their existing potential. The method used is descriptive and spatial analysis in order to obtain an overview of the districts in East Java that have the potential to be developed. The study found that the priority districts in East Java for the development of fishing ports, based on fishery-related aspects, are Banyuwangi, Trenggalek and Jember.

  15. Semiautomated improvement of RNA alignments

    DEFF Research Database (Denmark)

    Andersen, Ebbe Sloth; Lind-Thomsen, Allan; Knudsen, Bjarne

    2007-01-01

    connects to external tools to provide a flexible semiautomatic editing environment. A new method, Pcluster, is introduced for dividing the sequences of an RNA alignment into subgroups with secondary structure differences. Pcluster was used to evaluate 574 seed alignments obtained from the Rfam database...... and we identified 71 alignments with significant prediction of inconsistent base pairs and 102 alignments with significant prediction of novel base pairs. Four RNA families were used to illustrate how SARSE can be used to manually or automatically correct the inconsistent base pairs detected by Pcluster......: the mir-399 RNA, vertebrate telomerase RNA (vert-TR), bacterial transfer-messenger RNA (tmRNA), and the signal recognition particle (SRP) RNA. The general use of the method is illustrated by the ability to accommodate pseudoknots and handle even large and divergent RNA families. The open architecture...

  16. BioJava-ModFinder: identification of protein modifications in 3D structures from the Protein Data Bank.

    Science.gov (United States)

    Gao, Jianjiong; Prlic, Andreas; Bi, Chunxiao; Bluhm, Wolfgang F; Dimitropoulos, Dimitris; Xu, Dong; Bourne, Philip E; Rose, Peter W

    2017-07-01

    We developed a new software tool, BioJava-ModFinder, for identifying protein modifications observed in 3D structures archived in the Protein Data Bank (PDB). Information on more than 400 types of protein modifications was collected and curated from annotations in PDB, RESID, and PSI-MOD. We divided these modifications into three categories: modified residues, attachment modifications, and cross-links. We have developed a systematic method to identify these modifications in 3D protein structures. We have integrated this package with the RCSB PDB web application and added protein modification annotations to the sequence diagram and structure display. By scanning all 3D structures in the PDB using BioJava-ModFinder, we identified more than 30 000 structures with protein modifications, which can be searched, browsed, and visualized on the RCSB PDB website. BioJava-ModFinder is available as open source (LGPL license) at https://github.com/biojava/biojava/tree/master/biojava-modfinder. The RCSB PDB can be accessed at http://www.rcsb.org. pwrose@ucsd.edu. © The Author 2017. Published by Oxford University Press.

  17. Pairwise structure alignment specifically tuned for surface pockets and interaction interfaces

    KAUST Repository

    Cui, Xuefeng

    2015-09-09

    To detect and evaluate the similarities between the three-dimensional (3D) structures of two molecules, various kinds of methods have been proposed for the pairwise structure alignment problem [6, 9, 7, 11]. The problem plays important roles when studying the function and the evolution of biological molecules. Recently, pairwise structure alignment methods have been extended and applied on surface pocket structures [10, 3, 5] and interaction interface structures [8, 4]. The results show that, even when there are no global similarities discovered between the global sequences and the global structures, biological molecules or complexes could share similar functions because of well conserved pockets and interfaces. Thus, pairwise pocket and interface structure alignments are promising to unveil such shared functions that cannot be discovered by the well-studied global sequence and global structure alignments. State-of-the-art methods for pairwise pocket and interface structure alignments [4, 5] are direct extensions of the classic pairwise protein structure alignment methods, and thus such methods share a few limitations. First, the goal of the classic protein structure alignment methods is to align single-chain protein structures (i.e., a single fragment of residues connected by peptide bonds). However, we observed that pockets and interfaces tend to consist of tens of extremely short backbone fragments (i.e., three or fewer residues connected by peptide bonds). Thus, existing pocket and interface alignment methods based on the protein structure alignment methods still rely on the existence of long-enough backbone fragments, and the fragmentation issue of pockets and interfaces raises the risk of missing the optimal alignments. Moreover, existing interface structure alignment methods focus on protein-protein interfaces, and require a "blackbox preprocessing" before aligning protein-DNA and protein-RNA interfaces.
Therefore, we introduce the PROtein STructure Alignment

  18. SPA: a probabilistic algorithm for spliced alignment.

    Directory of Open Access Journals (Sweden)

    2006-04-01

    Full Text Available Recent large-scale cDNA sequencing efforts show that elaborate patterns of splice variation are responsible for much of the proteome diversity in higher eukaryotes. To obtain an accurate account of the repertoire of splice variants, and to gain insight into the mechanisms of alternative splicing, it is essential that cDNAs are very accurately mapped to their respective genomes. Currently available algorithms for cDNA-to-genome alignment do not reach the necessary level of accuracy because they use ad hoc scoring models that cannot correctly trade off the likelihoods of various sequencing errors against the probabilities of different gene structures. Here we develop a Bayesian probabilistic approach to cDNA-to-genome alignment. Gene structures are assigned prior probabilities based on the lengths of their introns and exons, and based on the sequences at their splice boundaries. A likelihood model for sequencing errors takes into account the rates at which misincorporation, as well as insertions and deletions of different lengths, occurs during sequencing. The parameters of both the prior and likelihood model can be automatically estimated from a set of cDNAs, thus enabling our method to adapt itself to different organisms and experimental procedures. We implemented our method in a fast cDNA-to-genome alignment program, SPA, and applied it to the FANTOM3 dataset of over 100,000 full-length mouse cDNAs and a dataset of over 20,000 full-length human cDNAs. Comparison with the results of four other mapping programs shows that SPA produces alignments of significantly higher quality. In particular, the quality of the SPA alignments near splice boundaries and SPA's mapping of the 5' and 3' ends of the cDNAs are highly improved, allowing for more accurate identification of transcript starts and ends, and accurate identification of subtle splice variations. Finally, our splice boundary analysis on the human dataset suggests the existence of a novel non

  19. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    DEFF Research Database (Denmark)

    de Souza, S J; Camargo, A A; Briones, M R

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central ...

  20. JavaScript and interactive web pages in radiology.

    Science.gov (United States)

    Gurney, J W

    2001-10-01

    Web publishing is becoming a more common method of disseminating information. JavaScript is an object-orientated language embedded into modern browsers and has a wide variety of uses. The use of JavaScript in radiology is illustrated by calculating the indices of sensitivity, specificity, and predictive values from a table of true positives, true negatives, false positives, and false negatives. In addition, a single line of JavaScript code can be used to annotate images, which has a wide variety of uses.
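    The four indices named in this abstract follow directly from the 2x2 table of counts. The sketch below uses Python rather than the article's JavaScript, and the function name `diagnostic_indices` is a hypothetical choice for illustration; only the standard definitions of the indices are assumed.

```python
def diagnostic_indices(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute standard diagnostic test indices from a 2x2 table.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives. A zero denominator (for
    example, no diseased cases at all) raises ZeroDivisionError.
    """
    return {
        # Fraction of diseased cases the test detects.
        "sensitivity": tp / (tp + fn),
        # Fraction of healthy cases the test correctly clears.
        "specificity": tn / (tn + fp),
        # Probability of disease given a positive test.
        "positive_predictive_value": tp / (tp + fp),
        # Probability of no disease given a negative test.
        "negative_predictive_value": tn / (tn + fn),
    }


if __name__ == "__main__":
    for name, value in diagnostic_indices(tp=90, tn=80, fp=20, fn=10).items():
        print(f"{name}: {value:.3f}")
```

    With 90 true positives, 80 true negatives, 20 false positives and 10 false negatives, sensitivity is 90/100 and specificity is 80/100, while the predictive values are 90/110 and 80/90.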

  1. Assessing genetic diversity in Java fine-flavor cocoa (Theobroma cacao L.) germplasm by simple sequence repeat (SSR) markers

    Science.gov (United States)

    Indonesia is the third-largest cocoa-producing country in the world, with an annual cacao bean production of 572,000 tons. The currently cultivated cacao varieties in Indonesia are inter-hybrids of various clones introduced from the Americas since the 16th century. Among them, “Java cocoa” is a wel...

  2. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Full Text Available Background: Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results: The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion: The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines.
UCSF Chimera is free for non-commercial use and is

  3. Java and its future in biomedical computing.

    Science.gov (United States)

    Rodgers, R P

    1996-01-01

    Java, a new object-oriented computing language related to C++, is receiving considerable attention due to its use in creating network-sharable, platform-independent software modules (known as "applets") that can be used with the World Wide Web. The Web has rapidly become the most commonly used information-retrieval tool associated with the global computer network known as the Internet, and Java has the potential to further accelerate the Web's application to medical problems. Java's potentially wide acceptance due to its Web association and its own technical merits also suggests that it may become a popular language for non-Web-based, object-oriented computing. PMID:8880677

  4. ANALYSIS OF HIV SUBTYPES AND CLINICAL STAGING OF HIV DISEASE/AIDS IN EAST JAVA

    Directory of Open Access Journals (Sweden)

    Yulia Ismail

    2012-04-01

    Full Text Available Human Immunodeficiency Virus type 1 (HIV-1), known to cause Acquired Immune Deficiency Syndrome (AIDS), is divided into several subtypes (A, B, C, D, F, G, H, J, K) and Circulating Recombinant Forms (CRF). Different characteristics of each subtype of the virus and its interaction with the host can affect the severity of the disease. This study aimed to describe the HIV-1 subtypes circulating in HIV/AIDS patients from the East Java region and to analyze their relationship with the clinical stages of HIV/AIDS. Information from this research was expected to complement the data on the molecular epidemiology of HIV in Indonesia. This study used blood plasma from patients who had tested HIV positive and who sought treatment at or were referred to the Intermediate Care Unit of Infectious Disease (UPIPI), Dr. Soetomo Hospital, Surabaya, from various areas representing the East Java region. Plasma was separated from blood samples by centrifugation for use in the molecular biology examination, including RNA extraction, nested PCR using specific primers for the HIV gp120 env gene region, DNA purification, DNA sequencing, and homology and phylogenetic analysis. Based on the nucleotide sequence of the HIV gp120 env gene, the most dominant subtypes in East Java belonged to one group of Circulating Recombinant Forms (CRF), namely CRF01_AE, CRF33_01B and CRF34_01B, which are also found in Southeast Asia. In the phylogenetic tree, most of the HIV samples (30 samples) are in the same branch as CRF01_AE, CRF33_01B and CRF34_01B, except for one sample (HIV40), which is in the same branch as subtype B. HIV subtypes are associated with clinical stages (disease severity) since samples from different stages of HIV disease have the same subtype.

  5. Using Java for distributed computing in the Gaia satellite data processing

    Science.gov (United States)

    O'Mullane, William; Luri, Xavier; Parsons, Paul; Lammers, Uwe; Hoar, John; Hernandez, Jose

    2011-10-01

    In recent years Java has matured to a stable easy-to-use language with the flexibility of an interpreter (for reflection etc.) but the performance and type checking of a compiled language. When we started using Java for astronomical applications around 1999 they were the first of their kind in astronomy. Now a great deal of astronomy software is written in Java as are many business applications. We discuss the current environment and trends concerning the language and present an actual example of scientific use of Java for high-performance distributed computing: ESA's mission Gaia. The Gaia scanning satellite will perform a galactic census of about 1,000 million objects in our galaxy. The Gaia community has chosen to write its processing software in Java. We explore the manifold reasons for choosing Java for this large science collaboration. Gaia processing is numerically complex but highly distributable, some parts being embarrassingly parallel. We describe the Gaia processing architecture and its realisation in Java. We delve into the astrometric solution which is the most advanced and most complex part of the processing. The Gaia simulator is also written in Java and is the most mature code in the system. This has been successfully running since about 2005 on the supercomputer "Marenostrum" in Barcelona. We relate experiences of using Java on a large shared machine. Finally we discuss Java, including some of its problems, for scientific computing.

  6. HeurAA: accurate and fast detection of genetic variations with a novel heuristic amplicon aligner program for next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Lőrinc S Pongor

    Next generation sequencing (NGS) of PCR amplicons is a standard approach to detect genetic variations in personalized medicine such as cancer diagnostics. Computer programs used in the NGS community often miss insertions and deletions (indels) that constitute a large part of known human mutations. We have developed HeurAA, an open source, heuristic amplicon aligner program. We tested the program on simulated datasets as well as experimental data from multiplex sequencing of 40 amplicons in 12 oncogenes collected on a 454 Genome Sequencer from lung cancer cell lines. We found that HeurAA can accurately detect all indels, and is more than an order of magnitude faster than previous programs. HeurAA can compare reads and reference sequences up to several thousand base pairs in length, and it can evaluate data from complex mixtures containing reads of different gene-segments from different samples. HeurAA is written in C and Perl for Linux operating systems; the code and the documentation are available for research applications at http://sourceforge.net/projects/heuraa/

  7. Sequence Similarity Presenter: a tool for the graphic display of similarities of long sequences for use in presentations.

    Science.gov (United States)

    Fröhlich, K U

    1994-04-01

    A new method for the presentation of alignments of long sequences is described. The degree of identity for the aligned sequences is averaged for sections of a fixed number of residues. The resulting values are converted to shades of gray, with white corresponding to lack of identity and black corresponding to perfect identity. A sequence alignment is represented as a bar filled with varying shades of gray. The display is compact and allows for a fast and intuitive recognition of the distribution of regions with a high similarity. It is well suited for the presentation of alignments of long sequences, e.g. of protein superfamilies, in plenary lectures. The method is implemented as a HyperCard stack for Apple Macintosh computers. Several options for the modification of the output are available (e.g. background reduction, size of the summation window, consideration of amino acid similarity, inclusion of graphic markers to indicate specific domains). The output is a PostScript file which can be printed, imported as EPS or processed further with Adobe Illustrator.
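
    The windowed-identity idea described above can be sketched in a few lines of Java. This is only an illustration, not the original HyperCard implementation: the class and method names (WindowIdentityShades, windowIdentity, toGray), the restriction to two aligned sequences, the treatment of gap columns as mismatches, and the 0-255 gray scale are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch: per-window identity of two aligned sequences mapped to gray levels.
final class WindowIdentityShades {

    /** Fraction of identical, non-gap columns in each fixed-size window (last window may be shorter). */
    static double[] windowIdentity(String a, String b, int window) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        int windows = (a.length() + window - 1) / window; // ceiling division
        double[] identity = new double[windows];
        for (int w = 0; w < windows; w++) {
            int start = w * window;
            int end = Math.min(start + window, a.length());
            int same = 0;
            for (int i = start; i < end; i++) {
                char x = a.charAt(i);
                // Assumption: a gap ('-') never counts as an identical column.
                if (x != '-' && x == b.charAt(i)) {
                    same++;
                }
            }
            identity[w] = (double) same / (end - start);
        }
        return identity;
    }

    /** Maps identity in [0, 1] to a gray level: 255 = white (no identity), 0 = black (perfect identity). */
    static int toGray(double identity) {
        return (int) Math.round(255.0 * (1.0 - identity));
    }

    public static void main(String[] args) {
        double[] identity = windowIdentity("ACGTACGT", "ACGTTTTT", 4);
        int[] gray = Arrays.stream(identity).mapToInt(WindowIdentityShades::toGray).toArray();
        System.out.println(Arrays.toString(gray));
    }
}
```

    Each gray value would then fill one segment of the bar; a larger window gives a more compact but coarser picture, which matches the "size of the summation window" option mentioned in the abstract.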

  8. A Global Network Alignment Method Using Discrete Particle Swarm Optimization.

    Science.gov (United States)

    Huang, Jiaxiang; Gong, Maoguo; Ma, Lijia

    2016-10-19

    Molecular interaction data increase exponentially with the advance of biotechnology. This makes it possible and necessary to comparatively analyse the different data at a network level. Global network alignment is an important network comparison approach to identify conserved subnetworks and gain insight into evolutionary relationships across species. Network alignment, which is analogous to subgraph isomorphism, is known to be an NP-hard problem. In this paper, we introduce a novel heuristic Particle-Swarm-Optimization based Network Aligner (PSONA), which optimizes a weighted global alignment model considering both protein sequence similarity and interaction conservations. The particle statuses and status updating rules are redefined in a discrete form by using permutation. A seed-and-extend strategy is employed to guide the search for the superior alignment. The proposed initialization method "seeds" matches with high sequence similarity into the alignment, which guarantees the functional coherence of the mapping nodes. A greedy local search method is designed as the "extension" procedure to iteratively optimize the edge conservations. PSONA is compared with several state-of-the-art methods on ten network pairs formed from five species. The experimental results demonstrate that the proposed aligner can map the proteins with high functional coherence and can be used as a booster to effectively refine the well-studied aligners.

  9. The Strategy of KPID West Java in Socializing Media Literacy

    Directory of Open Access Journals (Sweden)

    Lucy Pujasari Supratman

    2017-07-01

    The media literacy socialization strategy carried out by KPID West Java (Regional Indonesian Broadcasting Commission) aims to strengthen national integration and national identity, and to educate the nation. This study focused on the strategies of KPID West Java, an independent state institution, in disseminating the various media literacy programs run by KPID West Java in the 2015-2018 period. The author used a descriptive case study method. The case analyzed is how KPID West Java disseminated media literacy. The essence of the media literacy work conducted by KPID West Java is to foster a society that is wise in its media consumption and to encourage broadcasting institutions to produce quality broadcast content. KPID West Java, as a representative of the public, continues to call for media literacy socialization, because broadcast content is harder to monitor if there is no public participation in reporting content violations. Socialization through a new media approach encourages KPID West Java (2015-2018 period) to develop innovative media literacy steps in line with technological development.

  10. Database Access through Java Technologies

    Directory of Open Access Journals (Sweden)

    Nicolae MERCIOIU

    2010-09-01

    As a high level development environment, the Java technologies offer support for the development of distributed, platform-independent applications, providing a robust set of methods to access databases, used to create software components on the server side as well as on the client side. Analyzing the evolution of Java tools to access data, we notice that these tools evolved from simple methods that permitted queries, insertion, update and deletion of data to advanced implementations such as distributed transactions, cursors and batch files. The client-server architecture allows, through JDBC (Java Database Connectivity), the execution of SQL (Structured Query Language) instructions and the manipulation of the results in an independent and consistent manner. The JDBC API (Application Programming Interface) creates the level of abstraction needed to allow the call of SQL queries to any DBMS (Database Management System). The JDBC native driver, the ODBC (Open Database Connectivity)-JDBC bridge, and the classes and interfaces of the JDBC API will be described. The four steps needed to build a JDBC driven application are presented briefly, emphasizing the way each step has to be accomplished and the expected results. In each step there are evaluations of the characteristics of the database systems and the way the JDBC programming interface adapts to each one. The data types provided by the SQL2 and SQL3 standards are analyzed by comparison with the Java data types, emphasizing the discrepancies between those and the SQL types, but also the methods that allow the conversion between different types of data through the methods of the ResultSet object. Next, starting from the metadata role and studying the Java programming interfaces that allow the query of result sets, we will describe the advanced features of the data mining with JDBC. As alternative to result sets, the Rowsets add new functionalities that

  11. Anatomy of the western Java plate interface from depth-migrated seismic images

    Science.gov (United States)

    Kopp, H.; Hindle, D.; Klaeschen, D.; Oncken, O.; Reichert, C.; Scholl, D.

    2009-01-01

    Newly pre-stack depth-migrated seismic images resolve the structural details of the western Java forearc and plate interface. The structural segmentation of the forearc into discrete mechanical domains correlates with distinct deformation styles. Approximately 2/3 of the trench sediment fill is detached and incorporated into frontal prism imbricates, while the floor sequence is underthrust beneath the décollement. Western Java, however, differs markedly from margins such as Nankai or Barbados, where a uniform, continuous décollement reflector has been imaged. In our study area, the plate interface reveals a spatially irregular, nonlinear pattern characterized by the morphological relief of subducted seamounts and thicker than average patches of underthrust sediment. The underthrust sediment is associated with a low velocity zone as determined from wide-angle data. Active underplating is not resolved, but likely contributes to the uplift of the large bivergent wedge that constitutes the forearc high. Our profile is located 100 km west of the 2006 Java tsunami earthquake. The heterogeneous décollement zone regulates the friction behavior of the shallow subduction environment where the earthquake occurred. The alternating pattern of enhanced frictional contact zones associated with oceanic basement relief and weak material patches of underthrust sediment influences seismic coupling and possibly contributed to the heterogeneous slip distribution. Our seismic images resolve a steeply dipping splay fault, which originates at the décollement and terminates at the sea floor and which potentially contributes to tsunami generation during co-seismic activity. © 2009 Elsevier B.V.

  12. Anatomy of the western Java plate interface from depth-migrated seismic images

    Science.gov (United States)

    Kopp, H.; Hindle, D.; Klaeschen, D.; Oncken, O.; Reichert, C.; Scholl, D.

    2009-11-01

    Newly pre-stack depth-migrated seismic images resolve the structural details of the western Java forearc and plate interface. The structural segmentation of the forearc into discrete mechanical domains correlates with distinct deformation styles. Approximately 2/3 of the trench sediment fill is detached and incorporated into frontal prism imbricates, while the floor sequence is underthrust beneath the décollement. Western Java, however, differs markedly from margins such as Nankai or Barbados, where a uniform, continuous décollement reflector has been imaged. In our study area, the plate interface reveals a spatially irregular, nonlinear pattern characterized by the morphological relief of subducted seamounts and thicker than average patches of underthrust sediment. The underthrust sediment is associated with a low velocity zone as determined from wide-angle data. Active underplating is not resolved, but likely contributes to the uplift of the large bivergent wedge that constitutes the forearc high. Our profile is located 100 km west of the 2006 Java tsunami earthquake. The heterogeneous décollement zone regulates the friction behavior of the shallow subduction environment where the earthquake occurred. The alternating pattern of enhanced frictional contact zones associated with oceanic basement relief and weak material patches of underthrust sediment influences seismic coupling and possibly contributed to the heterogeneous slip distribution. Our seismic images resolve a steeply dipping splay fault, which originates at the décollement and terminates at the sea floor and which potentially contributes to tsunami generation during co-seismic activity.

  13. JAVA CONCURENT PROGRAM FOR THE SMARANDACHE FUNCTION

    OpenAIRE

    Power, David; Tabirca, S.; Tabirca, T.

    2004-01-01

    The aim of this article is to propose a Java concurrent program for the Smarandache function based on an equation. Some results concerning the theoretical complexity of this program are proposed. Finally, the experimental results of the sequential and Java programs are given in order to demonstrate the efficiency of the concurrent implementation.
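
    For orientation, the Smarandache function S(n) is the smallest positive integer m such that n divides m!. The abstract does not state the equation the authors rely on, so the sketch below falls back on this definition and uses a Java parallel stream as the concurrency mechanism. Both choices, and the names SmarandacheSketch, smarandache and smarandacheUpTo, are assumptions for illustration and should not be read as the article's algorithm.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch: S(n) from its definition, evaluated concurrently for n = 1..limit.
final class SmarandacheSketch {

    /** Smallest m >= 1 such that n divides m!, found by tracking m! modulo n. */
    static int smarandache(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        long factorialModN = 1 % n; // 0 when n == 1, so the loop returns 1 immediately
        for (int m = 1; ; m++) {
            // Both factors are below 2^31, so the product fits in a long.
            factorialModN = (factorialModN * m) % n;
            if (factorialModN == 0) {
                return m; // always reached by m == n at the latest, since n divides n!
            }
        }
    }

    /** Computes S(1..limit); each value is independent, so the range can be split across threads. */
    static int[] smarandacheUpTo(int limit) {
        return IntStream.rangeClosed(1, limit)
                .parallel()
                .map(SmarandacheSketch::smarandache)
                .toArray(); // toArray keeps the encounter order of the range
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(smarandacheUpTo(10)));
    }
}
```

    Because every S(n) is computed independently, the work divides cleanly across threads; the cost per value is at most n multiplications, so large inputs dominate the running time and uneven splitting can limit the speed-up.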

  14. Formal specification with the Java modeling language

    NARCIS (Netherlands)

    Huisman, Marieke; Ahrendt, Wolfgang; Grahl, Daniel; Hentschel, Martin; Ahrendt, Wolfgang; Beckert, Bernhard; Bubel, Richard; Hähnle, Reiner; Schmitt, Peter H.; Ulbrich, Mattias

    2016-01-01

    This text is a general, self-contained, and tool-independent introduction to the Java Modeling Language, JML. It appears in a book about the KeY approach and tool, because JML is the dominating starting point of KeY-style Java verification. However, this chapter does not depend on KeY, nor any

  15. Distributed nuclear medicine applications using World Wide Web and Java technology

    International Nuclear Information System (INIS)

    Knoll, P.; Hoell, K.; Koriska, K.; Mirzaei, S.; Koehn, H.

    2000-01-01

    At present, medical applications applying World Wide Web (WWW) technology are mainly used to view static images and to retrieve some information. The Java platform is a relatively new way of computing, especially designed for network computing and distributed applications, which enables interactive connection between user and information via the WWW. The Java 2 Software Development Kit (SDK) including Java2D API, Java Remote Method Invocation (RMI) technology, Object Serialization and the Java Advanced Imaging (JAI) extension was used to achieve a robust, platform independent and network centric solution. Medical image processing software based on this technology is presented and adequate performance capability of Java is demonstrated by an iterative reconstruction algorithm for single photon emission computerized tomography (SPECT). (orig.)

  16. Predicting RNA hyper-editing with a novel tool when unambiguous alignment is impossible.

    Science.gov (United States)

    McKerrow, Wilson H; Savva, Yiannis A; Rezaei, Ali; Reenan, Robert A; Lawrence, Charles E

    2017-07-10

    Repetitive elements are now known to have relevant cellular functions, including self-complementary sequences that form double stranded (ds) RNA. There are numerous pathways that determine the fate of endogenous dsRNA, and misregulation of endogenous dsRNA is a driver of autoimmune disease, particularly in the brain. Unfortunately, the alignment of high-throughput, short-read sequences to repeat elements poses a dilemma: Such sequences may align equally well to multiple genomic locations. In order to differentiate repeat elements, current alignment methods depend on sequence variation in the reference genome. Reads are discarded when no such variations are present. However, RNA hyper-editing, a possible fate for dsRNA, introduces enough variation to distinguish between repeats that are otherwise identical. To take advantage of this variation, we developed a new algorithm, RepProfile, that simultaneously aligns reads and predicts novel variations. RepProfile accurately aligns hyper-edited reads that other methods discard. In particular we predict hyper-editing of Drosophila melanogaster repeat elements in vivo at levels previously described only in vitro, and provide validation by Sanger sequencing sixty-two individual cloned sequences. We find that hyper-editing is concentrated in genes involved in cell-cell communication at the synapse, including some that are associated with neurodegeneration. We also find that hyper-editing tends to occur in short runs. Previous studies of RNA hyper-editing discarded ambiguously aligned reads, ignoring hyper-editing in long, perfect dsRNA - the perfect substrate for hyper-editing. We provide a method that simulation and Sanger validation show accurately predicts such RNA editing, yielding a superior picture of hyper-editing.

  17. A Simple Application Program Interface for Saving Java Program Data on a Wiki

    OpenAIRE

    Yamanoue, Takashi; Oda, Kentaro; Shimozono, Koichi

    2012-01-01

    A simple application program interface (API) for Java programs running on a wiki is implemented experimentally. A Java program with the API can be running on a wiki, and the Java program can save its data on the wiki. The Java program consists of PukiWiki, which is a popular wiki in Japan, and a plug-in, which starts up Java programs and classes of Java. A Java applet with default access privilege cannot save its data at a local host. We have constructed an API of applets for easy and unified...

  18. Learning Online Alignments with Continuous Rewards Policy Gradient

    OpenAIRE

    Luo, Yuping; Chiu, Chung-Cheng; Jaitly, Navdeep; Sutskever, Ilya

    2016-01-01

    Sequence-to-sequence models with soft attention had significant success in machine translation, speech recognition, and question answering. Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition. To address this problem, we present a new method for solving sequence-to-sequence problems using hard online alignments instead of soft offlin...

  19. Molecular characterization and phylogenetic analysis of Fasciola gigantica from western Java, Indonesia.

    Science.gov (United States)

    Hayashi, Kei; Ichikawa-Seki, Madoka; Allamanda, Puttik; Wibowo, Putut Eko; Mohanta, Uday Kumar; Sodirun; Guswanto, Azirwan; Nishikawa, Yoshifumi

    2016-10-01

    Fasciola gigantica and aspermic (hybrid) Fasciola flukes are thought to be distributed in Southeast Asian countries. The objectives of this study were to investigate the distribution of these flukes from unidentified ruminants in western Java, Indonesia, and to determine their distribution history into the area. Sixty Fasciola flukes from western Java were identified as F. gigantica based on the nucleotide sequences of the nuclear phosphoenolpyruvate carboxykinase (pepck) and DNA polymerase delta (pold) genes. The flukes were then analyzed phylogenetically based on the nucleotide sequence of the mitochondrial NADH dehydrogenase subunit 1 (nad1) gene, together with Fasciola flukes from other Asian countries. All but one F. gigantica fluke were classified in F. gigantica haplogroup C, which mainly contains nad1 haplotypes detected in flukes from Thailand, Vietnam, and China. A population genetic analysis suggested that haplogroup C spread from Thailand to the neighboring countries including Indonesia together with domestic ruminants, such as the swamp buffalo, Bubalus bubalis. The swamp buffalo is one of the important definitive hosts of Fasciola flukes in Indonesia, and is considered to have been domesticated in the north of Thailand. The remaining one fluke displayed a novel nad1 haplotype that has never been detected in the reference countries. Therefore, the origin of the fluke could not be established. No hybrid Fasciola flukes were detected in this study, in contrast to neighboring Asian countries. Copyright © 2016. Published by Elsevier Ireland Ltd.

  20. Considerations in the identification of functional RNA structural elements in genomic alignments

    Directory of Open Access Journals (Sweden)

    Blencowe Benjamin J

    2007-01-01

    Background: Accurate identification of novel, functional noncoding (nc) RNA features in genome sequence has proven more difficult than for exons. Current algorithms identify and score potential RNA secondary structures on the basis of thermodynamic stability, conservation, and/or covariance in sequence alignments. Neither the algorithms nor the information gained from the individual inputs have been independently assessed. Furthermore, due to issues in modelling background signal, it has been difficult to gauge the precision of these algorithms on a genomic scale, in which even a seemingly small false-positive rate can result in a vast excess of false discoveries. Results: We developed a shuffling algorithm, shuffle-pair.pl, that simultaneously preserves dinucleotide frequency, gaps, and local conservation in pairwise sequence alignments. We used shuffle-pair.pl to assess precision and recall of six ncRNA search tools (MSARI, QRNA, ddbRNA, RNAz, EvoFold, and several variants of simple thermodynamic stability) on a test set of 3046 alignments of known ncRNAs. Relative to mononucleotide shuffling, preservation of dinucleotide content in shuffling the alignments resulted in a drastic increase in estimated false-positive detection rates for ncRNA elements, precluding evaluation of higher order alignments, which cannot be adequately shuffled while maintaining both dinucleotides and alignment structure. On pairwise alignments, none of the covariance-based tools performed markedly better than thermodynamic scoring alone. Although the high false-positive rates call into question the veracity of any individual predicted secondary structural element in our analysis, we nevertheless identified intriguing global trends in human genome alignments. The distribution of ncRNA prediction scores in 75-base windows overlapping UTRs, introns, and intergenic regions analyzed using both thermodynamic stability and EvoFold (which has no thermodynamic component) was

  1. A numerical library in Java for scientists and engineers

    CERN Document Server

    Lau, Hang T

    2003-01-01

    At last researchers have an inexpensive library of Java-based numeric procedures for use in scientific computation. The first and only book of its kind, A Numeric Library in Java for Scientists and Engineers is a translation into Java of the library NUMAL (NUMerical procedures in ALgol 60). This groundbreaking text presents procedural descriptions for linear algebra, ordinary and partial differential equations, optimization, parameter estimation, mathematical physics, and other tools that are indispensable to any dynamic research group. The book offers test programs that allow researchers to execute the examples provided; users are free to construct their own tests and apply the numeric procedures to them in order to observe a successful computation or simulate failure. The entry for each procedure is logically presented, with name, usage parameters, and Java code included. This handbook serves as a powerful research tool, enabling the performance of critical computations in Java. It stands as a cost-effi...

  2. Centroid based clustering of high throughput sequencing reads based on n-mer counts.

    Science.gov (United States)

    Solovyov, Alexander; Lipkin, W Ian

    2013-09-08

    Many problems in computational biology require alignment-free sequence comparisons. One of the common tasks involving sequence comparison is sequence clustering. Here we apply methods of alignment-free comparison (in particular, comparison using sequence composition) to the challenge of sequence clustering. We study several centroid based algorithms for clustering sequences based on word counts. Study of their performance shows that using k-means algorithm with or without the data whitening is efficient from the computational point of view. A higher clustering accuracy can be achieved using the soft expectation maximization method, whereby each sequence is attributed to each cluster with a specific probability. We implement an open source tool for alignment-free clustering. It is publicly available from github: https://github.com/luscinius/afcluster. We show the utility of alignment-free sequence clustering for high throughput sequencing analysis despite its limitations. In particular, it allows one to perform assembly with reduced resources and a minimal loss of quality. The major factor affecting performance of alignment-free read clustering is the length of the read.
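
    The core of such alignment-free clustering can be sketched as two steps: turn each read into a vector of n-mer frequencies, then assign it to the closest centroid, as in the assignment step of k-means. The Java below is a minimal sketch under stated assumptions: the DNA alphabet ACGT, Euclidean distance, skipping words that contain other symbols, and the names NmerClusteringSketch, nmerFrequencies and nearestCentroid are chosen for illustration and are not taken from the afcluster tool.

```java
// Hypothetical sketch: n-mer frequency vectors and nearest-centroid assignment.
final class NmerClusteringSketch {

    private static final String ALPHABET = "ACGT";

    /** Relative frequencies of overlapping n-mers; the vector has 4^n entries, so keep n small. */
    static double[] nmerFrequencies(String seq, int n) {
        double[] freq = new double[1 << (2 * n)];
        int total = 0;
        for (int i = 0; i + n <= seq.length(); i++) {
            int index = 0;
            boolean valid = true;
            for (int j = 0; j < n && valid; j++) {
                int code = ALPHABET.indexOf(Character.toUpperCase(seq.charAt(i + j)));
                if (code < 0) {
                    valid = false; // assumption: words with N or other symbols are skipped
                } else {
                    index = index * 4 + code; // base-4 encoding of the word
                }
            }
            if (valid) {
                freq[index]++;
                total++;
            }
        }
        for (int k = 0; total > 0 && k < freq.length; k++) {
            freq[k] /= total;
        }
        return freq;
    }

    /** Index of the centroid with the smallest squared Euclidean distance to v. */
    static int nearestCentroid(double[] v, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0;
            for (int k = 0; k < v.length; k++) {
                double diff = v[k] - centroids[c][k];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {nmerFrequencies("AAAAAA", 1), nmerFrequencies("GGGGGG", 1)};
        System.out.println(nearestCentroid(nmerFrequencies("AAGA", 1), centroids));
    }
}
```

    A full k-means run would alternate this assignment step with recomputing each centroid as the mean of its members. The soft expectation maximization variant mentioned in the abstract would replace the hard index returned here with a probability for each cluster.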

  3. Practical Analysis of the Dynamic Characteristics of JavaScript

    OpenAIRE

    Wei, Shiyi

    2015-01-01

    JavaScript is a dynamic object-oriented programming language, which is designed with flexible programming mechanisms. JavaScript is widely used in developing sophisticated software systems, especially web applications. Despite its popularity, there is a lack of software tools that support JavaScript for software engineering clients. Dataflow analysis approximates software behavior by analyzing the program code; it is the foundation for many software tools. However, several unique features...

  4. Java-Based Coupling for Parallel Predictive-Adaptive Domain Decomposition

    Directory of Open Access Journals (Sweden)

    Cécile Germain‐Renaud

    1999-01-01

    Adaptive domain decomposition exemplifies the problem of integrating heterogeneous software components with intermediate coupling granularity. This paper describes an experiment where a data-parallel (HPF) client interfaces with a sequential computation server through Java. We show that seamless integration of data-parallelism is possible, but requires most of the tools from the Java palette: Java Native Interface (JNI), Remote Method Invocation (RMI), callbacks and threads.

  5. Java facilities in processing XML files - JAXB and generating PDF reports

    Directory of Open Access Journals (Sweden)

    Danut-Octavian SIMION

    2008-01-01

    The paper presents the Java programming language facilities for working with XML files using JAXB (the Java Architecture for XML Binding) technology and for generating PDF reports from XML files using Java objects. The XML file can be an existing one and could contain the data about an entity (Clients, for example), or it might be the result of a SELECT SQL statement. JAXB generates Java classes through xs rules and a marshalling/unmarshalling compiler. The PDF file is built from an XML file and uses an XSL-FO formatting file and a Java ResultSet object.

  6. eShadow: A tool for comparing closely related sequences

    Energy Technology Data Exchange (ETDEWEB)

    Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.

    2004-01-15

    Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualization of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/

  7. Molecular characterization of an Akabane virus isolate from West Java, Indonesia.

    Science.gov (United States)

    Purnomo Edi, Suryo; Ibrahim, Afif; Sukoco, Rinto; Bunali, Lukman; Taguchi, Masaji; Kato, Tomoko; Yanase, Tohru; Shirafuji, Hiroaki

    2017-04-08

    We isolated an arbovirus from bovine blood in Indonesia. The arbovirus was obtained from the plasma of a cow showing no clinical symptoms in West Java in February 2014, and was identified as Akabane virus (AKAV) by AKAV-specific RT-PCR and subsequent sequence analysis. Phylogenetic analysis based on the partial S segment indicated that the AKAV isolate, WJ-1SA/P/2014, was most closely related to two isolates from Israel and Turkey reported in 2001 and 2015, respectively, and that the WJ-1SA/P/2014 isolate belongs to AKAV genogroup Ib. This is the first isolation of AKAV from Indonesia.

  8. Improving model construction of profile HMMs for remote homology detection through structural alignment

    Directory of Open Access Journals (Sweden)

    Zaverucha Gerson

    2007-11-01

    Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two tailed t-test. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.

  9. Improving model construction of profile HMMs for remote homology detection through structural alignment.

    Science.gov (United States)

    Bernardes, Juliana S; Dávila, Alberto M R; Costa, Vítor S; Zaverucha, Gerson

    2007-11-09

    Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two tailed t-test. We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.

  10. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    DEFF Research Database (Denmark)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan

    2009-01-01

    MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary...... determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than...

  11. Aligners: the Invisible Corrector-A Boon or Bane.

    Science.gov (United States)

    Mahendra, Lodd

    2018-03-01

    The trend of clinical orthodontics has shown a palpable shift from conventional braces to innovative technologies like invisible aligners. Aligners are sequences of clear trays worn by patients to straighten their teeth. They were envisaged for the main purpose of esthetics, mainly directed toward self-conscious teenagers who otherwise would shy away from essential correction of malocclusion.

  12. Safety-critical Java on a time-predictable processor

    DEFF Research Database (Denmark)

    Korsholm, Stephan E.; Schoeberl, Martin; Puffitsch, Wolfgang

    2015-01-01

    For real-time systems the whole execution stack needs to be time-predictable and analyzable for the worst-case execution time (WCET). This paper presents a time-predictable platform for safety-critical Java. The platform consists of (1) the Patmos processor, which is a time-predictable processor......; (2) a C compiler for Patmos with support for WCET analysis; (3) the HVM, which is a Java-to-C compiler; (4) the HVM-SCJ implementation which supports SCJ Level 0, 1, and 2 (for both single and multicore platforms); and (5) a WCET analysis tool. We show that real-time Java programs translated to C...... and compiled to a Patmos binary can be analyzed by the AbsInt aiT WCET analysis tool. To the best of our knowledge the presented system is the second WCET analyzable real-time Java system; and the first one on top of a RISC processor....

  13. Sawja: Static Analysis Workshop for Java

    Science.gov (United States)

    Hubert, Laurent; Barré, Nicolas; Besson, Frédéric; Demange, Delphine; Jensen, Thomas; Monfort, Vincent; Pichardie, David; Turpin, Tiphaine

    Static analysis is a powerful technique for automatic verification of programs but raises major engineering challenges when developing a full-fledged analyzer for a realistic language such as Java. Efficiency and precision of such a tool rely partly on low-level components which only depend on the syntactic structure of the language and therefore should not be redesigned for each implementation of a new static analysis. This paper describes the Sawja library: a static analysis workshop fully compliant with Java 6 which provides OCaml modules for efficiently manipulating Java bytecode programs. We present the main features of the library, including i) efficient functional data structures for representing a program with implicit sharing and lazy parsing, ii) an intermediate stack-less representation, and iii) fast computation and manipulation of complete programs. We provide experimental evaluations of the different features with respect to time, memory and precision.

  14. MatrixPlot: visualizing sequence constraints

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Stærfeldt, Hans Henrik; Lund, Ole

    1999-01-01

    MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information...
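
    As an illustration of the kind of data behind a mutual information plot of a sequence alignment, the sketch below computes the mutual information (in bits) between every pair of alignment columns. It is not MatrixPlot's code or file format: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings, and gap characters are simply treated as ordinary symbols.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information, in bits, between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_ab = count / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def mi_matrix(alignment):
    """Symmetric matrix of column-pair mutual information values.

    `alignment` is assumed to be a list of equal-length strings.
    """
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    return [
        [mutual_information(columns[i], columns[j]) for j in range(length)]
        for i in range(length)
    ]


if __name__ == "__main__":
    # Columns 0 and 1 co-vary perfectly; column 2 is constant.
    toy_alignment = ["AUG", "GCG", "AUG", "GCG"]
    for row in mi_matrix(toy_alignment):
        print(["%.2f" % value for value in row])
```

    A matrix like this could then be rendered as a heat map, with high off-diagonal values pointing at co-varying positions.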

  15. The r-Java 2.0 code: nuclear physics

    Science.gov (United States)

    Kostka, M.; Koning, N.; Shand, Z.; Ouyed, R.; Jaikumar, P.

    2014-08-01

    Aims: We present r-Java 2.0, a nucleosynthesis code for open use that performs r-process calculations, along with a suite of other analysis tools. Methods: Equipped with a straightforward graphical user interface, r-Java 2.0 is capable of simulating nuclear statistical equilibrium (NSE), calculating r-process abundances for a wide range of input parameters and astrophysical environments, computing the mass fragmentation from neutron-induced fission and studying individual nucleosynthesis processes. Results: In this paper we discuss enhancements to this version of r-Java, especially the ability to solve the full reaction network. The sophisticated fission methodology incorporated in r-Java 2.0 that includes three fission channels (beta-delayed, neutron-induced, and spontaneous fission), along with computation of the mass fragmentation, is compared to the upper limit on mass fission approximation. The effect of including beta-delayed neutron emission on r-process yield is studied. The role of Coulomb interactions in NSE abundances is shown to be significant, supporting previous findings. A comparative analysis was undertaken during the development of r-Java 2.0 whereby we reproduced the results found in the literature from three other r-process codes. This code is capable of simulating the physical environment of the high-entropy wind around a proto-neutron star, the ejecta from a neutron star merger, or the relativistic ejecta from a quark nova. Likewise, the users of r-Java 2.0 are given the freedom to define a custom environment. This software provides a platform for comparing proposed r-process sites.

  16. The Operational Semantics of a Java Secure Processor

    NARCIS (Netherlands)

    Hartel, Pieter H.; Butler, M.J.; Levy, M.; Alves-Foss, J.

    1999-01-01

    A formal specification of a Java Secure Processor is presented, which is mechanically checked for type consistency, well-formedness and operational conservativity. The specification is executable and it is used to animate and study the behaviour of sample Java programs. The purpose of the semantics

  17. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin.

    Science.gov (United States)

    Guzzi, Pietro Hiram; Milenković, Tijana

    2017-01-05

    Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology. © The Author 2017. Published by Oxford University Press.

  18. Web-based Java application to advanced JT-60 Man-Machine Interfacing System for remote experiments

    International Nuclear Information System (INIS)

    Totsuka, Toshiyuki; Suzuki, Yoshio; Sakata, Shinya; Oshima, Takayuki; Iba, Katsuyuki

    2008-01-01

    Since remote participation in ITER experiments is planned, the JT-60SA experiment is expected to demonstrate control from a Japanese remote experiment center located in Rokkasho-mura, Aomori-ken, Japan, as a part of the ITER-BA project. The functions required for this experiment are monitoring of the discharge sequence status, handling of the discharge parameters, checking of experiment data, and monitoring of plant data, all of which are included in the existing JT-60 Man-Machine Interfacing System (MMIF). For network safety reasons, the MMIF is currently available only to on-site users at the Naka site. The remote MMIF is motivated by the need to provide remote access while remaining compatible with network safety. The Java language has been chosen to implement this task. This paper deals with details of the JT-60 MMIF for the remote experiment, which has been developed using the Java language.

  19. Safety-critical Java for low-end embedded platforms

    DEFF Research Database (Denmark)

    Søndergaard, Hans; Korsholm, Stephan E.; Ravn, Anders Peter

    2012-01-01

    We present an implementation of the Safety-Critical Java profile (SCJ), targeted for low-end embedded platforms with as little as 16 kB RAM and 256 kB flash. The distinctive features of the implementation are a combination of a lean Java virtual machine (HVM), with a bare metal kernel implementing...... hardware objects, first level interrupt handlers, and native variables, and an infrastructure written in Java which is minimized through program specialization. The HVM allows the implementation to be easily ported to embedded platforms which have a C compiler as part of the development environment...

  20. MATLAB-Like Scripting of Java Scientific Libraries in ScalaLab

    Directory of Open Access Journals (Sweden)

    Stergios Papadimitriou

    2014-01-01

    Although there are many robust and effective scientific libraries in Java, using these libraries in pure Java is difficult and cumbersome, especially for the average scientist who does not have expertise in software development. We illustrate that ScalaLab presents an easier and more productive MATLAB-like front end. The main strengths and weaknesses of the core Java libraries of ScalaLab are also elaborated. Since performance is of paramount importance for scientific computation, the article extensively discusses performance aspects of the ScalaLab environment. Java bytecode performance is also compared to native code.