stacked sequence alignment: Topics by WorldWideScience.org

Sample records for stacked sequence alignment

MSuPDA: A Memory Efficient Algorithm for Sequence Alignment.

Science.gov (United States)

Khan, Mohammad Ibrahim; Kamal, Md Sarwar; Chowdhury, Linkon

2016-03-01

Space complexity is a central concern in DNA sequence alignment. In this regard, saving memory with a pushdown automaton can help reduce the space occupied in computer memory. In our proposed process, an anchor seed (AS) is selected from the given data set of nucleotide base pairs for local sequence alignment. Quick splitting techniques separate the AS from all the DNA genome segments. The selected AS is placed in the input unit of the pushdown automaton (PDA), and the whole DNA genome segments are placed on the PDA's stack. The AS from the input unit is then matched against the DNA genome segments on the stack. Matches, mismatches and indels of nucleotides are popped from the stack under the control unit of the PDA, and each POP operation frees the memory cell occupied by that nucleotide base pair.
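The stack-based matching step described above can be sketched in Python as follows. This is a hypothetical illustration, not the authors' MSuPDA code. The function name `pda_match` is invented. The segment is pushed so that its first base is on top. Indels are not modeled, because the abstract does not say how they are detected.

```python
def pda_match(anchor_seed: str, segment: str) -> dict:
    """Hypothetical sketch: match an anchor seed against a segment held on a stack.

    Indels are not modeled; only position-by-position matches and mismatches.
    """
    # Push the segment so that its first base ends up on top of the stack.
    stack = list(reversed(segment))
    counts = {"match": 0, "mismatch": 0}
    for base in anchor_seed:
        if not stack:
            break
        # POP: the base leaves the stack. In Python this only drops the list's
        # reference; the runtime decides when memory is actually released.
        top = stack.pop()
        counts["match" if top == base else "mismatch"] += 1
    counts["left_on_stack"] = len(stack)
    return counts


print(pda_match("ACGT", "ACCTGG"))  # {'match': 3, 'mismatch': 1, 'left_on_stack': 2}
```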
AlignMe—a membrane protein sequence alignment web server

Science.gov (United States)

Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

2014-01-01

We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
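The first half of the HP mode converts an alignment into a family-averaged hydrophobicity profile. As a rough sketch of that step, the Python below averages a hydropathy value over the non-gap residues in each column. The Kyte-Doolittle scale and the helper name `hydropathy_profile` are assumptions for illustration. The subsequent profile-to-profile alignment is not shown.

```python
from typing import List, Optional

# Kyte-Doolittle hydropathy scale (an assumption: AlignMe offers several scales).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(msa: List[str]) -> List[Optional[float]]:
    """Average hydropathy per alignment column over the homologs in `msa`.

    Gaps and unknown residues are skipped; an all-gap column yields None.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("msa must be non-empty with rows of equal length")
    profile: List[Optional[float]] = []
    for column in zip(*msa):
        values = [KYTE_DOOLITTLE[aa.upper()] for aa in column
                  if aa.upper() in KYTE_DOOLITTLE]
        profile.append(sum(values) / len(values) if values else None)
    return profile


print(hydropathy_profile(["AI-", "AV-"]))
```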
Pairwise Sequence Alignment Library

Energy Technology Data Exchange (ETDEWEB)

2015-05-20

Vector extensions such as SSE have been part of x86 CPUs since the 1990s, with applications in graphics, signal processing, and scientific computing. Many algorithms and applications benefit naturally from automatic vectorization. Many others remain difficult to vectorize because they depend on irregular data structures, dense branching, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint with complex data dependencies. The trend toward wider vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm, which better exploits wider SIMD units, was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features:

- reference implementations of all known vectorized sequence alignment approaches;
- implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms;
- implementations across all modern CPU instruction sets, including AVX2 and KNC;
- language interfaces for C/C++ and Python.
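For reference, a plain scalar Smith-Waterman recurrence is sketched below in Python; it is not parasail's API. Each cell reads its left neighbour `curr[j - 1]` in the same row. That is the kind of data dependency that makes vectorization hard, and that striped and scan-based SIMD layouts are designed to work around. The linear gap penalty and the match/mismatch values are simplifying assumptions. Production aligners typically use affine gaps and substitution matrices.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (scalar reference, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # curr[j - 1] is the in-row dependency that resists naive SIMD.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("GGACGTCC", "ACGT"))  # 8
```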
Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment

Directory of Open Access Journals (Sweden)

Daniels Noah M

2012-10-01

Background: The quality of multiple protein structure alignments is usually computed and assessed using geometric functions of the coordinates of the backbone atoms of the protein chains. These purely geometric methods do not directly use protein sequence similarity. In fact, determining how to incorporate sequence similarity measures into the construction and assessment of multiple protein structure alignments has proved surprisingly difficult.

Results: We present Formatt, a multiple structure alignment program based on Matt, a purely geometric multiple structure aligner. Formatt also takes sequence similarity into account when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. On the SABMark twilight zone benchmark set, which captures more remote homology, Formatt and Matt outperform other programs. Depending on the choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, at a small cost in average RMSD.

Conclusions: Considering sequence information as well as purely geometric information appears to improve the quality of multiple structure alignments. However, defining the best alignment when sequence and structural measures suggest different alignments remains a difficult open question.
High-frequency self-aligned graphene transistors with transferred gate stacks

Science.gov (United States)

Cheng, Rui; Bai, Jingwei; Liao, Lei; Zhou, Hailong; Chen, Yu; Liu, Lixin; Lin, Yung-Chen; Jiang, Shan; Huang, Yu; Duan, Xiangfeng

2012-01-01

Graphene has attracted enormous attention for radio-frequency transistor applications because of its exceptionally high carrier mobility, high carrier saturation velocity, and large critical current density. Herein we report a new approach for the scalable fabrication of high-performance graphene transistors with transferred gate stacks. Specifically, arrays of gate stacks are first patterned on a sacrificial substrate and then transferred onto arbitrary substrates with graphene on top. A self-aligned process, enabled by the unique structure of the transferred gate stacks, is then used to precisely position the source and drain electrodes with minimized access resistance or parasitic capacitance. This process has enabled scalable fabrication of self-aligned graphene transistors with unprecedented performance, including a record-high cutoff frequency of up to 427 GHz. Our study defines a unique pathway to large-scale fabrication of high-performance graphene transistors. It holds significant potential for future graphene-based devices in ultra-high-frequency circuits. PMID:22753503
ABS: Sequence alignment by scanning

KAUST Repository

Bonny, Mohamed Talal; Salama, Khaled N.

2011-01-01

Sequence alignment is an essential tool in almost all computational biology research. It processes large sequence databases and is considered a heavy consumer of computation time. Heuristic algorithms are used to obtain approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), that provides an approximate alignment of two DNA sequences. We compare our algorithm with two well-known alignment algorithms: FASTA (which is heuristic) and Needleman-Wunsch (which is optimal). The proposed algorithm achieves up to a 76% improvement in alignment score compared with the FASTA algorithm. The evaluations are conducted using DNA sequences of different lengths. © 2011 IEEE.
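The optimal baseline mentioned above, Needleman-Wunsch, can be sketched as a two-row dynamic program. The ABS algorithm itself is not described in enough detail to reproduce, so this shows only the standard global alignment score. The scoring values (match +1, mismatch -1, linear gap -1) are assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score with a linear gap penalty (two-row DP)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: prefixes of b against nothing
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(needleman_wunsch("ACGT", "AGT"))  # 2
```

Unlike the local Smith-Waterman score, every cell may go negative and the result is read from the bottom-right corner, so both sequences are aligned end to end.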
ABS: Sequence alignment by scanning

KAUST Repository

Bonny, Mohamed Talal

2011-08-01

Sequence alignment is an essential tool in almost all computational biology research. It processes large sequence databases and is considered a heavy consumer of computation time. Heuristic algorithms are used to obtain approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), that provides an approximate alignment of two DNA sequences. We compare our algorithm with two well-known alignment algorithms: FASTA (which is heuristic) and Needleman-Wunsch (which is optimal). The proposed algorithm achieves up to a 76% improvement in alignment score compared with the FASTA algorithm. The evaluations are conducted using DNA sequences of different lengths. © 2011 IEEE.
Fast global sequence alignment technique

KAUST Repository

Bonny, Mohamed Talal

2011-11-01

Bioinformatics databases are growing exponentially in size. Processing such large amounts of data may take hours, even on supercomputers. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), that provides an approximate alignment of two DNA sequences. We compare our algorithm with two well-known sequence alignment algorithms: 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 51% improvement in alignment score compared with the GAP algorithm. The evaluations are conducted using DNA sequences of different lengths. © 2011 IEEE.
Fast global sequence alignment technique

KAUST Repository

Bonny, Mohamed Talal; Salama, Khaled N.

2011-01-01

...fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms, the 'GAP' (which is heuristic) and the 'Needleman...
Ancestral sequence alignment under optimal conditions

Directory of Open Access Journals (Sweden)

Brown Daniel G

2005-11-01

Background: Multiple genome alignment is an important problem in bioinformatics. An important subproblem used by many multiple alignment approaches is that of aligning two multiple alignments. Many popular alignment algorithms for DNA use the sum-of-pairs heuristic, where the score of a multiple alignment is the sum of its induced pairwise alignment scores. However, the biological meaning of the sum-of-pairs heuristic is not obvious. Additionally, many algorithms based on the sum-of-pairs heuristic are complicated and slow compared to pairwise alignment algorithms. An alternative approach to aligning alignments is to first infer ancestral sequences for each alignment and then align the two ancestral sequences. In addition to being fast, this method has a clear biological basis that takes into account the evolution implied by an underlying phylogenetic tree. In this study we explore the accuracy of aligning alignments by ancestral sequence alignment. We examine the use of both maximum likelihood and parsimony to infer ancestral sequences. Additionally, we investigate the effect on accuracy of allowing ambiguity in our ancestral sequences.

Results: We use synthetic sequence data generated by simulating evolution on a phylogenetic tree. We use two types of phylogenetic trees: trees with a period of rapid growth followed by a period of slow growth, and trees with a period of slow growth followed by a period of rapid growth. We examine the alignment accuracy of four ancestral sequence reconstruction and alignment methods: parsimony, maximum likelihood, ambiguous parsimony, and ambiguous maximum likelihood. Additionally, we compare against the alignment accuracy of two sum-of-pairs algorithms: ClustalW and the heuristic of Ma, Zhang, and Wang.

Conclusion: We find that allowing ambiguity in ancestral sequences does not lead to better multiple alignments. Regardless of whether we use parsimony or maximum likelihood, the
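The sum-of-pairs score defined above can be written directly. For every pair of rows, it scores the induced pairwise alignment, skipping columns where both rows have a gap. This Python sketch assumes a simple match/mismatch score and a linear gap cost. The values are illustrative, not those used in the study.

```python
from itertools import combinations
from typing import List


def sum_of_pairs(msa: List[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum of the induced pairwise scores over all pairs of rows (linear gap cost)."""
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for r1, r2 in combinations(msa, 2):
        for x, y in zip(r1, r2):
            if x == "-" and y == "-":
                continue  # gap-gap columns vanish from the induced pairwise alignment
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total


print(sum_of_pairs(["AC", "AC", "A-"]))  # 0
```

The number of row pairs grows quadratically with the number of sequences. This is one reason why the ancestral-sequence approach, which aligns only two sequences, is attractive.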
Adaptive Processing for Sequence Alignment

KAUST Repository

Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

2012-01-01

Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU), and a second portion is distributed to a graphics processing unit (GPU). The split is based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences. A second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
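One reading of the length-based split is sketched below: sort the database by length and send the shortest fraction to the CPU. Interpreting the "predetermined splitting ratio" as a fraction of the number of sequences is an assumption. The helper name `split_by_length` is hypothetical. The per-device alignment scoring is not shown.

```python
from typing import List, Tuple


def split_by_length(db: List[str], cpu_fraction: float) -> Tuple[List[str], List[str]]:
    """Return (cpu_part, gpu_part): the shortest sequences go to the CPU."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must lie in [0, 1]")
    ordered = sorted(db, key=len)
    cut = round(len(ordered) * cpu_fraction)
    return ordered[:cut], ordered[cut:]


cpu_part, gpu_part = split_by_length(["MKTAYIAK", "MK", "MKTAY", "MKT"], 0.5)
print(cpu_part, gpu_part)  # ['MK', 'MKT'] ['MKTAY', 'MKTAYIAK']
```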
Adaptive Processing for Sequence Alignment

KAUST Repository

Zidan, Mohammed A.

2012-01-26

Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU), and a second portion is distributed to a graphics processing unit (GPU). The split is based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences. A second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
Progressive multiple sequence alignments from triplets

Directory of Open Access Journals (Sweden)

Stadler Peter F

2007-07-01

Background: The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps, since gaps introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors.

Research: Here we present a modified variant of progressive sequence alignment that addresses both issues. Instead of pairwise alignments, we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm. It constructs a phylogenetic network by replacing triples by pairs step by step, instead of combining pairs into singletons. To this end, the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures.

Conclusion: The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably with other progressive alignment tools on both protein and nucleic acid sequences. For nucleic acids, one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments generally exceeds that of ClustalW or other multiple alignment tools, even though our software does not include heuristics for context-dependent (mis)match scores.
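When a three-way alignment is split into partial alignments, all-gap columns are removed. This can be illustrated as a projection onto a subset of rows. The helper `project` is hypothetical, not code from aln3nn. It does not model how the program decides which rows form each partial alignment.

```python
from typing import List


def project(alignment: List[str], rows: List[int]) -> List[str]:
    """Restrict a multiple alignment to `rows` and drop columns that become all-gap."""
    picked = [alignment[r] for r in rows]
    keep = [c for c in range(len(picked[0])) if any(row[c] != "-" for row in picked)]
    return ["".join(row[c] for c in keep) for row in picked]


three_way = ["A-GT", "A-GT", "ACG-"]
print(project(three_way, [0, 1]))  # ['AGT', 'AGT']
print(project(three_way, [0, 2]))  # ['A-GT', 'ACG-']
```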
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

Science.gov (United States)

Martin, Andrew C R

2014-01-01

The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable, with options for alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another website or to other JavaScript code. JSAV is implemented purely in JavaScript, making use of the jQuery and jQuery UI libraries. To help with browser compatibility, it does not use any HTML5-specific features. The code is documented using JSDoc and is available from http://www.bioinf.org.uk/software/jsav/.
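JSAV's own API is not reproduced here. As a language-neutral illustration in Python, 'dotifying' is assumed to mean replacing each residue that is identical to the reference (first) sequence with a dot. Gaps and differing residues stay visible.

```python
from typing import List


def dotify(alignment: List[str], ref_index: int = 0) -> List[str]:
    """Show residues identical to the reference row as '.'; gaps are kept."""
    ref = alignment[ref_index]
    out = []
    for i, row in enumerate(alignment):
        if i == ref_index:
            out.append(row)
        else:
            out.append("".join("." if c == r and c != "-" else c for c, r in zip(row, ref)))
    return out


for line in dotify(["ACDE", "ACDF", "GCD-"]):
    print(line)
# ACDE
# ...F
# G..-
```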
Alignment of Memory Transfers of a Time-Predictable Stack Cache

DEFF Research Database (Denmark)

Abbaspourseyedi, Sahar; Brandner, Florian

2014-01-01

...of complex cache states. Instead, only the occupancy level of the cache has to be determined. The memory transfers generated by the standard stack cache are not generally aligned. These unaligned accesses risk introducing complexity into the otherwise simple WCET analysis. In this work, we investigate three...
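The paper's three approaches are cut off above. The sketch below therefore only illustrates the basic notion of aligning a transfer. It widens an arbitrary byte range to block boundaries so that every transfer starts and ends on an aligned address. The power-of-two block size and the function name `align_transfer` are assumptions.

```python
from typing import Tuple


def align_transfer(addr: int, size: int, block: int) -> Tuple[int, int]:
    """Widen [addr, addr + size) to block boundaries; returns (start, length)."""
    if block <= 0 or block & (block - 1):
        raise ValueError("block must be a positive power of two")
    start = addr & ~(block - 1)                     # round the start down
    end = (addr + size + block - 1) & ~(block - 1)  # round the end up
    return start, end - start


print(align_transfer(0x1003, 10, 8))  # (4096, 16)
```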
Comparative genomics beyond sequence-based alignments

DEFF Research Database (Denmark)

Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.

2008-01-01

Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...
Heuristics for multiobjective multiple sequence alignment.

Science.gov (United States)

Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

2016-07-15

Aligning multiple sequences arises in many tasks in bioinformatics. However, the alignments produced by current software packages are highly dependent on parameter settings, such as the relative importance of opening gaps with respect to increasing similarity. Choosing only one parameter setting may introduce undesirable bias into later steps of the analysis and lead to overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she can analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method against reference alignments in terms of the ratio of correctly aligned pairs and the ratio of correctly aligned columns. Experimental results show
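For two objectives, the hypervolume indicator mentioned above reduces to an area. The sketch below treats each alignment as a point (substitution score to maximize, indel count to minimize) and keeps the non-dominated points. It then sums the area they dominate up to a reference point. The reference point and the example values are illustrative assumptions, not taken from the study.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (substitution score to maximize, indels to minimize)


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated points, sorted by increasing indel count."""
    front = []
    for s, g in points:
        dominated = any(s2 >= s and g2 <= g and (s2, g2) != (s, g) for s2, g2 in points)
        if not dominated and (s, g) not in front:
            front.append((s, g))
    return sorted(front, key=lambda p: p[1])


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by `ref` = (worst score, worst indels)."""
    ref_score, ref_gaps = ref
    area, prev_score = 0.0, ref_score
    # Along the front, more indels buy a higher score, so scores rise as we sweep.
    for s, g in pareto_front(points):
        area += (s - prev_score) * (ref_gaps - g)
        prev_score = s
    return area


print(hypervolume_2d([(10, 2), (15, 5), (12, 6)], ref=(0, 10)))  # 105.0
```

A larger hypervolume means the returned set both reaches further along each objective and covers the trade-off more evenly.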
High-throughput sequence alignment using Graphics Processing Units

Directory of Open Access Journals (Sweden)

Trapnell Cole

2007-12-01

Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data. To keep up with new sequencing technologies, however, researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware.

Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from NVIDIA to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel. In total application time, it outperforms the exact alignment component of MUMmer on a high-end CPU by 3.5-fold when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies.

Conclusion: MUMmerGPU is a low-cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
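MUMmerGPU indexes the reference once, as a suffix tree, and then streams many queries against it. The Python sketch below illustrates the same index-once, query-many idea with a deliberately swapped-in, simpler structure: a naive suffix array searched with two binary searches. It finds exact occurrences only and is not MUMmerGPU's matching algorithm.

```python
from typing import List


def build_suffix_array(ref: str) -> List[int]:
    """Start positions of all suffixes of `ref` in lexicographic order (naive, O(n^2 log n))."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def find_occurrences(ref: str, sa: List[int], query: str) -> List[int]:
    """All start positions of `query` in `ref`, via two binary searches over `sa`."""
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix that is >= query
        mid = (lo + hi) // 2
        if ref[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-|query| prefix is > query
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + len(query)] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


reference = "banana"
sa = build_suffix_array(reference)
for read in ["ana", "na", "x"]:
    print(read, find_occurrences(reference, sa, read))
# ana [1, 3]
# na [2, 4]
# x []
```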
Spatio-temporal alignment of pedobarographic image sequences.

Science.gov (United States)

Oliveira, Francisco P M; Sousa, Andreia; Santos, Rubim; Tavares, João Manuel R S

2011-07-01

This article presents a methodology to align plantar pressure image sequences simultaneously in time and space. The spatial position and orientation of a foot in one sequence are changed to match the foot represented in a second sequence. Simultaneously with the spatial alignment, the temporal scale of the first sequence is transformed in order to synchronize the two input footsteps. Consequently, the spatial correspondence of the foot regions along the sequences, as well as their temporal synchronization, is attained automatically, making the study easier and more straightforward. For spatial alignment, the methodology can use one of four geometric transformation models: rigid, similarity, affine, or projective. For temporal alignment, a polynomial transformation of up to 4th degree can be adopted to model linear and curved time behaviors. Suitable geometric and temporal transformations are found by minimizing the mean squared error (MSE) between the input sequences. The methodology was tested on a set of real image sequences acquired from a common pedobarographic device. When used in experimental cases generated by applying geometric and temporal control transformations, the methodology revealed high accuracy. In addition, the intra-subject alignment tests from real plantar pressure image sequences showed that the curved temporal models produced better MSE results (P ...) ... alignment of pedobarographic image data, since previous methods can only be applied to static images.
QUASAR--scoring and ranking of sequence-structure alignments.

Science.gov (United States)

Birzele, Fabian; Gewehr, Jan E; Zimmer, Ralf

2005-12-15

Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.

Energy level alignment in Au/pentacene/PTCDA trilayer stacks

OpenAIRE

Sehati, P.; Braun, S.; Fahlman, M.

2013-01-01

Ultraviolet photoelectron spectroscopy is used to investigate the energy level alignment and molecular orientation at the interfaces in Au/pentacene/PTCDA trilayer stacks. We deduced a standing orientation for pentacene grown on Au while we conclude a flat lying geometry for PTCDA grown onto pentacene. We propose that the rough surface of polycrystalline Au induces the standing geometry in pentacene. It is further shown that in situ deposition of PTCDA on pentacene can influence the orientati...
Optimization of sequence alignment for simple sequence repeat regions

Directory of Open Access Journals (Sweden)

Ogbonnaya Francis C

2011-07-01

Background: Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences that are distributed throughout the genome. They consist of tandem copies of specific sequences no longer than six bases. SSRs have been used as molecular markers because they are easy to detect. They are used in a range of applications, including genetic diversity, genome mapping, and marker-assisted selection. They are also highly mutable because of DNA polymerase slippage during DNA replication. This unique mutation mechanism raises the frequency of insertion/deletion (INDEL) mutations to a high level, higher than for other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are, however, more frequent than INDELs. Therefore, existing sequence alignment algorithms are designed to fit the vast majority of the genomic sequence. They do not treat microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships.

Findings: To overcome the limitation of the alignment algorithm when dealing with SSR loci, a new algorithm was developed as a Perl script with a Tk graphical interface. This program aligns sequences after first determining the repeat units and the positions of the last SSR nucleotides. This results in a shifting process according to the inserted repeat unit type. When phylogenetic relationships were studied before and after applying the new algorithm, many differences in the trees were obtained as SSR length and complexity increased. However, smaller distances between different lineages were observed after applying the new algorithm.

Conclusions: The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relationships between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic
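The first step of the new algorithm determines the repeat units and where each SSR ends. It can be approximated with a regular expression that looks for a 1-6 bp unit followed by further exact copies. This Python sketch is not the authors' Perl/Tk program. The minimum copy number of three and the function name `find_ssrs` are assumptions.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, repeat_unit, copies) for tandem repeats of 1-6 bp units."""
    if min_copies < 2:
        raise ValueError("a tandem repeat needs at least two copies")
    # Lazy {1,6}? tries the shortest unit first; \1 demands exact further copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_ssrs("GGATATATCC"))  # [(2, 'AT', 3)]
```

The lazy quantifier prefers the shortest unit. As a result, a compound repeat such as `AAATAAATAAAT` is reported as short runs of `A` rather than as three copies of `AAAT`. Resolving such overlapping units is exactly the difficulty the abstract describes.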
RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

DEFF Research Database (Denmark)

Wernersson, Rasmus; Pedersen, Anders Gorm

2003-01-01

The simple fact that proteins are built from 20 amino acids while DNA contains only four different bases means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit...... proteins. It is therefore preferable to align coding DNA at the amino acid level, and it is for this purpose that we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA...
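Step (iii), building the DNA alignment from the aligned peptides, can be sketched as threading each coding sequence back onto its aligned peptide codon by codon. The helper names are hypothetical, and this is not RevTrans's code. Steps (i) and (ii), translation and peptide alignment, are assumed to have been done already.

```python
from typing import List


def thread_dna(aligned_protein: str, dna: str) -> str:
    """Hypothetical helper: map an aligned peptide back onto its coding DNA.

    Each residue is replaced by the next codon of `dna`, each gap by '---'.
    """
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues * 3 > len(dna):
        raise ValueError("DNA is too short for the aligned peptide")
    codons, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(dna[pos:pos + 3])
            pos += 3
    return "".join(codons)


def thread_alignment(aligned_proteins: List[str], dnas: List[str]) -> List[str]:
    """Apply thread_dna row by row to build the DNA alignment."""
    return [thread_dna(p, d) for p, d in zip(aligned_proteins, dnas)]


print(thread_alignment(["MK-L", "M-RL"], ["ATGAAACTG", "ATGCGTTTA"]))
# ['ATGAAA---CTG', 'ATG---CGTTTA']
```

Because gaps are inserted as whole codons, the reading frame of every row is preserved in the resulting DNA alignment.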
Genomic multiple sequence alignments: refinement using a genetic algorithm

Directory of Open Access Journals (Sweden)

Lefkowitz Elliot J

2005-08-01

Background: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role in two areas. It helps explain the genotypic differences between species that result in phenotypic differences, and it reveals patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most use heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct but in many cases locally suboptimal. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. A better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, so the algorithm searches for an alignment that yields a better COFFEE score. To mitigate the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program.

Results: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation. We also used it to refine a multiple alignment of 15 Orthopoxvirus genomic sequences, approximately 260,000 nucleotides in length, that had initially been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only
MANGO: a new approach to multiple sequence alignment.

Science.gov (United States)

Zhang, Zefeng; Lin, Hao; Li, Ming

2007-01-01

Multiple sequence alignment is a classical and challenging task in biological sequence analysis. The problem is NP-hard, and full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art multiple sequence alignment programs suffer from the 'once a gap, always a gap' phenomenon. Is there a radically new way to do multiple sequence alignment? This paper introduces a novel and orthogonal multiple sequence alignment method, using multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes the information of all sequences as a whole, avoiding problems caused by the popular progressive approaches. Because optimized spaced seeds are provably significantly more sensitive than consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks. They show that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0 and Kalign 2.0.
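A spaced seed such as `11011` requires matches only at its '1' positions, so it tolerates a mismatch at each '0'. The Python sketch below finds seed hits between two sequences with a hash index. The seed pattern is an arbitrary example, not one of MANGO's optimized seeds. MANGO's algorithms for combining hits across many sequences are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def seed_key(seq: str, pos: int, seed: str) -> str:
    """Characters of `seq` at the '1' (care) positions of `seed`, starting at `pos`."""
    return "".join(seq[pos + k] for k, c in enumerate(seed) if c == "1")


def spaced_seed_hits(s1: str, s2: str, seed: str = "11011") -> List[Tuple[int, int]]:
    """Return (i, j) pairs where windows s1[i:] and s2[j:] agree on every care position."""
    span = len(seed)
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(s1) - span + 1):
        index[seed_key(s1, i, seed)].append(i)
    hits = []
    for j in range(len(s2) - span + 1):
        for i in index.get(seed_key(s2, j, seed), []):
            hits.append((i, j))
    return hits


print(spaced_seed_hits("ACGTA", "ACCTA"))           # [(0, 0)]
print(spaced_seed_hits("ACGTA", "ACCTA", "11111"))  # []
```

The second call uses a consecutive 5-mer and misses the hit because of the single mismatch at position 2. The spaced seed ignores that position and still reports it.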
Hardware Accelerated Sequence Alignment with Traceback

Directory of Open Access Journals (Sweden)

Scott Lloyd

2009-01-01

...in a timely manner. Known methods for accelerating alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain of over 300 times that of a desktop computer is demonstrated on sequences of length 16,000. For greater performance, the architecture is scalable to more processing elements.
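The abstract does not detail the architecture. A standard way to keep many processing elements busy on an alignment matrix is to compute it by anti-diagonals, since each cell depends only on cells from the two previous anti-diagonals. This Python sketch shows that wavefront order for a global alignment score. It keeps the full matrix for clarity and does not reproduce the paper's space-efficient traceback.

```python
def nw_wavefront(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score computed anti-diagonal by anti-diagonal."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        H[i][0] = i * gap
    for j in range(m + 1):
        H[0][j] = j * gap
    # Cells with the same i + j = d depend only on anti-diagonals d-1 and d-2,
    # so every cell of one anti-diagonal could be handed to its own PE.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[n][m]


print(nw_wavefront("AC", "A"))  # 0
```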
Alignment-Annotator web server: rendering and annotating sequence alignments.

Science.gov (United States)

Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

2014-07-01

Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed on the server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (from BioDAS servers, UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML, the resulting interactive alignment can be viewed on any platform, including Windows, Mac OS X, Linux, Android and iOS, in any standard web browser. Importantly, neither plugins nor Java are required, and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Spreadsheet macros for coloring sequence alignments.

Science.gov (United States)

Haygood, M G

1993-12-01

This article describes a set of Microsoft Excel macros designed to color amino acid and nucleotide sequence alignments for review and preparation of visual aids. The colored alignments can then be modified to emphasize features of interest. Procedures for importing and coloring sequences are described. The macro file adds a new menu to the menu bar containing sequence-related commands to enable users unfamiliar with Excel to use the macros more readily. The macros were designed for use with Macintosh computers but will also run with the DOS version of Excel.
Multiple sequence alignment accuracy and phylogenetic inference.

Science.gov (United States)

Ogden, T Heath; Rosenberg, Michael S

2006-04-01

Phylogenies are often thought to depend more on the specifics of the sequence alignment than on the method of reconstruction. Sequences containing insertion and deletion events were simulated in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy: that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian methods, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the lengths of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple sequence alignment can be an important factor in downstream effects on topological reconstruction.
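The abstract does not define its two accuracy measures, so the sketch below uses one common pair-based measure as an assumption: the fraction of residue pairs placed in the same column of the true alignment that the hypothesized alignment also places together. Alignments are lists of equal-length gapped strings with '-' as the gap symbol.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """Set of (row_p, residue_p, row_q, residue_q) placed in the same column."""
    counters = [0] * len(alignment)  # residue index reached so far in each row
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for row, seq in enumerate(alignment):
            if seq[col] != "-":
                present.append((row, counters[row]))
                counters[row] += 1
        for (p, ip), (q, iq) in combinations(present, 2):
            pairs.add((p, ip, q, iq))
    return pairs


def pair_accuracy(true_aln: list[str], est_aln: list[str]) -> float:
    """Fraction of truly homologous residue pairs recovered by the estimate."""
    true_pairs = aligned_pairs(true_aln)
    if not true_pairs:
        return 1.0
    return len(true_pairs & aligned_pairs(est_aln)) / len(true_pairs)
```

For instance, if the true alignment is `["AC-GT", "ACTGT"]` and the estimate shifts the gap to give `["ACG-T", "ACTGT"]`, three of the four true pairs are recovered, an accuracy of 0.75. A per-branch measure could apply the same function to the two sequences joined by each branch.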
AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

Directory of Open Access Journals (Sweden)

Claros M Gonzalo

2010-06-01

Abstract Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms; the choice of algorithm does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". Conclusions AlignMiner can be used
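Scoring divergence "with respect to the consensus sequence or to one master sequence" can be sketched as follows. This does not reproduce AlignMiner's Entropy or Weighted scores. It uses a simpler assumed measure, the fraction of sequences that differ from the reference in each column, smoothed over a window. The window width and threshold are hypothetical parameters.

```python
from collections import Counter


def consensus(alignment: list[str]) -> str:
    """Most frequent symbol per column (ties go to the first symbol encountered)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def divergence_profile(alignment: list[str], reference=None) -> list[float]:
    """Fraction of sequences differing from the reference at each column.

    The reference defaults to the consensus; pass one row to use a master sequence.
    """
    ref = reference if reference is not None else consensus(alignment)
    n = len(alignment)
    return [sum(seq[i] != ref[i] for seq in alignment) / n for i in range(len(ref))]


def divergent_windows(profile: list[float], width: int, threshold: float) -> list[int]:
    """Start positions of windows whose mean divergence is at least threshold."""
    return [s for s in range(len(profile) - width + 1)
            if sum(profile[s:s + width]) / width >= threshold]
```

For the three-sequence alignment `["ACGTAC", "ACGAAC", "ACGCAG"]`, the consensus is "ACGTAC", and windows of width 2 starting at columns 2 and 3 exceed a threshold of 0.3. Windows like these are the kind of candidate region one would then inspect for primer or epitope design.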
Using structure to explore the sequence alignment space of remote homologs.

Science.gov (United States)

Kuziemko, Andrew; Honig, Barry; Petrey, Donald

2011-10-01

Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
Sequence embedding for fast construction of guide trees for multiple sequence alignment

LENUS (Irish Health Repository)

Blackshields, Gordon

2010-05-14

Abstract Background The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
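The embedding idea can be sketched as follows. Each sequence is represented by its vector of distances to a small set of seed sequences, which costs N × (number of seeds) distance calculations instead of roughly N²/2. The abstract does not specify the distance or how seeds are chosen, so the k-mer Jaccard distance and evenly spaced seed selection used here are hypothetical stand-ins.

```python
def kmer_set(seq: str, k: int = 3) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """1 - Jaccard similarity of k-mer sets: a cheap stand-in for an alignment distance."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def embed(seqs: list[str], n_seeds: int, k: int = 3) -> list[list[float]]:
    """Map each sequence to its vector of distances to n_seeds seed sequences.

    Seeds are taken at evenly spaced positions in the input (hypothetical rule),
    so only len(seqs) * n_seeds distances are computed.
    """
    step = max(1, len(seqs) // n_seeds)
    seeds = seqs[::step][:n_seeds]
    return [[kmer_distance(s, seed, k) for seed in seeds] for s in seqs]
```

The resulting low-dimensional vectors can then be clustered with an ordinary vector method (for example, k-means) to build the guide tree. For the sequences `["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]` with two seeds, the first two vectors lie close together and far from the third.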
Analysis and prediction of stacking sequences in intercalated lamellar vanadium phosphates

Energy Technology Data Exchange (ETDEWEB)

Gautier, Romain [Institut des Sciences Chimiques de Rennes, UMR 6226 CNRS - Ecole Nationale Superieure de Chimie de Rennes (France); Centre Nationale de la Recherche Scientifique (CNRS), Institut des Materiaux Jean Rouxel (IMN), Universite de Nantes (France); Fourre, Yoann; Furet, Eric; Gautier, Regis; Le Fur, Eric [Institut des Sciences Chimiques de Rennes, UMR 6226 CNRS - Ecole Nationale Superieure de Chimie de Rennes (France)

2015-04-15

An approach is presented that enables the analysis and prediction of stacking sequences in intercalated lamellar vanadium phosphates. A comparison of previously reported vanadium phosphates reveals two modes of intercalation: (i) 3d transition metal ions intercalated between VOPO₄ layers and (ii) alkali/alkaline earth metal ions between VOPO₄·H₂O layers. Both intercalations were investigated using DFT calculations in order to understand the relative shifts of the vanadium phosphate layers. These calculations, in addition to an analysis of the stacking sequences in previously reported materials, enable the prediction of the crystal structures of Mₓ(VOPO₄)·yH₂O (M = Cs⁺, Cd²⁺ and Sn²⁺). Experimental realization and structural determination of Cd(VOPO₄)₂·4H₂O by single-crystal X-ray diffraction confirmed the predicted stacking sequences. (Copyright © 2015 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim)
Effects of stacking sequence on impact damage resistance and residual strength for quasi-isotropic laminates

Science.gov (United States)

Dost, Ernest F.; Ilcewicz, Larry B.; Avery, William B.; Coxon, Brian R.

1991-01-01

Residual strength of an impacted composite laminate is dependent on details of the damage state. Stacking sequence was varied to judge its effect on damage caused by low-velocity impact. This was done for quasi-isotropic layups of a toughened composite material. Experimental observations on changes in the impact damage state and postimpact compressive performance were presented for seven different laminate stacking sequences. The applicability and limitations of analysis compared to experimental results were also discussed. Postimpact compressive behavior was found to be a strong function of the laminate stacking sequence. This relationship was found to depend on thickness, stacking sequence, size, and location of sublaminates that comprise the impact damage state. The postimpact strength for specimens with a relatively symmetric distribution of damage through the laminate thickness was accurately predicted by models that accounted for sublaminate stability and in-plane stress redistribution. An asymmetric distribution of damage in some laminate stacking sequences tended to alter specimen stability. Geometrically nonlinear finite element analysis was used to predict this behavior.
DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules.

Science.gov (United States)

Eernisse, D J

1992-04-01

DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator is able to convert documented GenBank or EMBL sequences into linearized, rescalable gene maps whose gene sequences are extractable by clicking on the corresponding map button or by selection from a scrolling list. The provided gene maps, complete with extractable sequences, consist of nine metazoan, one yeast, and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction, with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 select organism-organelle combinations. Codon usage data are compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous ftp and is free for non-commercial uses.
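Translation under alternate genetic codes is table-driven. This minimal sketch is not DNA Translator's HyperCard implementation. It builds the NCBI standard code from the usual TCAG-ordered amino acid string and derives the vertebrate mitochondrial code (NCBI table 2) by overriding its four differing codons. Rendering any codon containing an ambiguous symbol as 'X' is a simplifying assumption; the package described above supports ambiguity codes more flexibly.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
# Vertebrate mitochondrial code (NCBI table 2) differs at four codons.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna: str, code: dict[str, str] = STANDARD) -> str:
    """Translate in frame 1; '*' marks stop codons, 'X' marks unresolvable codons."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))
```

For example, `translate("ATGTGAAGA")` yields "M*R" under the standard code but "MW*" under `VERT_MITO`. That is why the choice of code matters when translating mitochondrial genes.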
CAFE: aCcelerated Alignment-FrEe sequence analysis.

Science.gov (United States)

Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

2017-07-03

Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures, including CVTree, $d_2^*$ and $d_2^S$, are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE accepts both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmaps, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
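As a point of reference for the "measures based solely on the k-mer frequencies", the sketch below computes the D2 statistic (the inner product of two k-mer count vectors) and a cosine-normalized dissimilarity, d2 = ½(1 − D2 / (‖x‖·‖y‖)). This is one common normalization and an assumption here. It is not CAFE's code, and it omits the background Markov model that $d_2^*$ and $d_2^S$ require.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_dissimilarity(a: str, b: str, k: int = 3) -> float:
    """0.5 * (1 - cosine similarity of k-mer count vectors); 0 means identical profiles."""
    x, y = kmer_counts(a, k), kmer_counts(b, k)
    d2 = sum(x[w] * y[w] for w in x.keys() & y.keys())  # the D2 statistic
    norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(sum(v * v for v in y.values()))
    if norm == 0:
        raise ValueError("both sequences must be at least k long")
    return 0.5 * (1 - d2 / norm)
```

Sequences with no shared k-mers score 0.5, and identical k-mer profiles score 0 up to floating-point rounding. A full pairwise matrix of such values is what a PHYLIP-format distance output would carry.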
Image correlation method for DNA sequence alignment.

Science.gov (United States)

Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

2012-01-01

The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a pixel of fixed gray intensity. Query and known database sequences are coded into their pixel representations, and sequence alignment is handled as the problem of recognizing an object in a scene: the query and the database become the object and the scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, a one-million-base-pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST as the number of mutations increased. However, the digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the speed of the correlation process in a real experimental optical correlator. By doing this, we expect to fully exploit the light-based properties of optical correlation. As the optical correlator works jointly with the computer, the digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
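The core step, coding bases as gray levels and sliding the query over the scene to find the correlation peak, can be sketched in one dimension. This is a simplification of the 2-D image and optical setting described above. The intensity values are hypothetical, and normalized cross-correlation is assumed as the matching score.

```python
import math

GRAY = {"A": 64, "C": 128, "G": 192, "T": 255}  # hypothetical gray level per base


def ncc(x: list[float], y: list[float]) -> float:
    """Normalized cross-correlation of two equal-length signals (1.0 = perfect match)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def best_match(query: str, scene: str) -> tuple[int, float]:
    """Slide the coded query over the coded scene; return (offset, peak score)."""
    q = [GRAY[b] for b in query]
    s = [GRAY[b] for b in scene]
    scores = [ncc(q, s[i:i + len(q)]) for i in range(len(s) - len(q) + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

`best_match("GATT", "CCGATTCA")` peaks at offset 2 with a score of about 1.0. An optical correlator would compute all offsets in parallel, whereas this digital loop computes them one by one, which is consistent with the speed gap reported above.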
Differential evolution-simulated annealing for multiple sequence alignment

Science.gov (United States)

Addawe, R. C.; Addawe, J. M.; Sueño, M. R. K.; Magadia, J. C.

2017-10-01

Multiple sequence alignments (MSAs) are used in the analysis of molecular evolution and sequence-structure relationships. In this paper, a hybrid algorithm, Differential Evolution-Simulated Annealing (DESA), is applied to optimize MSAs based on structural information, percentage of non-gaps and totally conserved columns. DESA is a robust algorithm characterized by self-organization, mutation, crossover, and an SA-like selection scheme for the strategy parameters. Here, the MSA problem is treated as a multi-objective optimization problem for the hybrid evolutionary algorithm DESA; thus, we name the algorithm DESA-MSA. Simulated sequences and alignments were generated to evaluate the accuracy and efficiency of DESA-MSA using different indel sizes, sequence lengths, deletion rates and insertion rates. The proposed hybrid algorithm obtained acceptable solutions for the MSA problem, particularly as evaluated on the three objectives.
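Two of the three objectives named above can be written as simple column statistics, and "SA-like selection" is conventionally a Metropolis acceptance rule. The sketch below shows plausible definitions under those assumptions. DESA-MSA's exact formulas are not given in the abstract, and the structural-information objective is omitted.

```python
import math
import random


def non_gap_percentage(alignment: list[str]) -> float:
    """Percentage of alignment cells that hold a residue rather than a gap '-'."""
    cells = "".join(alignment)
    return 100.0 * sum(c != "-" for c in cells) / len(cells)


def conserved_columns(alignment: list[str]) -> int:
    """Number of columns in which every sequence has the same residue (no gaps)."""
    return sum(1 for col in zip(*alignment) if col[0] != "-" and len(set(col)) == 1)


def sa_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis rule for a maximized objective, with delta = new - old.

    Improvements are always kept; worse candidates are kept with
    probability exp(delta / temperature), which shrinks as the system cools.
    """
    return delta >= 0 or rng.random() < math.exp(delta / temperature)
```

For `["AC-T", "ACGT"]`, seven of eight cells are residues (87.5%), and three columns are totally conserved. In a multi-objective search, such scores would be combined or compared for dominance before the acceptance test.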
Wurtzite/zinc-blende electronic-band alignment in basal-plane stacking faults in semi-polar GaN

Science.gov (United States)

Monavarian, Morteza; Hafiz, Shopan; Izyumskaya, Natalia; Das, Saikat; Özgür, Ümit; Morkoç, Hadis; Avrutin, Vitaliy

2016-02-01

Heteroepitaxial semipolar and nonpolar GaN layers often suffer from high densities of extended defects, including basal plane stacking faults (BSFs). BSFs, which are considered inclusions of the cubic zinc-blende phase in the wurtzite matrix, act as quantum wells that strongly affect device performance. Band alignment in BSFs is discussed here because the type of band alignment at the wurtzite/zinc-blende interface governs the response in differential transmission: a fast decay after the pulse followed by a slow recovery, due to the spatial separation of electrons and heavy holes, for type-II band alignment, in contrast to a decay with no recovery for type-I band alignment. Based on the results, the band alignment of zinc-blende segments in the wurtzite matrix, as in BSFs, is demonstrated to be of type II.

CodonLogo: a sequence logo-based viewer for codon patterns.

Science.gov (United States)

Sharma, Virag; Murphy, David P; Provan, Gregory; Baranov, Pavel V

2012-07-15

Conserved patterns across a multiple sequence alignment can be visualized by generating sequence logos. Sequence logos show each column in the alignment as stacks of symbol(s) where the height of a stack is proportional to its informational content, whereas the height of each symbol within the stack is proportional to its frequency in the column. Sequence logos use symbols of either nucleotide or amino acid alphabets. However, certain regulatory signals in messenger RNA (mRNA) act as combinations of codons. Yet no tool is available for visualization of conserved codon patterns. We present the first application which allows visualization of conserved regions in a multiple sequence alignment in the context of codons. CodonLogo is based on WebLogo3 and uses the same heuristics but treats codons as inseparable units of a 64-letter alphabet. CodonLogo can discriminate patterns of codon conservation from patterns of nucleotide conservation that appear indistinguishable in standard sequence logos. The CodonLogo source code and its implementation (in a local version of the Galaxy Browser) are available at http://recode.ucc.ie/CodonLogo and through the Galaxy Tool Shed at http://toolshed.g2.bx.psu.edu/.
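The height rule described above carries over to a 64-letter codon alphabet in a straightforward way. The stack height is log2(64) − H bits, where H is the column's Shannon entropy, and each codon's height is its frequency times the stack height. This sketch applies no small-sample correction and is not CodonLogo's or WebLogo3's code.

```python
import math
from collections import Counter


def codon_logo_column(codons: list[str]) -> dict[str, float]:
    """Symbol heights (bits) for one logo column over the 64-codon alphabet."""
    counts = Counter(codons)
    n = len(codons)
    freqs = {c: k / n for c, k in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    stack = math.log2(64) - entropy  # informational content of the column
    return {c: p * stack for c, p in freqs.items()}
```

A fully conserved column such as four copies of "AUG" reaches the maximum height of 6 bits. A 50/50 split between "AUG" and "CUG" gives a 5-bit stack with two 2.5-bit codons. A nucleotide logo of the same data would instead show the first position as mixed (A/C) and the other two positions as fully conserved, so the codon-level pattern would not be visible.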
Enhanced spatio-temporal alignment of plantar pressure image sequences using B-splines.

Science.gov (United States)

Oliveira, Francisco P M; Tavares, João Manuel R S

2013-03-01

This article presents an enhanced methodology to align plantar pressure image sequences simultaneously in time and space. The temporal alignment of the sequences is accomplished using B-splines in the time modeling, and the spatial alignment can be attained using several geometric transformation models. The methodology was tested on a dataset of 156 real plantar pressure image sequences (3 sequences for each foot of the 26 subjects) acquired using a common commercial plate during barefoot walking. In the alignment of image sequences that were synthetically deformed both in time and space, outstanding accuracy was achieved with the cubic B-splines. This accuracy was significantly better (p [...] align real image sequences with unknown transformation involved, the alignment based on cubic B-splines also achieved better results than our previous methodology (p [...] alignment on the dynamic center of pressure (COP) displacement was also assessed by computing the intraclass correlation coefficients (ICC) before and after the temporal alignment of the three image sequence trials of each foot of the associated subject at six time instants. The results showed that, generally, the ICCs related to the medio-lateral COP displacement were greater when the sequences were temporally aligned than the ICCs of the original sequences. Based on the experimental findings, one can conclude that cubic B-splines are a remarkable solution for the temporal alignment of plantar pressure image sequences. These findings also show that temporal alignment can increase the consistency of the COP displacement in related acquired plantar pressure image sequences.
Pareto optimal pairwise sequence alignment.

Science.gov (United States)

DeRonne, Kevin W; Karypis, George

2013-01-01

Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
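The notion of a Pareto optimal frontier can be illustrated by filtering a set of candidate alignments, each scored by several objectives (higher is better). This quadratic filter only shows what "non-dominated" means. It is not the paper's dynamic-programming algorithm, which exploits optimal substructure to build frontiers directly.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Keep the score vectors that no other vector dominates (original order kept)."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]
```

With two objectives, `pareto_front([(1, 5), (2, 4), (2, 2), (0, 6), (1, 4)])` keeps (1, 5), (2, 4) and (0, 6). As the abstract notes, choosing a single alignment from such a frontier remains a separate problem.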
An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

KAUST Repository

Bonny, Talal; Salama, Khaled N.; Zidan, Mohammed A.

2012-01-01

Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data which may take a long time. Here, we
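For reference, the Smith-Waterman recurrence mentioned above can be computed in linear memory by keeping only two rows of the score matrix. This is the plain serial computation that accelerators parallelize; it is not the adaptive hybrid multiprocessor technique. The match, mismatch and gap values are assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty, using two DP rows."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at 0 so a new alignment can start anywhere.
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

The shared core "ACGT" of "TTACGTAA" and "GGACGTCC" scores 8 under these values. The work still grows with the product of the sequence lengths, which is the cost that hardware and multiprocessor schemes target.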
Sequence alignment visualization in HTML5 without Java.

Science.gov (United States)

Gille, Christoph; Birgit, Weyand; Gille, Andreas

2014-01-01

Java has been extensively used for the visualization of biological data on the web. However, the Java runtime environment is an additional layer of software with its own set of technical problems and security risks. HTML in its new version 5 provides features that, for some tasks, may render Java unnecessary. Alignment-To-HTML is the first HTML-based interactive visualization for annotated multiple sequence alignments. The server-side script interpreter can perform all tasks, including (i) sequence retrieval, (ii) alignment computation, (iii) rendering, (iv) identification of homologous structural models and (v) communication with BioDAS servers. The rendered alignment can be included in web pages and is displayed in all browsers on all platforms, including touch screen tablets. The functionality of the user interface is similar to legacy Java applets and includes color schemes, highlighting of conserved and variable alignment positions, row reordering by drag and drop, interlinked 3D visualization and sequence groups. Novel features are (i) support for multiple overlapping residue annotations, such as chemical modifications, single nucleotide polymorphisms and mutations, (ii) mechanisms to quickly hide residue annotations, (iii) export to MS-Word and (iv) sequence icons. Alignment-To-HTML, the first interactive alignment visualization that runs in web browsers without additional software, confirms that to some extent HTML5 is already sufficient to display complex biological data. The low speed at which programs are executed in browsers is still the main obstacle. Nevertheless, we envision an increased use of HTML and JavaScript for interactive biological software. Under GPL at: http://www.bioinformatics.org/strap/toHTML/.
SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly.

Science.gov (United States)

Wala, Jeremiah; Beroukhim, Rameen

2017-03-01

We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. jwala@broadinstitue.org ; rameen@broadinstitute.org. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
GapMis: a tool for pairwise sequence alignment with a single gap.

Science.gov (United States)

Flouri, Tomás; Frousios, Kimon; Iliopoulos, Costas S; Park, Kunsoo; Pissis, Solon P; Tischler, German

2013-08-01

Pairwise sequence alignment has received a new motivation due to the advent of recent patents in next-generation sequencing technologies, particularly for the application of re-sequencing, that is, the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important problem is to find the alignment between a relatively short succeeding factor of the reference sequence and the remaining low-quality part of the read, allowing a number of mismatches and the insertion of a single gap in the alignment. We present GapMis, a tool for pairwise sequence alignment with a single gap. It is based on a simple algorithm, which computes a different version of the traditional dynamic programming matrix. The presented experimental results demonstrate that GapMis is more suitable and efficient than most popular tools for this task.
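The "mismatches plus a single gap" problem can be illustrated by brute force, under simplifying assumptions. Both sequences are aligned end to end, so the single gap has length equal to their length difference and is placed in the shorter sequence, and the alignment with the fewest mismatches wins. This is not GapMis's modified dynamic-programming matrix, and it ignores GapMis's bound on gap length.

```python
def single_gap_alignment(x: str, y: str) -> tuple[int, str, str]:
    """Align x and y end to end with at most one gap, minimizing mismatches.

    Returns (mismatches, gapped_x, gapped_y); every gap position is tried.
    """
    swapped = len(x) < len(y)
    long_s, short_s = (y, x) if swapped else (x, y)
    d = len(long_s) - len(short_s)  # forced gap length
    best = None
    for p in range(len(short_s) + 1):
        gapped = short_s[:p] + "-" * d + short_s[p:]
        mm = sum(a != b for a, b in zip(long_s, gapped) if b != "-")
        if best is None or mm < best[0]:
            best = (mm, gapped)
        if d == 0:
            break  # equal lengths: no gap, only one candidate
    mm, gapped = best
    return (mm, gapped, long_s) if swapped else (mm, long_s, gapped)
```

For example, `single_gap_alignment("ACGTTTGCA", "ACGTGCA")` finds a zero-mismatch placement of a two-base gap. This brute force needs time proportional to the product of the two lengths, which is fine for the short read tails described above.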
Statistical distributions of optimal global alignment scores of random protein sequences

Directory of Open Access Journals (Sweden)

Tang Jiaowei

2005-10-01

Abstract Background The inference of homology from statistically significant sequence similarity is a central issue in sequence alignments. So far, the statistical distribution function underlying optimal global alignments has not been completely determined. Results In this study, random and real but unrelated sequences prepared in six different ways were selected as reference datasets to obtain their respective statistical distributions of global alignment scores. All alignments were carried out with the Needleman-Wunsch algorithm, and optimal scores were fitted to the Gumbel, normal and gamma distributions respectively. The three-parameter gamma distribution performs the best as the theoretical distribution function of global alignment scores, as it agrees very well with the distribution of alignment scores. The normal distribution also agrees well with the score distribution frequencies when the shape parameter of the gamma distribution is sufficiently large, as this is the scenario in which the normal distribution can be viewed as an approximation of the gamma distribution. Conclusion We have shown that the optimal global alignment scores of random protein sequences fit the three-parameter gamma distribution function. This would be useful for the inference of homology between sequences whose relationship is unknown, through the evaluation of gamma distribution significance between sequences.
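The abstract does not state how the three-parameter gamma was fitted. One simple option, assumed here, is the method of moments. For a gamma with shape k, the skewness is 2/√k, so k = 4/g², scale θ = s·g/2, and location = mean − kθ, where s is the standard deviation and g the sample skewness. In practice the scores would come from Needleman-Wunsch alignments of random sequence pairs; that step is not shown.

```python
import statistics


def fit_gamma3_moments(scores: list[float]) -> tuple[float, float, float]:
    """Method-of-moments fit of a shifted gamma; returns (shape, scale, location)."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    g = sum((x - mean) ** 3 for x in scores) / n / sd ** 3  # sample skewness
    if g <= 0:
        raise ValueError("a gamma fit needs positively skewed scores")
    shape = 4 / g ** 2      # from skewness = 2 / sqrt(shape)
    scale = sd * g / 2      # from variance = shape * scale**2
    return shape, scale, mean - shape * scale
```

By construction, the fitted distribution reproduces the sample mean, variance and skewness. A large fitted shape signals the near-normal regime mentioned in the abstract.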
Spreadsheet-based program for alignment of overlapping DNA sequences.

Science.gov (United States)

Anbazhagan, R; Gabrielson, E

1999-06-01

Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh computer with Excel 97 or Excel 98. The program is available as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a custom workbook that identifies overlapping regions between two sequence fragments and displays the sequence alignment. It also performs a number of specialized functions, such as recognition of restriction enzyme cutting sites and CpG island mapping, without costly specialized software.
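The overlap detection the workbook performs can be sketched in a few lines of Python. This is not the Excel program itself: it assumes exact suffix-prefix overlaps of at least `min_len` bases and tries the second fragment both as given and reverse-complemented to cover either orientation. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def best_overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), max(min_len, 1) - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge_fragments(a, b, min_len=3):
    """Try b as given ('+') and reverse-complemented ('-'), in both orders.

    Returns (contig, overlap_length, orientation) for the longest exact
    overlap, or None if no overlap of at least min_len bases exists.
    """
    best = None
    for orient, cand in (("+", b), ("-", reverse_complement(b))):
        for first, second in ((a, cand), (cand, a)):
            k = best_overlap(first, second, min_len)
            if k and (best is None or k > best[1]):
                best = (first + second[k:], k, orient)
    return best
```

Real sequencing reads contain errors, so a practical tool would also tolerate mismatches in the overlap; exact matching keeps the sketch short.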
SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

Science.gov (United States)

Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

2012-07-15

In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3%, 97.6% and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9% and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.
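The k-mer searching half of SINA's approach amounts, in generic form, to finding the reference sequences that share the most k-mers with a query; the partial order alignment against those references is not shown. The value k = 10, the distinct-k-mer counting and the helper names are assumptions for illustration, not SINA's actual parameters.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_index(references, k=10):
    """Map each k-mer to the ids of the reference sequences that contain it."""
    index = defaultdict(set)
    for ref_id, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(ref_id)
    return index


def closest_references(query, index, k=10, top=3):
    """Rank references by the number of distinct k-mers they share with the query."""
    hits = defaultdict(int)
    for km in kmers(query, k):
        for ref_id in index.get(km, ()):
            hits[ref_id] += 1
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

The top-ranked references would then serve as the alignment template for the query.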
IVisTMSA: Interactive Visual Tools for Multiple Sequence Alignments.

Science.gov (United States)

Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Naeem; Naveed, Nasir; Ahmad, Sarfraz; Muhammad, Shah; Qadri, Salman; Shahid, Muhammad; Hussain, Tanveer; Javed, Maryam

2015-01-01

IVisTMSA is a software package of seven graphical tools for multiple sequence alignments. MSApad is an editing and analysis tool; it can load 409% more data than Jalview, STRAP, CINEMA, and Base-by-Base. MSA comparator allows the user to visualize consistent and inconsistent regions of reference and test alignments of more than 21 MB in less than 12 seconds. MSA comparator is reported to be 5,200% and more than 40% more efficient than the BAliBASE C program and FastSP, respectively. The MSA reconstruction tool provides graphical user interfaces for four popular aligners and allows the user to load several sequence files at a time. FASTA generator converts alignments of unlimited size in seven formats into FASTA format in a few seconds. MSA ID calculator computes the identity matrix of more than 11,000 sequences with a sequence length of 2,696 base pairs in less than 100 seconds. The Tree and Distance Matrix calculation tools generate phylogenetic trees and distance matrices, respectively, using neighbor joining, % identity and the BLOSUM62 matrix.
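An identity matrix of the kind MSA ID calculator produces can be computed pairwise from the aligned rows. Definitions of percent identity vary; this sketch assumes identical non-gap columns divided by the columns where at least one of the two rows has a residue, which may differ from the definition IVisTMSA uses.

```python
def percent_identity(s, t):
    """Percent identity of two aligned rows, ignoring columns that are gaps in both."""
    if len(s) != len(t):
        raise ValueError("aligned rows must have equal length")
    cols = [(x, y) for x, y in zip(s, t) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    same = sum(1 for x, y in cols if x == y and x != "-")
    return 100.0 * same / len(cols)


def identity_matrix(rows):
    """Symmetric matrix of pairwise percent identities for an MSA."""
    n = len(rows)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = percent_identity(rows[i], rows[j])
    return m
```

The loop is quadratic in the number of sequences, which is why identity matrices for tens of thousands of sequences become a performance concern.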
Influence of the stacking sequence of layers on the mechanical behavior of polymeric composite cylinders

International Nuclear Information System (INIS)

Carvalho, Osni de

2006-01-01

This work experimentally evaluated the influence of symmetrical and asymmetrical stacking sequences of layers on the mechanical behavior of polymeric composite cylinders. To this end, two groups of open-ended cylinders were manufactured by the filament winding process, with different stacking sequences relative to the laminate midplane, characterizing symmetrical and asymmetrical laminates. The composite cylinders were made with an epoxy matrix and carbon fiber reinforcement. To evaluate their mechanical strength, the cylinders were tested hydrostatically, by internally pressurizing them with a fluid in a hydrostatic device until they burst. Additionally, the strains and failure modes of the two groups of cylinders were compared. Analysis with a finite element program, a tool widely used in design, showed that it does not identify the stresses in the fiber direction in each composite layer, nor the interlaminar shear stress that appears in the cylinders with an asymmetrical stacking sequence. The test results showed that the stacking sequence influenced the mechanical behavior of the composite cylinders, favoring the symmetrical construction. (author)
OPTIMIZATION OF PLY STACKING SEQUENCE OF COMPOSITE DRIVE SHAFT USING PARTICLE SWARM ALGORITHM

Directory of Open Access Journals (Sweden)

CHANNAKESHAVA K. R.

2011-06-01

In this paper, an attempt has been made to optimize the ply stacking sequence of single-piece E-Glass/Epoxy and Boron/Epoxy composite drive shafts using a particle swarm algorithm (PSA). PSA is a population-based evolutionary stochastic optimization technique and a recent heuristic search method whose mechanics are inspired by the swarming or collaborative behavior of biological populations. A PSA programme was developed to optimize the ply stacking sequence with the objective of weight minimization, considering torque transmission capacity, fundamental natural frequency, lateral vibration and torsional buckling strength as design constraints, and the number of laminates, ply thickness and stacking sequence as design variables. The weight savings of the E-Glass/Epoxy and Boron/Epoxy shafts obtained with PSA were 51% and 85%, respectively, relative to the steel shaft. The optimum PSA results were compared with genetic algorithm (GA) results, and PSA was found to yield better results than GA.
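For readers unfamiliar with particle swarm optimization, a minimal global-best swarm loop is sketched below. It minimizes a continuous, box-bounded test function; the drive shaft problem in the paper has discrete ply angles and torque, frequency and buckling constraints, which would need extra handling (for example rounding to allowed angles and penalty terms) that is assumed here rather than described in the abstract. The inertia weight `w` and coefficients `c1`, `c2` are common textbook values, not the paper's.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimizer; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, `pso_minimize(lambda x: sum(v * v for v in x), 2, (-5.0, 5.0))` drives the sphere function close to its minimum at the origin.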
Effects of stacking sequence on fracture mechanisms in quasi-isotropic Carbon/epoxy laminates under tensile loading

International Nuclear Information System (INIS)

Hessabi, Z. R.; Majidi, B.; Aghazadeh, J.

2006-01-01

The progress of damage in quasi-isotropic carbon/epoxy laminates under tensile loading has been investigated microscopically. One significant mode of failure in laminated composites is delamination initiating at free edges. The interlaminar stress in the boundary ply along the free edges of a laminated composite is the main factor causing delamination. The laminate stacking sequence affects the interlaminar stress distribution and consequently may change the mode of failure. It is of design importance to determine a suitable criterion, based on stress analysis, for selecting the best stacking sequence. In the present work, the tensile properties of six samples with different stacking sequences have been examined. The results showed that stress analysis at distances very close to the free edges is a suitable criterion for predicting the initiation of delamination, and that the stacking sequence [90/45/0/-45]s has the highest strength of the six. Furthermore, finite element analysis showed that adjacent ±45 plies cause premature delamination during tensile loading.
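The notation [90/45/0/-45]s denotes a symmetric laminate: the listed plies form one half and are mirrored about the midplane. The sketch expands such codes and flags adjacent +45/-45 ply pairs, the arrangement the finite element results associate with premature delamination. It only inspects the layup and does not model interlaminar stress; the helper names are hypothetical.

```python
def parse_layup(code):
    """Parse a layup such as '[90/45/0/-45]s' into ply angles (degrees).

    A trailing 's' marks a symmetric laminate: the listed half is mirrored
    about the midplane.
    """
    code = code.replace(" ", "")
    symmetric = code.endswith("s")
    body = code.rstrip("s").strip("[]")
    half = [int(angle) for angle in body.split("/")]
    return half + half[::-1] if symmetric else half


def adjacent_pm45(plies):
    """Indices i where plies i and i + 1 form an adjacent +45/-45 pair."""
    return [i for i in range(len(plies) - 1) if {plies[i], plies[i + 1]} == {45, -45}]
```

Here `[90/45/0/-45]s` expands to eight plies with no adjacent ±45 pair, whereas `[0/45/-45/90]s` places +45 and -45 next to each other twice.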
DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

Directory of Open Access Journals (Sweden)

Kaufmann Michael

2004-09-01

Background: Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Results: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments, which are used as a first step of multiple alignment, account for most of the CPU time in DIALIGN; since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions: By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
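Step (a) relies on pairwise alignments being independent, so any pool of workers can score them and the collected results do not depend on scheduling. The sketch uses a plain Needleman-Wunsch score as a stand-in for DIALIGN's own pairwise fragment alignment, and the function names are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (stand-in for DIALIGN's pairwise step)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def _score_pair(job):
    i, j, a, b = job
    return (i, j), nw_score(a, b)


def all_pair_scores(seqs, executor_cls=ProcessPoolExecutor, workers=4):
    """Score every sequence pair; pairs are independent, so they run in parallel."""
    jobs = [(i, j, seqs[i], seqs[j]) for i, j in combinations(range(len(seqs)), 2)]
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(_score_pair, jobs))
```

In CPython, `ProcessPoolExecutor` gives real parallelism for this CPU-bound work but needs the call to sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes; passing `executor_cls=ThreadPoolExecutor` is a convenient way to check that the parallel results match a serial loop.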
Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

Science.gov (United States)

Nagar, Anurag; Hahsler, Michael

2013-01-01

Next Generation Sequencing techniques are producing enormous amounts of biological sequence data, and their analysis has become a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so-called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to
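The core idea, comparing segments through p-mer frequency counts instead of aligning them, can be sketched as below. The L1 distance between count profiles serves as the cheap proxy for edit distance, and a single-pass leader clustering stands in for the paper's data stream clustering; p = 3 and the threshold are illustrative assumptions.

```python
from collections import Counter


def pmer_profile(seq, p=3):
    """Counts of every overlapping substring of length p."""
    return Counter(seq[i:i + p] for i in range(len(seq) - p + 1))


def profile_distance(x, y):
    """L1 distance between two p-mer count profiles (missing p-mers count as 0)."""
    return sum(abs(x[k] - y[k]) for k in x.keys() | y.keys())


def leader_cluster(segments, p=3, threshold=4):
    """Single pass: each segment joins the first cluster whose leader is close enough."""
    leaders, clusters = [], []
    for seg in segments:
        prof = pmer_profile(seg, p)
        for idx, lead in enumerate(leaders):
            if profile_distance(prof, lead) <= threshold:
                clusters[idx].append(seg)
                break
        else:
            leaders.append(prof)
            clusters.append([seg])
    return clusters
```

Each segment is compared only with cluster leaders, so the work grows linearly with the number of segments for a bounded number of clusters.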
Bellerophon: a program to detect chimeric sequences in multiple sequence alignments.

Science.gov (United States)

Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

2004-09-22

Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaptation of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl
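The intuition behind chimera detection is that the two parts of a chimeric sequence are most similar to different parents. The sketch tests that intuition on aligned, equal-length sequences by scanning breakpoints and comparing one-parent and two-parent identities. It is a simplified identity heuristic, not Bellerophon's partial treeing analysis, and `min_frag`, `margin` and the function names are assumptions.

```python
def identity(s, t):
    """Fraction of identical positions between two equal-length aligned strings."""
    return sum(x == y for x, y in zip(s, t)) / len(s) if s else 0.0


def chimera_check(query, refs, min_frag=20, margin=0.05):
    """Best two-parent explanation of an aligned query.

    Returns (breakpoint, left_parent, right_parent, gain) where gain is the
    identity improvement over the best single parent, or None if no
    breakpoint improves on one parent by more than margin.
    """
    whole = max(identity(query, r) for r in refs.values())
    best = None
    for bp in range(min_frag, len(query) - min_frag + 1):
        left = max(refs, key=lambda k: identity(query[:bp], refs[k][:bp]))
        right = max(refs, key=lambda k: identity(query[bp:], refs[k][bp:]))
        if left == right:
            continue
        # Length-weighted identity when each side uses its own best parent
        combined = (identity(query[:bp], refs[left][:bp]) * bp
                    + identity(query[bp:], refs[right][bp:]) * (len(query) - bp)) / len(query)
        gain = combined - whole
        if gain > margin and (best is None or gain > best[3]):
            best = (bp, left, right, gain)
    return best
```

A query built from the first half of one reference and the second half of another is reported with the breakpoint at the junction; a query identical to one reference returns None.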
Recognition of depositional sequences and stacking patterns, Late Devonian (Frasnian) carbonate platforms, Alberta basin

Energy Technology Data Exchange (ETDEWEB)

Anderson, J.H.; Reeckmann, S.A.; Sarg, J.F.; Greenlee, S.M.

1987-05-01

Six depositional sequences bounded by regional unconformities or their correlative equivalents (sequence boundaries) have been recognized in Late Devonian (Frasnian) carbonate platforms in the Alberta basin. These sequences consist of a predictable vertical succession of smaller scale shoaling-upward cycles (parasequences). Parasequences are arranged in retrogradational, aggradational, and progradational stacking patterns that can be modeled as a sediment response to relative changes in sea level. Sequence boundaries are recognized by onlap onto underlying shelf or shelf margin strata. This onlap includes shelf margin wedges and deep marine onlap. In outcrop sections shelf margin wedges exhibit an abrupt juxtaposition of shallow water facies over deeper water deposits with no gradational facies changes at the boundaries. High on the platform, subaerial exposure fabrics may be present. The shelf margin wedges are interpreted to have formed during lowstands in sea level and typically exhibit an aggradational stacking pattern. On the platform, two types of sequences are recognized. A type 1 cycle occurs where the sequence boundary is overlain by a flooding surface and subsequent parasequences exhibit retrogradational stacking. In a type 2 cycle the sequence boundary is overlain by an aggradational package of shallow water parasequences, followed by a retrogradational package. These two types of sequences can be modeled using a sinusoidal eustatic sea level curve superimposed on thermo-tectonic subsidence.
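The model in the last sentence can be written down directly, assuming linear subsidence: relative sea level R(t) = A sin(2πt/P) + s·t, with eustatic amplitude A, period P and subsidence rate s. Its rate of change, (2πA/P)cos(2πt/P) + s, becomes negative (a relative fall) only if 2πA/P exceeds s. The sketch encodes both expressions; units and parameter values are illustrative, and how such curves map onto type 1 and type 2 cycles is left to the paper.

```python
import math


def relative_sea_level(t, amplitude, period, subsidence_rate):
    """Sinusoidal eustatic curve plus linear subsidence (subsidence adds accommodation)."""
    return amplitude * math.sin(2 * math.pi * t / period) + subsidence_rate * t


def has_relative_fall(amplitude, period, subsidence_rate):
    """True if relative sea level ever falls: peak eustatic fall rate exceeds subsidence."""
    return 2 * math.pi * amplitude / period > subsidence_rate
```

For example, with an amplitude of 10 m and a 1 Myr period, the peak eustatic fall rate is about 63 m/Myr, so a subsidence rate of 30 m/Myr still allows relative falls while 100 m/Myr does not.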
GROUPING WEB ACCESS SEQUENCES USING SEQUENCE ALIGNMENT METHOD

OpenAIRE

BHUPENDRA S CHORDIA; KRISHNAKANT P ADHIYA

2011-01-01

In web usage mining, grouping of web access sequences can be used to determine the behavior or intent of a set of users. A key issue in grouping web sessions is how to measure the similarity between them, and traditional measurement methods have many shortcomings. The task of grouping web sessions by similarity, which consists of maximizing the intra-group similarity while minimizing the inter-group similarity, is performed using a sequence alignment method. This paper introduces a new method to group we...
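Sequence alignment can score web sessions by treating each visited page as one symbol. In this sketch, with match = 1 and zero mismatch and gap scores, the global alignment score reduces to the longest common subsequence length, normalised by the longer session; the scoring, the 0.6 threshold and the greedy grouping are assumptions, not the paper's method.

```python
def alignment_score(s, t, match=1, mismatch=0, gap=0):
    """Global alignment score of two page sequences (LCS length with the defaults)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def session_similarity(s, t):
    """Default-scored alignment normalised by the longer session, giving a value in [0, 1]."""
    if not s and not t:
        return 1.0
    return alignment_score(s, t) / max(len(s), len(t))


def group_sessions(sessions, threshold=0.6):
    """Greedy grouping: a session joins the first group whose first member is similar enough."""
    groups = []
    for sess in sessions:
        for g in groups:
            if session_similarity(sess, g[0]) >= threshold:
                g.append(sess)
                break
        else:
            groups.append([sess])
    return groups
```

Because the alignment respects visit order, two sessions that touch the same pages in a different order score lower than sessions that follow the same path.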
DIALIGN: multiple DNA and protein sequence alignment at BiBiServ.

OpenAIRE

Morgenstern, Burkhard

2004-01-01

DIALIGN is a widely used software tool for multiple DNA and protein sequence alignment. The program combines local and global alignment features and can therefore be applied to sequence data that cannot be correctly aligned by more traditional approaches. DIALIGN is available online through Bielefeld Bioinformatics Server (BiBiServ). The downloadable version of the program offers several new program features. To compare the output of different alignment programs, we developed the program AltA...
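DIALIGN's combination of local and global features comes from assembling an alignment out of gap-free segment pairs (diagonals). The sketch shows the pairwise core under simplifying assumptions: fragments are exact-match runs weighted by their length (DIALIGN uses statistically derived weights), and the selected chain must be consistent, that is, increasing in both sequences. The function names are hypothetical.

```python
def diagonal_runs(a, b, min_len=3):
    """Maximal gap-free runs of identical characters as (i, j, length), length >= min_len."""
    runs = []
    for d in range(-(len(b) - 1), len(a)):   # d = i - j identifies one diagonal
        i, j = max(d, 0), max(-d, 0)
        start = None
        while i <= len(a) and j <= len(b):
            hit = i < len(a) and j < len(b) and a[i] == b[j]
            if hit and start is None:
                start = i
            elif not hit and start is not None:
                if i - start >= min_len:
                    runs.append((start, start - d, i - start))
                start = None
            i += 1
            j += 1
    return runs


def best_chain(frags):
    """Heaviest chain of non-overlapping fragments increasing in both sequences."""
    if not frags:
        return []
    frags = sorted(frags)
    score = [f[2] for f in frags]
    prev = [None] * len(frags)
    for x, (ix, jx, lx) in enumerate(frags):
        for y in range(x):
            iy, jy, ly = frags[y]
            if iy + ly <= ix and jy + ly <= jx and score[y] + lx > score[x]:
                score[x], prev[x] = score[y] + lx, y
    x = max(range(len(frags)), key=score.__getitem__)
    chain = []
    while x is not None:
        chain.append(frags[x])
        x = prev[x]
    return chain[::-1]
```

Unmatched regions between chained fragments are simply left unaligned, which is what lets this style of method handle sequences that are only locally related.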

Sequence-controlled polymerization guided by aryl-fluoroaryl π-stacking

KAUST Repository

Mugemana, Clement; Almahdali, Sarah; Rodionov, Valentin

2014-01-01

The ability to control monomer sequences is essential in macromolecular chemistry. Better sequence control leads to better control over macromolecular folding and self-assembly, which, in turn, would enable control over bulk properties (such as thermal behavior, conductivity and rigidity), as well as mimicking the properties of globular proteins. Here, we present a three-part synopsis of recent advances in research on sequence-controlled polymerization guided by aryl-perfluoroaryl π-π stacking of monomer pairs. We also show that for monomers that are capable of strong associative interactions, the classical reactivity ratio analysis based on Fineman-Ross/terminal reactivity models may lead to an imprecise determination of the monomer alternation mode. © 2014 American Chemical Society.
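The classical analysis the abstract cautions about fits the terminal-model (Mayo-Lewis) copolymer equation through the Fineman-Ross linearization G = r1·F - r2, with feed ratio x = [M1]/[M2], copolymer ratio y = d[M1]/d[M2], G = x(y - 1)/y and F = x²/y. A minimal least-squares version is sketched below. Because it presupposes the terminal model, it may be imprecise for monomers capable of strong associative interactions, which is the abstract's warning; the function names are illustrative.

```python
def mayo_lewis_ratio(x, r1, r2):
    """Terminal-model copolymer ratio y = d[M1]/d[M2] for feed ratio x = [M1]/[M2]."""
    return x * (r1 * x + 1) / (x + r2)


def fineman_ross(feed_ratios, copolymer_ratios):
    """Least-squares fit of G = r1*F - r2; returns (r1, r2)."""
    G = [x * (y - 1) / y for x, y in zip(feed_ratios, copolymer_ratios)]
    F = [x * x / y for x, y in zip(feed_ratios, copolymer_ratios)]
    n = len(F)
    mf, mg = sum(F) / n, sum(G) / n
    slope = sum((f - mf) * (g - mg) for f, g in zip(F, G)) / sum((f - mf) ** 2 for f in F)
    intercept = mg - slope * mf
    return slope, -intercept        # slope is r1, intercept is -r2
```

Data generated exactly from the terminal model lie on the Fineman-Ross line, so the fit recovers r1 and r2; deviations from that model show up as curvature that a straight-line fit hides.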
Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

KAUST Repository

Odat, Enas M.

2011-05-01

The purpose of this dissertation is to present a methodology for modeling the global sequence alignment problem as a directed acyclic graph, which helps to extract all possible optimal alignments. Moreover, a mechanism to sequentially optimize the sequence alignment problem relative to different cost functions is suggested. Sequence alignment is especially important in computational biology, where it is used to find evolutionary relationships between biological sequences. Many algorithms have been developed to solve this problem; the best known are Needleman-Wunsch and Smith-Waterman, which are based on dynamic programming. In dynamic programming, the problem is divided into a set of overlapping subproblems, and the solution of each subproblem is found. Finally, the solutions to these subproblems are combined into a final solution. In this thesis, it has been proved that for two sequences of length m and n over a fixed alphabet, the suggested optimization procedure requires O(mn) arithmetic operations per cost function on a single-processor machine. The algorithm has been simulated using the C#.Net programming language, and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, which also depends on the cost function used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
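Sequential optimization over the alignment DAG can be sketched as pruning: after optimizing one cost function, keep only the edges that lie on some optimal path, then optimize the next cost function on that subgraph. Forward and backward shortest-path passes find those edges in O(mn) time per cost function. This formulation, the two cost functions and all names are illustrative assumptions, not the thesis's C# implementation.

```python
def alignment_dag(a, b):
    """Edges (u, v, kind, x, y) of the alignment grid DAG, in topological order."""
    edges = []
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i and j:
                edges.append(((i - 1, j - 1), (i, j), "sub", a[i - 1], b[j - 1]))
            if i:
                edges.append(((i - 1, j), (i, j), "del", a[i - 1], None))
            if j:
                edges.append(((i, j - 1), (i, j), "ins", None, b[j - 1]))
    return edges


def restrict_to_optimal(edges, cost, source, sink):
    """Return (optimal cost, edges lying on some minimum-cost source-to-sink path)."""
    f = {source: 0}                                   # best cost from source
    for u, v, *lab in edges:
        if u in f:
            f[v] = min(f.get(v, float("inf")), f[u] + cost(*lab))
    g = {sink: 0}                                     # best cost to sink
    for u, v, *lab in reversed(edges):
        if v in g:
            g[u] = min(g.get(u, float("inf")), g[v] + cost(*lab))
    best = f[sink]
    kept = [e for e in edges
            if e[0] in f and e[1] in g and f[e[0]] + cost(*e[2:]) + g[e[1]] == best]
    return best, kept


def count_paths(edges, source, sink):
    """Number of source-to-sink paths, i.e. alignments represented by the edges."""
    ways = {source: 1}
    for u, v, *_ in edges:
        if u in ways:
            ways[v] = ways.get(v, 0) + ways[u]
    return ways.get(sink, 0)


def edit_cost(kind, x, y):
    """Unit edit distance: mismatch or indel costs 1."""
    return int(x != y) if kind == "sub" else 1


def gap_cost(kind, x, y):
    """Secondary criterion: number of gap columns."""
    return 0 if kind == "sub" else 1
```

For `a, b = "AB", "BA"`, unit edit cost leaves three optimal alignments of cost 2; re-optimizing that subgraph for gap count keeps only the gap-free one, mirroring the reduction in optimal alignments after each step.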
BarraCUDA - a fast short read sequence aligner using graphics processing units

Directory of Open Access Journals (Sweden)

Klus Petr

2012-01-01

Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput compared with a CPU core, while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of the GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline so that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net
BarraCUDA - a fast short read sequence aligner using graphics processing units

LENUS (Irish Health Repository)

Klus, Petr

2012-01-13

Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput compared with a CPU core, while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of the GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline so that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net
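BWA, on which BarraCUDA is based, is the Burrows-Wheeler Aligner: reads are searched against an FM-index built from the Burrows-Wheeler transform of the reference. The CPU sketch below shows exact-match backward search only; BWA's inexact matching and BarraCUDA's GPU port are not reproduced, and the naive suffix sorting is suitable only for toy inputs. The function names are illustrative.

```python
from collections import Counter


def build_fm_index(text):
    """Suffix array, C table and occurrence table for text + '$' (naive construction)."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)   # text[-1] is '$' when i == 0
    counts = Counter(text)
    C, total = {}, 0                          # C[ch]: characters smaller than ch
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}   # occ[ch][k]: ch in bwt[:k]
    for k, b in enumerate(bwt):
        for ch in occ:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    return sa, C, occ


def find_exact(pattern, index):
    """Backward search: sorted start positions of pattern in the indexed text."""
    sa, C, occ = index
    lo, hi = 0, len(sa)                       # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Each pattern character narrows the suffix-array interval with two table lookups, so the search cost depends on the read length rather than the reference length; this independence across reads is what makes the step easy to spread over many GPU threads.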
CHSalign: A Web Server That Builds upon Junction-Explorer and RNAJAG for Pairwise Alignment of RNA Secondary Structures with Coaxial Helical Stacking.

Directory of Open Access Journals (Sweden)

Lei Hua

RNA junctions are important structural elements of RNA molecules. They are formed when three or more helices come together in three-dimensional space. Recent studies have focused on the annotation and prediction of coaxial helical stacking (CHS) motifs within junctions. Here we exploit such predictions to develop an efficient alignment tool to handle RNA secondary structures with CHS motifs. Specifically, we build upon our Junction-Explorer software for predicting coaxial stacking and RNAJAG for modelling junction topologies as tree graphs to incorporate constrained tree matching and dynamic programming algorithms into a new method, called CHSalign, for aligning the secondary structures of RNA molecules containing CHS motifs. Thus, CHSalign is intended to be an efficient alignment tool for RNAs containing similar junctions. Experimental results based on thousands of alignments demonstrate that CHSalign can align two RNA secondary structures containing CHS motifs more accurately than other RNA secondary structure alignment tools. CHSalign yields a high score when aligning two RNA secondary structures with similar CHS motifs or helical arrangement patterns, and a low score otherwise. This new method has been implemented in a web server, and the program is also made freely available, at http://bioinformatics.njit.edu/CHSalign/.
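As a rough illustration of dynamic programming over tree graphs, the sketch computes a simple top-down ordered tree matching: two nodes can match only if their labels agree and their parents are matched, and children are paired by an LCS-style recurrence. This is a generic algorithm, not CHSalign's constrained tree matching, and the tuple tree format and node labels are hypothetical.

```python
def simple_tree_matching(a, b):
    """Size of the largest top-down ordered matching between trees a and b.

    Trees are (label, [children]) tuples; roots must match for any match.
    """
    if a[0] != b[0]:
        return 0
    ca, cb = a[1], b[1]
    # M[i][j]: best matching between the first i children of a and first j of b
    M = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            M[i][j] = max(M[i - 1][j], M[i][j - 1],
                          M[i - 1][j - 1] + simple_tree_matching(ca[i - 1], cb[j - 1]))
    return M[-1][-1] + 1
```

A score normalised by tree size would then be high for junctions with similar arrangements and low otherwise, echoing the scoring behaviour described in the abstract.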
RNA-Pareto: interactive analysis of Pareto-optimal RNA sequence-structure alignments.

Science.gov (United States)

Schnattinger, Thomas; Schöning, Uwe; Marchfelder, Anita; Kestler, Hans A

2013-12-01

Incorporating secondary structure information into the alignment process improves the quality of RNA sequence alignments. Instead of using fixed weighting parameters, sequence and structure components can be treated as different objectives and optimized simultaneously. The result is not a single, but a Pareto-set of equally optimal solutions, which all represent different possible weighting parameters. We now provide the interactive graphical software tool RNA-Pareto, which allows a direct inspection of all feasible results to the pairwise RNA sequence-structure alignment problem and greatly facilitates the exploration of the optimal solution set.
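A Pareto set keeps every solution that no other solution beats in one objective without losing in the other. Given candidate alignments already scored on sequence and structure (producing them is the hard part and is not shown), the set can be extracted as below; maximizing both scores and the tuple format are assumptions.

```python
def pareto_front(solutions):
    """Non-dominated solutions, maximizing both scores.

    Each solution is (sequence_score, structure_score, payload); the result is
    ordered from best sequence score to best structure score.
    """
    front = []
    for s in solutions:
        dominated = any(o[0] >= s[0] and o[1] >= s[1] and (o[0] > s[0] or o[1] > s[1])
                        for o in solutions)
        if not dominated:
            front.append(s)
    return sorted(front, key=lambda s: (-s[0], s[1]))
```

Walking along the sorted front corresponds to sweeping the sequence-versus-structure weighting, which is the kind of exploration an interactive tool can expose.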
MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Science.gov (United States)

Edgar, Robert C

2004-01-01

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
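MUSCLE's first stage estimates distances without aligning, from k-mer counts. The sketch follows the idea of a fractional count of shared k-mers, normalised by the number of k-mers in the shorter sequence and turned into a distance as 1 minus that fraction; the exact definition MUSCLE uses (including any compressed amino acid alphabet) may differ, so treat k = 3 and the formula as assumptions.

```python
from collections import Counter


def kmer_distance(x, y, k=3):
    """1 - (shared k-mer count / k-mers in the shorter sequence); alignment-free."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        return 1.0
    shared = sum((cx & cy).values())   # Counter '&' keeps the minimum count per k-mer
    return 1.0 - shared / denom
```

Computing such distances for all pairs costs far less than aligning every pair, which is why a k-mer stage suits a first guide tree for large inputs.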
A time warping approach to multiple sequence alignment.

Science.gov (United States)

Arribas-Gil, Ana; Matias, Catherine

2017-04-25

We propose an approach for multiple sequence alignment (MSA) derived from the dynamic time warping viewpoint and recent techniques of curve synchronization developed in the context of functional data analysis. Starting from pairwise alignments of all the sequences (viewed as paths in a certain space), we construct a median path that represents the MSA we are looking for. We establish a proof of concept that our method could be an interesting ingredient to include into refined MSA techniques. We present a simple synthetic experiment as well as the study of a benchmark dataset, together with comparisons with 2 widely used MSA software packages.
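The dynamic time warping viewpoint starts from the classic DTW recurrence, which aligns two series by letting either one repeat elements. A textbook version for numeric sequences is sketched below; the median-path construction the authors build on top of pairwise warpings is not reproduced, and the absolute-difference cost is an assumption.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping cost between two numeric sequences."""
    INF = float("inf")
    n, m = len(x), len(y)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Advance in both series, or repeat an element of one of them
            D[i][j] = dist(x[i - 1], y[j - 1]) + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

Unlike an edit-distance alignment there is no gap penalty: stretching one series is free, and only the pointwise costs along the warping path accumulate.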
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

Science.gov (United States)

Daily, Jeff

2016-02-10

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available, at 1.2 to at most 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
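Global, semi-global and local alignment differ only in how the dynamic programming matrix is initialised and where the final score is read. The scalar sketch makes that explicit with a linear gap penalty; Parasail itself vectorizes these recurrences with SIMD (including striped and prefix scan layouts) and supports affine gaps, none of which is shown, and the function name and score values are assumptions. "Semi-global" here means end gaps are free in both sequences.

```python
def pairwise_score(a, b, mode="global", match=2, mismatch=-1, gap=-2):
    """Scalar (non-SIMD) optimal score for mode in {'global', 'semi', 'local'}."""
    if mode not in ("global", "semi", "local"):
        raise ValueError(f"unknown mode: {mode}")
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    if mode == "global":                  # leading gaps are charged only in global mode
        for i in range(1, m + 1):
            H[i][0] = i * gap
        for j in range(1, n + 1):
            H[0][j] = j * gap
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if mode == "local":           # local: restart at zero, remember the maximum
                h = max(h, 0)
                best = max(best, h)
            H[i][j] = h
    if mode == "global":
        return H[m][n]
    if mode == "semi":                    # trailing gaps free: best cell in last row or column
        return max(max(H[m]), max(row[n] for row in H))
    return best
```

With these toy scores, `pairwise_score("TTACGTTT", "ACGT", mode)` is 8 in local and semi-global mode but 0 in global mode, where the four unmatched T's must be paid for as gaps.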
Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

Directory of Open Access Journals (Sweden)

Sven Warris

Obtaining large-scale sequence alignments in a fast and flexible way is an important step in the analysis of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) to retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

Science.gov (United States)

Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

2015-01-01

Obtaining large-scale sequence alignments in a fast and flexible way is an important step in the analysis of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) to retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
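The alignment details PaSWAS reports (score, gaps, mismatches) come from a traceback through the Smith-Waterman matrix. A scalar, linear-gap sketch follows; it counts gap columns rather than gap openings, returns a single best hit rather than multiple hits, and its scoring values and names are assumptions rather than PaSWAS's GPU implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with traceback; reports score, mismatches, gap columns, rows."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero
    i, j = bi, bj
    mismatches = gaps = 0
    top, bottom = [], []
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:            # diagonal: match or mismatch
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            mismatches += a[i - 1] != b[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:            # gap in b
            top.append(a[i - 1])
            bottom.append("-")
            gaps += 1
            i -= 1
        else:                                         # gap in a
            top.append("-")
            bottom.append(b[j - 1])
            gaps += 1
            j -= 1
    return {"score": best, "mismatches": mismatches, "gaps": gaps,
            "aligned": ("".join(reversed(top)), "".join(reversed(bottom)))}
```

When several alignments share the best score, the gap and mismatch counts depend on the tie-breaking order in the traceback (diagonal first here).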
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

Science.gov (United States)

Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

2015-08-01

RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, based on simultaneous alignment and folding, suffers from an extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.
Efficient alignment of pyrosequencing reads for re-sequencing applications

Directory of Open Access Journals (Sweden)

Russo Luis MS

2011-05-01

Abstract Background Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties, mostly due to the huge volume of data produced, but also because of some of their specific characteristics such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects. Results We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454) system against a reference sequence. Our approach explores the characteristics of the data in these re-sequencing applications and uses state-of-the-art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm which needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on the execution time. Conclusions The proposed methodology was implemented in a software tool called TAPyR (Tool for the Alignment of Pyrosequencing Reads), which is publicly available from http://www.tapyr.net.
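The abstract does not detail TAPyR's index, so the sketch below shows only the generic seed-based idea under stated assumptions. It uses a plain hash table of reference k-mers. Each k-mer shared by read and reference votes for a reference start offset, `ref_pos - read_pos`, and well-supported offsets become candidates for a full local alignment. The names `build_kmer_index` and `candidate_offsets` are hypothetical, and the hash table is a simple stand-in for TAPyR's state-of-the-art indexing.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_offsets(read, index, k, min_votes=2):
    """Vote for reference start offsets (ref_pos - read_pos) supported by shared k-mer seeds."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for ref_pos in index.get(read[i:i + k], ()):
            votes[ref_pos - i] += 1
    # Most-supported offsets first; each would then be verified by a local alignment.
    return [offset for offset, n in votes.most_common() if n >= min_votes]
```

With `reference = "TTTTACGTACGGATCCTTTT"` and `k = 4`, the read `"ACGGATCC"` receives five votes for offset 8, which is where it occurs.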
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

Science.gov (United States)

Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

2015-01-01

Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

Science.gov (United States)

Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

2018-05-03

Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we (a) distribute many independent alignments on multiple threads and (b) inherently parallelize a single alignment computation using a work-stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and on different use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2 and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. rene.rahn@fu-berlin.de.
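The wavefront mentioned above rests on a dependency fact. Cell (i, j) of the dynamic-programming matrix needs only (i-1, j-1), (i-1, j) and (i, j-1), so all cells with the same i + j, one minor (anti-) diagonal, are independent of each other. The plain-Python sketch below computes a Needleman-Wunsch score in that order and can be checked against ordinary row-by-row filling. It runs sequentially with no SIMD or threads, and the linear scoring values are illustrative assumptions, not SeqAn's defaults.

```python
def _init_matrix(a, b, gap):
    """DP matrix whose first row and column hold cumulative gap costs."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        H[i][0] = i * gap
    for j in range(len(b) + 1):
        H[0][j] = j * gap
    return H


def _cell(H, a, b, i, j, match, mismatch, gap):
    """Needleman-Wunsch recurrence for a single cell."""
    s = match if a[i - 1] == b[j - 1] else mismatch
    return max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)


def nw_rowwise(a, b, match=1, mismatch=-1, gap=-1):
    """Reference version: fill the matrix one row at a time."""
    H = _init_matrix(a, b, gap)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
    return H[len(a)][len(b)]


def nw_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Same score, filled one anti-diagonal (constant i + j) at a time."""
    n, m = len(a), len(b)
    H = _init_matrix(a, b, gap)
    for d in range(2, n + m + 1):  # d = i + j labels one anti-diagonal
        # Cells on diagonal d read only diagonals d-1 and d-2, so the
        # iterations of this inner loop are independent and could run in parallel.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            H[i][d - i] = _cell(H, a, b, i, d - i, match, mismatch, gap)
    return H[n][m]
```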
Noisy: Identification of problematic columns in multiple sequence alignments

Directory of Open Access Journals (Sweden)

Grünewald Stefan

2008-06-01

Abstract Motivation Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. In highly variable regions, furthermore, alignment errors accumulate, resulting in sometimes misleading signals in phylogenetic reconstruction. Results We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. Software The computer program noisy implements this approach. It can be employed to improve phylogenetic reconstruction capability with quite a considerable success rate whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) there are sufficiently many taxa in the data set (at least, say, 12 to 15 taxa). The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.
PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.

Science.gov (United States)

Kuznetsov, Igor B; McDuffie, Michael

2015-05-07

Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. The selection of the amino acid substitution matrix best suitable for a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities. PR2ALIGN is a stand-alone software program and a web-server that provide the functionality for implementing flexible user-specified alignment scoring functions and aligning pairs of amino acid sequences based on the comparison of the profiles of biochemical properties of these sequences. Unlike the conventional sequence alignment methods that use 20×20 fixed amino acid substitution matrices, PR2ALIGN uses a set of weighted biochemical properties of amino acids to measure the distance between pairs of aligned residues and to find an optimal minimal distance global alignment. The user can provide any number of amino acid properties and specify a weight for each property. The higher the weight for a given property, the more this property affects the final alignment. We show that in many cases the approach implemented in PR2ALIGN produces better quality pair-wise alignments than the conventional matrix-based approach. PR2ALIGN will be helpful for researchers who wish to align amino acid sequences by using flexible user-specified alignment scoring functions based on the biochemical properties of amino acids instead of the amino acid substitution matrix. To the best of the authors' knowledge, there are no existing stand-alone software programs or web-servers analogous to PR2ALIGN. The software is freely available from http://pr2align.rit.albany.edu.
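The abstract specifies weighted biochemical properties and a minimal-distance global alignment, but not the exact distance formula. The sketch below therefore assumes a weighted absolute difference per property and a linear gap cost. The two-property table `TOY_PROPS` uses made-up numbers for five residues purely for illustration and is not a published scale. `residue_distance` and `min_distance_global` are hypothetical names, not PR2ALIGN's interface.

```python
# Toy property profiles: (property_1, property_2) per residue.
# These numbers are made up for illustration and are NOT a published scale.
TOY_PROPS = {
    "A": (0.5, 0.2),
    "D": (0.0, 0.9),
    "G": (0.4, 0.1),
    "K": (0.1, 0.8),
    "L": (0.9, 0.3),
}


def residue_distance(x, y, weights, props=TOY_PROPS):
    """Weighted absolute difference of property values (an assumed distance)."""
    return sum(w * abs(px - py) for w, px, py in zip(weights, props[x], props[y]))


def min_distance_global(a, b, weights, gap=1.0, props=TOY_PROPS):
    """Needleman-Wunsch-style global alignment that minimizes total distance."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + residue_distance(a[i - 1], b[j - 1], weights, props),
                D[i - 1][j] + gap,  # residue of a aligned to a gap
                D[i][j - 1] + gap,  # residue of b aligned to a gap
            )
    return D[n][m]
```

Changing `weights` changes which property dominates. With `(1.0, 0.0)` the A/L pair costs 0.4, but with `(0.0, 1.0)` it costs only about 0.1.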
Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

DEFF Research Database (Denmark)

Havgaard, Jakob Hull; Lyngsø, Rune B.; Stormo, Gary D.

2005-01-01

detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FOLDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include...... the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy....... The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability...
Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

Science.gov (United States)

Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

2011-01-01

The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in half the time and with a quarter of the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
Effect of stacking sequence on mechanical properties neem wood veneer plastic composites

Science.gov (United States)

Nagamadhu, M.; Kumar, G. C. Mohan; Jeyaraj, P.

2018-04-01

This study experimentally investigates the effect of wood veneer stacking sequence on the mechanical properties of neem wood polymer composites (WPC). Wood laminated samples were fabricated by the conventional hand layup technique in a mold, cured under pressure at room temperature, and then post-cured at elevated temperature. Initially, tensile, flexural, and impact tests were conducted to understand the effect of the weight fraction of fiber on the mechanical properties. The mechanical properties increased with the weight fraction of fiber. Moreover, the stacking sequence of the neem wood plays an important role, as it has a significant impact on the mechanical properties. The results indicated that the 0°/0° WPC shows the highest mechanical properties compared with the other sequences (90°/90°, 0°/90°, 45°/90°, 45°/45°). Fourier Transform Infrared Spectroscopy (FTIR) analysis was carried out to identify chemical compounds in both raw neem wood and the neem wood epoxy composite. The microstructure of raw/neat neem wood and the interfacial bonding characteristics of the neem wood composite were investigated using scanning electron microscopy images.

Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

Science.gov (United States)

Neuwald, Andrew F

2009-08-01

The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible, for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.
Coval: improving alignment quality and variant calling accuracy for next-generation sequencing data.

Directory of Open Access Journals (Sweden)

Shunichi Kosugi

Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in 'targeted' alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.
Evolutionary rates at codon sites may be used to align sequences and infer protein domain function

Directory of Open Access Journals (Sweden)

Hazelhurst Scott

2010-03-01

Abstract Background Sequence alignments form part of many investigations in molecular biology, including the determination of phylogenetic relationships, the prediction of protein structure and function, and the measurement of evolutionary rates. However, to obtain meaningful results, a significant degree of sequence similarity is required to ensure that the alignments are accurate and the inferences correct. Limitations arise when sequence similarity is low, which is particularly problematic when working with fast-evolving genes, evolutionarily distant taxa, genomes with nucleotide biases, and cases of convergent evolution. Results A novel approach was conceptualized to address the "low sequence similarity" alignment problem. We developed an alignment algorithm termed FIRE (Functional Inference using the Rates of Evolution), which aligns sequences using the evolutionary rate at codon sites, as measured by the dN/dS ratio, rather than nucleotide or amino acid residues. FIRE was used to test the hypotheses that evolutionary rates can be used to align sequences and that the alignments may be used to infer protein domain function. Using a range of test data, we found that aligning domains based on evolutionary rates was possible even when sequence similarity was very low (for example, antibody variable regions). Furthermore, the alignment has the potential to infer protein domain function, indicating that domains with similar functions are subject to similar evolutionary constraints. These data suggest that an evolutionary rate-based approach to sequence analysis (particularly when combined with structural data) may be used to study cases of convergent evolution or when sequences have very low similarity. However, when aligning homologous gene sets with sequence similarity, FIRE did not perform as well as the best traditional alignment algorithms, indicating that the conventional approach of aligning residues as opposed to evolutionary rates remains the
Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences

Directory of Open Access Journals (Sweden)

Lee DT

2007-02-01

Abstract Background When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of some big gene families, to reconstruct the epidemiological history or their phylogenies, how to analyze and visualize the alignment results of many sequences has become a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences. Results A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies. Unlike traditional representations of sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences, but also shows their local homologous logos for each clade hierarchically. In addition, Phylo-mLogo allows the user to focus only on the analysis of some important, structurally or functionally constrained sites in the alignment, selected by the user or by built-in automatic calculation. Conclusion With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes of their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information on Phylo-mLogo can be found at http://biocomp.iis.sinica.edu.tw/phylomlogo.
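Phylo-mLogo measures per-column variability by base frequencies or entropies, both globally and per clade. The minimal sketch below computes Shannon entropy in bits for each alignment column. It then repeats the calculation for clades given as lists of row indices. Ignoring gap characters and the clade dictionary format are assumptions for illustration, not necessarily Phylo-mLogo's behaviour.

```python
import math
from collections import Counter


def column_entropy(column, ignore_gaps=True):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    symbols = [c for c in column if not (ignore_gaps and c == "-")]
    if not symbols:
        return 0.0
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in Counter(symbols).values())


def entropy_profile(rows):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*rows)]


def clade_profiles(alignment, clades):
    """Per-clade entropy profiles; `clades` maps a clade name to row indices."""
    return {name: entropy_profile([alignment[i] for i in idx]) for name, idx in clades.items()}
```

For `["ACGT", "ACGA", "ACTA", "ACTT"]`, the third column has an entropy of 1 bit globally. It is fully conserved within the clades `{"x": [0, 1], "y": [2, 3]}`, which is the kind of contrast a hierarchical logo is meant to expose.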
Enhanced Dynamic Algorithm of Genome Sequence Alignments

OpenAIRE

Arabi E. keshk

2014-01-01

The merging of biology and computer science has created a new field, computational biology (bioinformatics), that explores the capacity of computers to gain knowledge from biological data. Computational biology is rooted in the life sciences as well as in computer and information sciences and technologies. The main problem in computational biology is sequence alignment, a way of arranging DNA, RNA or protein sequences to identify the regions of similarity and relationship between se...
Measuring the distance between multiple sequence alignments.

Science.gov (United States)

Blackburne, Benjamin P; Whelan, Simon

2012-02-15

Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses. We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them. MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/.
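MetAl's metrics are more refined than this, since some also take gap positions and indel placement on a tree into account. The simplified sketch below illustrates only the basic idea of comparing two MSAs of the same unaligned sequences by which residues they place in the same column. Residues are identified by sequence index and ungapped position, so the result does not depend on where gap characters happen to sit. The Jaccard-style normalization (pairs in exactly one alignment divided by pairs in either) is an assumption of this sketch, not MetAl's published definition.

```python
from itertools import combinations


def homology_pairs(msa):
    """Set of residue pairs that an MSA places in the same column.

    A residue is identified as (sequence index, residue index ignoring gaps).
    """
    next_residue = [0] * len(msa)
    pairs = set()
    for col in range(len(msa[0])):
        present = []
        for s, row in enumerate(msa):
            if row[col] != "-":
                present.append((s, next_residue[s]))
                next_residue[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def pair_distance(msa1, msa2):
    """Fraction of homology pairs found in only one of the two MSAs (0 = identical)."""
    p1, p2 = homology_pairs(msa1), homology_pairs(msa2)
    union = p1 | p2
    return len(p1 ^ p2) / len(union) if union else 0.0
```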
pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment.

Science.gov (United States)

Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

2018-01-01

Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and has therefore been adopted less widely than it could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension simpler through more and better application of high-level languages commonly used in bioinformatics, such as Python. The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation running on several hardware platforms with multi-core systems and/or GPUs that provides accurate sequence alignments that can also be inspected for alignment details. Additionally, pyPaSWAS supports the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. This way, the Python environment will stimulate further extension and use of pyPaSWAS. pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.
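pyPaSWAS is described as supporting the affine gap penalty. The minimal score-only sketch below shows what that means for Smith-Waterman, using Gotoh's three-matrix recurrence in plain Python rather than OpenCL or CUDA. The convention that a gap of length L costs `gap_open + (L - 1) * gap_extend` and the default values are assumptions for illustration, not pyPaSWAS settings.

```python
def sw_affine_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score with affine gaps (Gotoh recurrence)."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in a
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in b
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

For `"AAAATTTT"` against `"AAAACCTTTT"`, the best local alignment bridges the two-base insertion with one gap of length 2. It scores 8 matches × 2 − 3 − 1 = 12.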
Fractal MapReduce decomposition of sequence alignment

Directory of Open Access Journals (Sweden)

Almeida Jonas S

2012-05-01

Abstract Background The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, position the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required. Results In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique that has been found to provide an "alignment-free" solution to sequence analysis and comparison, that is, a solution that does not require dynamic programming, relying instead on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units, with no resort to dynamic programming. Conclusions The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing. Availability Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm".
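The property this fractal approach relies on can be shown in a few lines. In a basic (unidirectional) Chaos Game Representation, each nucleotide moves the current point halfway toward its own corner of the unit square. The binary digits of the coordinates then record the most recent symbols, so two positions whose points fall in the same 2^-k × 2^-k cell share their last k nucleotides. The sketch below reads the longest shared segment off the coordinates alone, without dynamic programming. The corner assignment is a common convention assumed here. USM uses a bidirectional variant, and this brute-force all-pairs loop is not the MapReduce decomposition described above.

```python
# Corner of the unit square assigned to each nucleotide (assumed convention).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """CGR coordinates after each nucleotide, starting from the centre (0.5, 0.5)."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the base's corner
        points.append((x, y))
    return points


def shared_suffix(p, q, limit):
    """Number of trailing nucleotides two CGR points share, read from their coordinates.

    `limit` stops the comparison before it reaches the common starting point,
    whose 0.5 coordinates would otherwise mimic a real G corner.
    """
    k = 0
    while k < limit:
        scale = 2 ** (k + 1)  # grid of cells of side 2^-(k+1)
        if int(p[0] * scale) != int(q[0] * scale) or int(p[1] * scale) != int(q[1] * scale):
            break
        k += 1
    return k


def longest_similar_segment(a, b):
    """Length of the longest common segment, found by comparing CGR points only."""
    pa, pb = cgr_points(a), cgr_points(b)
    return max(
        (shared_suffix(p, q, min(i, j) + 1) for i, p in enumerate(pa) for j, q in enumerate(pb)),
        default=0,
    )
```

For `"TTACGTAC"` and `"GGACGTAA"`, the function reports 5, the length of the shared segment `ACGTA`. Coordinates are exact binary fractions for short sequences, but float precision limits this naive version to roughly 50 nucleotides.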
Aligning protein sequence and analysing substitution pattern using ...

Indian Academy of Sciences (India)

Prakash

Aligning protein sequences using a score matrix has become a routine but valuable method in modern biological ..... the amino acids according to their substitution behaviour ...... which may cause great change (e.g. prolonging the helix) in.
Sequence Similarity Presenter: a tool for the graphic display of similarities of long sequences for use in presentations.

Science.gov (United States)

Fröhlich, K U

1994-04-01

A new method for the presentation of alignments of long sequences is described. The degree of identity for the aligned sequences is averaged for sections of a fixed number of residues. The resulting values are converted to shades of gray, with white corresponding to lack of identity and black corresponding to perfect identity. A sequence alignment is represented as a bar filled with varying shades of gray. The display is compact and allows for a fast and intuitive recognition of the distribution of regions with a high similarity. It is well suited for the presentation of alignments of long sequences, e.g. of protein superfamilies, in plenary lectures. The method is implemented as a HyperCard stack for Apple Macintosh computers. Several options for the modification of the output are available (e.g. background reduction, size of the summation window, consideration of amino acid similarity, inclusion of graphic markers to indicate specific domains). The output is a PostScript file which can be printed, imported as EPS or processed further with Adobe Illustrator.
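The averaging-and-shading procedure described above is simple enough to sketch directly. The version below assumes two already-aligned sequences of equal length and averages identity over consecutive windows of a fixed number of columns. A column counts as identical only when both residues match and neither is a gap. Identity maps linearly to an 8-bit gray level, with 0 as black (perfect identity) and 255 as white. The linear mapping and the gap rule are assumptions, and the original HyperCard implementation may differ.

```python
def windowed_identity(x, y, window):
    """Fraction of identical columns in consecutive windows of an aligned pair."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    values = []
    for start in range(0, len(x), window):
        cols = list(zip(x[start:start + window], y[start:start + window]))
        same = sum(1 for p, q in cols if p == q and p != "-")
        values.append(same / len(cols))  # the last window may be shorter
    return values


def to_gray(identity):
    """Linear shade: 0 = black (perfect identity), 255 = white (no identity)."""
    return round(255 * (1 - identity))
```

For `windowed_identity("AAAA", "AATT", 2)` the result is `[1.0, 0.0]`, which maps to one black segment followed by one white segment of the bar.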
K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

Science.gov (United States)

Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

2018-05-15

Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset size. The K2 and K2* approaches are implemented in the R language as a package and are freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). yueljiang@163.com. Supplementary data are available at Bioinformatics online.
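The abstract names Kendall statistics but not the exact K2 statistic. As an illustration only, the sketch below compares two sequences by Kendall's tau-b rank correlation between their k-mer count vectors over all 4^k DNA k-mers. This is not the published K2 or K2* definition. Returning 0.0 when one vector is entirely tied is this sketch's own convention, and `kmer_vector` and `kendall_similarity` are hypothetical names.

```python
import math
from collections import Counter
from itertools import combinations, product


def kmer_vector(seq, k, alphabet="ACGT"):
    """Counts of every possible k-mer over `alphabet`, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties (frequent among k-mer counts)."""
    n0 = len(x) * (len(x) - 1) // 2
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        ties_x += dx == 0
        ties_y += dy == 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def kendall_similarity(s1, s2, k):
    """Rank agreement of k-mer counts: 1 = same ordering, -1 = reversed."""
    return kendall_tau_b(kmer_vector(s1, k), kmer_vector(s2, k))
```

Only k-mer counts are compared, with no dynamic programming, which is what makes alignment-free methods of this kind fast.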
Using hidden Markov models to align multiple sequences.

Science.gov (United States)

Mount, David W

2009-07-01

A hidden Markov model (HMM) is a probabilistic model of a multiple sequence alignment (msa) of proteins. In the model, each column of symbols in the alignment is represented by a frequency distribution of the symbols (called a "state"), and insertions and deletions are represented by other states. One moves through the model along a particular path from state to state in a Markov chain (i.e., random choice of next move), trying to match a given sequence. The next matching symbol is chosen from each state, recording its probability (frequency) and also the probability of going to that state from a previous one (the transition probability). State and transition probabilities are multiplied to obtain a probability of the given sequence. The hidden nature of the HMM is due to the lack of information about the value of a specific state, which is instead represented by a probability distribution over all possible values. This article discusses the advantages and disadvantages of HMMs in msa and presents algorithms for calculating an HMM and the conditions for producing the best HMM.
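The probability calculation described above, multiplying state (emission) and transition probabilities along a path, can be made concrete. Because the path is hidden, the probability of a sequence sums over all paths, and the forward algorithm computes that sum efficiently. The sketch uses a generic two-state HMM with made-up probabilities. The states `M` and `I` are hypothetical stand-ins, not a full profile HMM with per-column match, insert and delete states.

```python
from itertools import product

# Hypothetical two-state model with made-up probabilities (not fitted to any alignment).
STATES = ("M", "I")
START = {"M": 0.9, "I": 0.1}
TRANS = {"M": {"M": 0.8, "I": 0.2}, "I": {"M": 0.5, "I": 0.5}}
EMIT = {
    "M": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "I": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def path_probability(seq, path, start=START, trans=TRANS, emit=EMIT):
    """Multiply emission and transition probabilities along one explicit state path."""
    p = start[path[0]] * emit[path[0]][seq[0]]
    for prev, cur, symbol in zip(path, path[1:], seq[1:]):
        p *= trans[prev][cur] * emit[cur][symbol]
    return p


def brute_force_probability(seq, states=STATES):
    """P(seq) summed over every possible hidden path (exponential in len(seq))."""
    return sum(path_probability(seq, path) for path in product(states, repeat=len(seq)))


def forward_probability(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Forward algorithm: the same sum in O(len(seq) * len(states)**2) time."""
    f = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        f = {s: emit[s][symbol] * sum(f[r] * trans[r][s] for r in states) for s in states}
    return sum(f.values())
```

For a single `"A"` emitted from `M`, the path probability is 0.9 × 0.7 = 0.63. For longer sequences, the brute-force sum and the forward algorithm agree up to floating-point rounding.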
Effect of stacking sequence and surface treatment on the thermal conductivity of multilayered hybrid nano-composites

Science.gov (United States)

Papanicolaou, G. C.; Pappa, E. J.; Portan, D. V.; Kotrotsos, A.; Kollia, E.

2018-02-01

The aim of the present investigation was to study the effect of both the stacking sequence and surface treatment on the thermal conductivity of multilayered hybrid nano-composites. Four types of multilayered hybrid nanocomposites were manufactured and tested: Nitinol-CNTs (carbon nanotubes)-acrylic resin; Nitinol-acrylic resin-CNTs; surface-treated Nitinol-CNTs-acrylic resin; and surface-treated Nitinol-acrylic resin-CNTs. Surface treatment of the Nitinol plies was realized by means of electrochemical anodization. The surface topography of the anodized Nitinol sheets was investigated through Scanning Electron Microscopy (SEM). It was found that the overall thermal response of the manufactured multilayered nano-composites was greatly influenced by both the anodization and the stacking sequence. A theoretical model for the prediction of the overall thermal conductivity has been developed considering the nature of the different layers, their stacking sequence, and the interfacial thermal resistance. Thermal conductivity and Differential Scanning Calorimetry (DSC) measurements were conducted to verify the overall thermal conductivities predicted by the model. In all cases, a good agreement between theoretical predictions and experimental results was found.
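The abstract does not give the model's equations. As a hedged illustration of how interfaces can make stacking order matter, the sketch uses the textbook through-thickness series model, k_eff = (Σ t_i) / (Σ t_i / k_i + Σ R_int). Here t_i and k_i are the thickness and conductivity of layer i, and R_int is the interfacial resistance between neighbouring layers. In this model the bulk term is unchanged by reordering, so order matters only through which pairs of materials end up in contact. All layer names, thicknesses, conductivities and resistances are hypothetical inputs, not values from the study.

```python
def effective_conductivity(layers, interface_resistance):
    """Through-thickness conductivity of a stack of layers in series.

    layers: list of (name, thickness_m, conductivity_W_per_mK) in stacking order.
    interface_resistance: dict mapping frozenset({name_a, name_b}) to an
        interfacial resistance in m^2*K/W; missing pairs are treated as 0.
    """
    total_thickness = sum(t for _, t, _ in layers)
    resistance = sum(t / k for _, t, k in layers)  # bulk resistance t/k of each layer
    for (a, _, _), (b, _, _) in zip(layers, layers[1:]):  # each neighbouring pair
        resistance += interface_resistance.get(frozenset((a, b)), 0.0)
    return total_thickness / resistance
```

With three identical hypothetical 1 mm layers A, B and C (k = 1 W/(m·K)) and an interface resistance of 0.001 m²K/W defined only for the A/B contact, the order A-B-C gives about 0.75 W/(m·K). The order A-C-B gives 1.0 W/(m·K).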
Fabrication of high gradient insulators by stack compression

Science.gov (United States)

Harris, John Richardson; Sanders, Dave; Hawkins, Steven Anthony; Norona, Marcelo

2014-04-29

Individual layers of a high gradient insulator (HGI) are first pre-cut to their final dimensions. The pre-cut layers are then stacked to form an assembly that is subsequently pressed into an HGI unit with the desired dimensions. The layers are stacked and kept aligned using a sacrificial alignment tube, which is removed after the stack is hot pressed. HGIs are used as high-voltage vacuum insulators in energy storage and transmission structures or devices, e.g., in particle accelerators and pulsed power systems.
OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy

Directory of Open Access Journals (Sweden)

Searle Stephen MJ

2003-10-01

Background: The alignment of two or more protein sequences provides a powerful guide in predicting protein structure and identifying key functional residues; however, the utility of any prediction depends entirely on the accuracy of the alignment. In this paper we describe a suite of reference alignments derived from the comparison of protein three-dimensional structures, together with evaluation measures and software that allow automatically generated alignments to be benchmarked. We test the OXBench benchmark suite on alignments generated by the AMPS multiple alignment method, then apply the suite to compare eight different multiple alignment algorithms. The benchmark shows the current state of the art for alignment accuracy and provides a baseline against which new alignment algorithms may be judged. Results: The simple hierarchical multiple alignment algorithm, AMPS, performed as well as or better than more modern methods such as CLUSTALW once the PAM250 pair-score matrix was replaced by a BLOSUM series matrix. AMPS gave an accuracy in Structurally Conserved Regions (SCRs) of 89.9% over a set of 672 alignments. The T-COFFEE method on a data set of families with http://www.compbio.dundee.ac.uk. Conclusions: The OXBench suite of reference alignments, evaluation software and results database provides a convenient method to assess progress in sequence alignment techniques. Evaluation measures that depended on comparison to a reference alignment were found to discriminate well between methods. The STAMP Sc Score, which is independent of a reference alignment, also discriminated well. Application of OXBench in this paper shows that, with the exception of T-COFFEE, most of the improvement in alignment accuracy seen since 1985 stems from improved pair-score matrices rather than algorithmic refinements. The maximum theoretical alignment accuracy obtained by pooling results over all methods was 94
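Benchmarking against a reference alignment comes down to asking how many of the residue pairings in the reference a test alignment reproduces. The sketch below computes a generic sum-of-pairs style score over all columns; it is not OXBench's own software, and it does not restrict scoring to SCRs as the AMPS figure above does. It assumes both alignments list the same sequences in the same row order and use `-` for gaps; `aligned_pairs` and `pair_accuracy` are hypothetical names.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of (row_a, residue_a, row_b, residue_b) tuples placed in the same column.
    Residues are numbered 0, 1, 2, ... within each row, ignoring gaps ('-')."""
    indexed = []
    for row in alignment:
        counter, positions = 0, []
        for ch in row:
            if ch == "-":
                positions.append(None)  # gap: no residue in this column
            else:
                positions.append(counter)
                counter += 1
        indexed.append(positions)
    pairs = set()
    for a, b in combinations(range(len(alignment)), 2):
        for pa, pb in zip(indexed[a], indexed[b]):
            if pa is not None and pb is not None:
                pairs.add((a, pa, b, pb))
    return pairs


def pair_accuracy(test, reference):
    """Percentage of reference residue pairs that the test alignment also aligns."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return 100.0 * len(ref & aligned_pairs(test)) / len(ref)


if __name__ == "__main__":
    reference = ["AC-GT", "ACAGT"]
    test = ["ACG-T", "ACAGT"]  # G of row 0 is paired with the wrong residue
    print(pair_accuracy(test, reference))
```

In the usage example three of the four reference pairs survive, so the score is 75%.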
MISTICA: Minimum Spanning Tree-Based Coarse Image Alignment for Microscopy Image Sequences.

Science.gov (United States)

Ray, Nilanjan; McArdle, Sara; Ley, Klaus; Acton, Scott T

2016-11-01

Registration of an in vivo microscopy image sequence is necessary in many significant studies, including studies of atherosclerosis in large arteries and the heart. Significant cardiac and respiratory motion of the living subject, occasional spells of focal plane changes, drift in the field of view, and long image sequences are the principal roadblocks. The first step in such a registration process is the removal of translational and rotational motion; a deformable registration can then be performed. Our study focuses on removing the translation and/or rigid-body motion, which we refer to as coarse alignment. Existing techniques for coarse alignment cannot accommodate long sequences that often contain periods of poor-quality images (as quantified by a suitable perceptual measure), and many existing methods require the user to select an anchor image to which the other images are registered. We propose a novel method for coarse image sequence alignment based on minimum weighted spanning trees (MISTICA) that overcomes these difficulties. The principal idea behind MISTICA is to reorder the images into shorter sequences, to demote nonconforming or poor-quality images in the registration process, and to mitigate error propagation. The anchor image is selected automatically, making MISTICA completely automated. MISTICA is computationally efficient. It has a single tuning parameter that determines graph width, which can also be eliminated by way of additional computation. MISTICA outperforms existing alignment methods when applied to microscopy image sequences of mouse arteries.
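The spanning-tree idea can be illustrated as follows: each frame is registered to the anchor through a short path of mutually similar frames rather than through a long sequential chain, which limits error accumulation. This is only a sketch of that general idea under stated assumptions, not MISTICA itself. The abstract does not say how the anchor is chosen or how images are demoted, so the "tree center" anchor rule is our assumption. `pair_shift` is a hypothetical stand-in for a pairwise rigid registration (translations only), and the synthetic positions replace real image dissimilarities.

```python
import heapq
import math
from collections import deque


def prim_mst(dissim):
    """Minimum spanning tree of the complete graph on n images (Prim's algorithm).
    dissim[i][j] is a symmetric dissimilarity; returns an adjacency list {node: [neighbours]}."""
    n = len(dissim)
    in_tree = [False] * n
    adj = {i: [] for i in range(n)}
    heap = [(0.0, 0, -1)]  # (edge weight, node, parent); node 0 starts the tree
    while heap:
        _, node, parent = heapq.heappop(heap)
        if in_tree[node]:
            continue
        in_tree[node] = True
        if parent >= 0:
            adj[parent].append(node)
            adj[node].append(parent)
        for nxt in range(n):
            if not in_tree[nxt]:
                heapq.heappush(heap, (dissim[node][nxt], nxt, node))
    return adj


def hop_distances(adj, src):
    """Breadth-first search: number of tree edges from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def tree_center(adj):
    """Hypothetical anchor rule: the node whose farthest node is the fewest hops away."""
    return min(adj, key=lambda node: max(hop_distances(adj, node).values()))


def offsets_from_anchor(adj, anchor, pair_shift):
    """Chain pairwise shift estimates along tree paths.
    pair_shift(i, j) is assumed to return the (dx, dy) displacement of image j relative to image i."""
    offsets = {anchor: (0.0, 0.0)}
    queue = deque([anchor])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in offsets:
                dx, dy = pair_shift(u, v)
                offsets[v] = (offsets[u][0] + dx, offsets[u][1] + dy)
                queue.append(v)
    return offsets


if __name__ == "__main__":
    # Synthetic drift positions; dissimilarity = distance between them (stand-in for an image measure).
    pos = [(0, 0), (1, 0), (2, 1), (2, 3), (5, 3)]
    dissim = [[math.dist(p, q) for q in pos] for p in pos]
    tree = prim_mst(dissim)
    anchor = tree_center(tree)

    def shift(i, j):
        return (pos[j][0] - pos[i][0], pos[j][1] - pos[i][1])

    print(anchor, offsets_from_anchor(tree, anchor, shift))
```

Each frame would then be corrected by the negative of its accumulated offset.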
ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

Directory of Open Access Journals (Sweden)

Kim Taeho

2010-09-01

Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. Commonly used multiple sequence alignment programs are still limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with greatly extended data storage capacity. Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions: ClustalXeed can now compute large biological sequence data sets that were not tractable with any other parallel or single MSA program. The main developments include: (1) the ability to tackle larger sequence alignment problems than was possible with previous systems, through markedly improved storage-handling capabilities; (2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and (3) support for both single PCs and distributed cluster systems.
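The abstract does not describe INSTA's rules, so the following is only a generic illustration of why dynamic dispatch helps with sequences of non-uniform length. Each task is handed to whichever node becomes idle first, so no node sits waiting behind a fixed static share. The function name `dispatch_to_idle_nodes` is hypothetical. The cost model in the usage example (pairwise alignment time proportional to the product of the two lengths, as for an O(mn) dynamic program) is likewise an assumption.

```python
import heapq


def dispatch_to_idle_nodes(task_costs, n_nodes, longest_first=True):
    """Simulate a dynamic task queue: each task goes to whichever node becomes idle first.

    task_costs[t] is the (estimated) run time of task t. Returns (makespan, assignment),
    where assignment[node] lists the task indices that node processed, in order."""
    order = list(range(len(task_costs)))
    if longest_first:  # start expensive tasks early so they do not finish last
        order.sort(key=lambda t: -task_costs[t])
    idle_at = [(0.0, node) for node in range(n_nodes)]  # (time the node becomes idle, node id)
    heapq.heapify(idle_at)
    assignment = {node: [] for node in range(n_nodes)}
    for t in order:
        free_time, node = heapq.heappop(idle_at)
        assignment[node].append(t)
        heapq.heappush(idle_at, (free_time + task_costs[t], node))
    makespan = max(time for time, _ in idle_at)
    return makespan, assignment


if __name__ == "__main__":
    # Hypothetical pairwise-alignment tasks: cost taken as proportional to len(a) * len(b).
    lengths = [120, 950, 300, 80, 1500]
    costs = [lengths[i] * lengths[j] for i in range(len(lengths)) for j in range(i + 1, len(lengths))]
    print(dispatch_to_idle_nodes(costs, n_nodes=3))
```

Greedy dispatch is not guaranteed to be optimal. With task costs 5, 4, 3, 3, 3 on two nodes it finishes at time 10, whereas the best split (5+4 | 3+3+3) finishes at 9.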
Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

KAUST Repository

Odat, Enas M.

2011-01-01

The algorithm was simulated in the C#.Net programming language, and a number of experiments were performed to verify the proved statements. The results of these experiments show that the number of optimal alignments decreases after each step of optimization. Furthermore, as the sequence length increases linearly, the number of optimal alignments increases exponentially, at a rate that also depends on the cost function used. Finally, the number of executed operations grows polynomially as the sequence length increases linearly.
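One way to picture sequential optimization relative to several cost functions is lexicographic optimization. First keep only the alignments that are optimal for cost function 1, then, among those, keep the ones optimal for cost function 2, and so on. The abstract does not spell out the algorithm, so this interpretation is our assumption. The sketch runs a single global-alignment dynamic program over cost tuples that also counts the optimal alignments, which shows the count shrinking as cost functions are added. Tuple comparison is valid here because adding the same vector to two tuples preserves their lexicographic order. `sequential_optimal_alignments`, `EDIT` and `STRICT` are hypothetical names.

```python
def sequential_optimal_alignments(a, b, cost_fns):
    """Global alignment DP under several cost functions applied in sequence.

    Each cost function maps an operation ('match', 'mismatch', 'indel') to a number.
    Cost tuples are compared lexicographically: function 1 first, ties broken by
    function 2, and so on. Returns (optimal cost tuple, number of optimal alignments)."""
    def op_cost(op):
        return tuple(f[op] for f in cost_fns)

    def add(x, y):
        return tuple(p + q for p, q in zip(x, y))

    n, m = len(a), len(b)
    # best[i][j] = (cost tuple, count) for aligning a[:i] with b[:j]
    best = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (tuple(0 for _ in cost_fns), 1)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                op = "match" if a[i - 1] == b[j - 1] else "mismatch"
                c, k = best[i - 1][j - 1]
                candidates.append((add(c, op_cost(op)), k))
            if i > 0:  # a[i-1] aligned against a gap
                c, k = best[i - 1][j]
                candidates.append((add(c, op_cost("indel")), k))
            if j > 0:  # b[j-1] aligned against a gap
                c, k = best[i][j - 1]
                candidates.append((add(c, op_cost("indel")), k))
            low = min(c for c, _ in candidates)
            best[i][j] = (low, sum(k for c, k in candidates if c == low))
    return best[n][m]


EDIT = {"match": 0, "mismatch": 1, "indel": 1}    # unit edit distance
STRICT = {"match": 0, "mismatch": 3, "indel": 1}  # hypothetical second cost function

if __name__ == "__main__":
    print(sequential_optimal_alignments("AB", "BA", [EDIT]))          # three tied alignments
    print(sequential_optimal_alignments("AB", "BA", [EDIT, STRICT]))  # the tie shrinks
```

For "AB" versus "BA", edit distance alone leaves three optimal alignments (two substitutions, or either of two indel-match-indel layouts). Adding the second cost function eliminates the double substitution and leaves two.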
Accelerated convergence and robust asymptotic regression of the Gumbel scale parameter for gapped sequence alignment

International Nuclear Information System (INIS)

Park, Yonil; Sheetlin, Sergey; Spouge, John L

2005-01-01

Searches through biological databases provide the primary motivation for studying sequence alignment statistics. Other motivations include physical models of annealing processes or mathematical similarities to, e.g., first-passage percolation and interacting particle systems. Here, we investigate sequence alignment statistics, partly to explore two general mathematical methods. First, we model the global alignment of random sequences heuristically with Markov additive processes. In sequence alignment, the heuristic suggests a numerical acceleration scheme for simulating an important asymptotic parameter (the Gumbel scale parameter λ). The heuristic might apply to similar mathematical theories. Second, we extract the asymptotic parameter λ from simulation data with the statistical technique of robust regression. Robust regression is admirably suited to 'asymptotic regression' and deserves to be better known for it.
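Some context on why λ has to be simulated helps here. For ungapped local alignment, λ has a closed-form characterization as the unique positive root of $\sum_{i,j} p_i p_j e^{\lambda s_{ij}} = 1$ (the Karlin-Altschul equation). No such formula is known for gapped alignment, which is why the abstract's simulation and regression methods are needed. The sketch below solves the ungapped equation by bisection, for contrast only. It assumes a negative expected score and at least one positive score; `karlin_altschul_lambda` is a hypothetical name, and the ±1 DNA scoring in the usage example is illustrative.

```python
import math


def karlin_altschul_lambda(scores, probs, tol=1e-12):
    """Ungapped Gumbel scale parameter: the positive root of sum_ij p_i p_j exp(lambda * s_ij) = 1.

    scores[i][j] is the substitution score, probs[i] the background letter frequency.
    Assumes a negative expected score and at least one positive score."""
    def f(lam):
        return sum(
            probs[i] * probs[j] * math.exp(lam * scores[i][j])
            for i in range(len(probs)) for j in range(len(probs))
        ) - 1.0

    # f(0) = 0, f < 0 just above 0 (negative expected score), and f grows without bound,
    # so double hi until f turns positive, then bisect on the sign change.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Uniform DNA, +1 match / -1 mismatch: 0.25 e^lambda + 0.75 e^-lambda = 1, so lambda = ln 3.
    dna = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
    print(karlin_altschul_lambda(dna, [0.25] * 4), math.log(3))
```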
MACSIMS: multiple alignment of complete sequences information management system

Directory of Open Access Journals (Sweden)

Plewniak Frédéric

2006-06-01

Background: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for integrating this information in the context of the protein family. Results: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified, and the retrieved data are evaluated and propagated from known to unknown sequences within these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at http://bips.u-strasbg.fr/MACSIMS/.

Alignment of Escherichia coli K12 DNA sequences to a genomic restriction map.

Science.gov (United States)

Rudd, K E; Miller, W; Ostell, J; Benson, D A

1990-01-25

We use the extensive published information describing the genome of Escherichia coli and new restriction map alignment software to align DNA sequence, genetic, and physical maps. Restriction map alignment software is used which considers restriction maps as strings analogous to DNA or protein sequences except that two values, enzyme name and DNA base address, are associated with each position on the string. The resulting alignments reveal a nearly linear relationship between the physical and genetic maps of the E. coli chromosome. Physical map comparisons with the 1976, 1980, and 1983 genetic maps demonstrate a better fit with the more recent maps. The results of these alignments are genomic kilobase coordinates, orientation and rank of the alignment that best fits the genetic data. A statistical measure based on extreme value distribution is applied to the alignments. Additional computer analyses allow us to estimate the accuracy of the published E. coli genomic restriction map, simulate rearrangements of the bacterial chromosome, and search for repetitive DNA. The procedures we used are general enough to be applicable to other genome mapping projects.
An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

KAUST Repository

Bonny, Talal

2012-07-28

Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in bioinformatics. They must process large amounts of data, which can take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique uses both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts to the number of CPUs given as input, distributing the workload efficiently between the processing units. Using existing resources (GPU and CPU) in this efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (for a query length of 511 amino acids). © 2010 IEEE.
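To make the workload and the GCUPS figures concrete, here is a plain single-threaded Smith-Waterman scorer. Every cell of the query-by-target matrix is one "cell update", so GCUPS is the number of cells divided by the wall time, divided by 10⁹. This is a minimal sketch assuming a linear gap penalty and arbitrary scores (`match=2`, `mismatch=-1`, `gap=-2`); production implementations usually use affine gaps and substitution matrices, and this sketch does not reproduce the paper's GPU/CPU workload split. `smith_waterman_score` and `gcups` are hypothetical names.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (Smith-Waterman).
    Computes len(query) * len(target) cells, keeping only one previous row in memory."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def gcups(query_len, target_lens, seconds):
    """Giga cell updates per second: DP cells computed / wall-clock seconds / 1e9."""
    return query_len * sum(target_lens) / (seconds * 1e9)


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "ACG"))  # the shared ACG scores 3 * 2
    # A 511-residue query against 1000 targets of length 1000 in one second:
    print(gcups(511, [1000] * 1000, 1.0))
```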
Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

Science.gov (United States)

Soares, Inês; Goios, Ana; Amorim, António

2012-01-01

The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions about evolutionary history and is sometimes very difficult or impossible to perform because of the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by computing a generalized suffix tree of all sequences, which takes linear time. Using this tree, the frequency of every possible word of a preset length L (L-words) in each sequence is rapidly calculated. From the L-word frequency profile of each sequence, pairwise standard Euclidean distances are then computed, producing a symmetric genetic distance matrix that can be used to generate a neighbor-joining dendrogram or a multidimensional scaling graph. We improve on word-counting alignment-free approaches to sequence comparison by determining a single optimal word length and combining suffix tree structures with the word-counting tasks. Our approach is thus a fast and simple application that proved efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python and is freely available on the web.
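The profile-and-distance step can be sketched as follows. For brevity this counts words with a sliding window and a `Counter` instead of a generalized suffix tree, a substitution that costs O(n·L) rather than linear time. It also takes L as given rather than choosing the optimal word length. Words containing letters outside the assumed `ACGT` alphabet (e.g. `N`) are skipped. `lword_profile` and `distance_matrix` are hypothetical names.

```python
import math
from collections import Counter
from itertools import product


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible word of length L (overlapping sliding window).
    Words containing letters outside the alphabet are skipped."""
    valid = set(alphabet)
    words = (seq[i:i + L] for i in range(len(seq) - L + 1))
    counts = Counter(w for w in words if set(w) <= valid)
    total = sum(counts.values()) or 1  # avoid dividing by zero for very short sequences
    return [counts["".join(w)] / total for w in product(alphabet, repeat=L)]


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise Euclidean distances between L-word frequency profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    n = len(seqs)
    return [[math.dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "ACGTTCGTAC", "AAAAAAAAAA"]
    for row in distance_matrix(seqs, L=2):
        print(["%.3f" % d for d in row])
```

The resulting matrix is what a neighbor-joining or multidimensional scaling routine would take as input.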
SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

Science.gov (United States)

Liu, Kevin; Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C Randal

2012-01-01

Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. More generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees, and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of
High-Stacking-Density, Superior-Roughness LDH Bridged with Vertically Aligned Graphene for High-Performance Asymmetric Supercapacitors.

Science.gov (United States)

Guo, Wei; Yu, Chang; Li, Shaofeng; Yang, Juan; Liu, Zhibin; Zhao, Changtai; Huang, Huawei; Zhang, Mengdi; Han, Xiaotong; Niu, Yingying; Qiu, Jieshan

2017-10-01

High-performance electrode materials with tuned surface and interface structures and functionalities are in high demand for advanced supercapacitors. A novel strategy is presented to configure high-stacking-density, superior-roughness nickel manganese layered double hydroxide (LDH) bridged by vertically aligned graphene (VG), with nickel foam (NF) as the conductive collector, yielding LDH-NF@VG hybrids for asymmetric supercapacitors. The VG nanosheets provide numerous electron transfer channels for quick redox reactions and a well-developed open structure for fast mass transport. Moreover, the high-stacking-density LDH grown and assembled on the VG nanosheets results in superior hydrophilicity derived from the tuned nano/microstructures, especially the microroughness. Such a high stacking density, with abundant active sites and superior wettability, can be easily accessed by aqueous electrolytes. Benefitting from these features, the LDH-NF@VG can deliver a high capacitance of 2920 F g⁻¹ at a current density of 2 A g⁻¹, and the asymmetric supercapacitor with LDH-NF@VG as the positive electrode and activated carbon as the negative electrode can deliver a high energy density of 56.8 Wh kg⁻¹ at a power density of 260 W kg⁻¹, with a high specific capacitance retention of 87% even after 10 000 cycles. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

Science.gov (United States)

Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

2016-07-19

Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases creates the need for efficient parallel implementations on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm, and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture: cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. In addition, we reorganize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also scales well with sequence length and size, and with the number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive with optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
Band engineering in transition metal dichalcogenides: Stacked versus lateral heterostructures

International Nuclear Information System (INIS)

Guo, Yuzheng; Robertson, John

2016-01-01

We calculate a large difference in the band alignments for transition metal dichalcogenide (TMD) heterojunctions when arranged in the stacked layer or lateral (in-plane) geometries, using direct supercell calculations. The stacked case follows the unpinned limit of the electron affinity rule, whereas the lateral geometry follows the strongly pinned limit of alignment of charge neutrality levels. TMDs therefore provide one of the few clear tests of band alignment models, whereas three-dimensional semiconductors give less stringent tests because of accidental chemical trends in their properties.
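The two limits named above can be written out explicitly. What follows is the standard textbook statement, in our own notation: semiconductors A and B, electron affinity χ, band gap E_g, and Φ for the charge neutrality level (CNL) measured upward from the valence band maximum. The paper itself obtains the offsets from direct supercell calculations rather than from these formulas.

```latex
% Energies measured from the vacuum level for semiconductor X in {A, B}:
%   E_{C,X} = -\chi_X,  E_{V,X} = -(\chi_X + E_{g,X})
% Unpinned limit (electron affinity rule): vacuum levels align.
\Delta E_C = E_{C,B} - E_{C,A} = \chi_A - \chi_B, \qquad
\Delta E_V = E_{V,A} - E_{V,B} = (\chi_B + E_{g,B}) - (\chi_A + E_{g,A})
% Strongly pinned limit: charge neutrality levels align.
% \Phi_X is the CNL energy measured upward from the valence band maximum of X.
E_{V,A} + \Phi_A = E_{V,B} + \Phi_B
\;\Rightarrow\; \Delta E_V = \Phi_B - \Phi_A, \qquad
\Delta E_C = (E_{g,B} - E_{g,A}) - \Delta E_V
% Both limits satisfy \Delta E_C + \Delta E_V = E_{g,B} - E_{g,A}.
```

Real interfaces fall between these limits. The abstract's finding is that stacked TMD layers sit near the first and lateral junctions near the second.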
Heuristic for Solving the Multiple Alignment Sequence Problem

Directory of Open Access Journals (Sweden)

Roman Anselmo Mora Gutiérrez

2011-03-01

In this paper we develop a new algorithm for solving the multiple sequence alignment problem (AMS), a hybrid metaheuristic based on harmony search and simulated annealing. The hybrid was validated with the methodology of Julie Thompson. This is a basic algorithm, and the results obtained at this stage are encouraging.
Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

Directory of Open Access Journals (Sweden)

Inês Soares

2012-01-01

The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions about evolutionary history and is sometimes very difficult or impossible to perform because of the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by computing a generalized suffix tree of all sequences, which takes linear time. Using this tree, the frequency of every possible word of a preset length L (L-words) in each sequence is rapidly calculated. From the L-word frequency profile of each sequence, pairwise standard Euclidean distances are then computed, producing a symmetric genetic distance matrix that can be used to generate a neighbor-joining dendrogram or a multidimensional scaling graph. We improve on word-counting alignment-free approaches to sequence comparison by determining a single optimal word length and combining suffix tree structures with the word-counting tasks. Our approach is thus a fast and simple application that proved efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python and is freely available on the web.
Use of cycle stacking patterns to define third-order depositional sequences: Middle to Late Cambrian Bonanza King Formation, southern Great Basin

Energy Technology Data Exchange (ETDEWEB)

Montanez, I.P.; Droser, M.L. (Univ. of California, Riverside (United States))

1991-03-01

The Middle to Late Cambrian Bonanza King Formation (CA, NV) is characterized by superimposed scales of cyclicity. Small-scale cycles (0.5 to 10 m) occur as shallowing-upward peritidal and subtidal cycles that repeat at high frequencies (10⁴ to 10⁵). Systematic changes in the stacking patterns of meter-scale cycles define several large-scale (50-250 m) third-order depositional sequences in the Bonanza King Formation. These third-order depositional sequences can be traced within ranges and correlated regionally across the platform. Peritidal cycles in the Bonanza King Formation are both subtidal- and tidal flat-dominated. Tidal flat-dominated cycles consist of muddy bases grading upward into thrombolites or columnar stromatolites, all capped by planar stromatolites. Subtidal cycles in the Bonanza King Formation consist of grainstone bases that commonly fine upward and contain stacked hardgrounds. These are overlain by digitate-algal bioherms with grainstone channel fills and/or bioturbated ribbon carbonates with grainstone lenses. Transgressive depositional facies of third-order depositional sequences consist primarily of stacks of subtidal-dominated peritidal cycles and subtidal cycles, whereas regressive depositional facies are dominated by stacks of tidal flat-dominated peritidal cycles and regoliths developed over laminite cycle caps. The use of high-frequency cycles in the Bonanza King Formation to delineate regionally developed third-order depositional sequences thus provides a link between cycle stratigraphy and sequence stratigraphy.
A rank-based sequence aligner with applications in phylogenetic analysis.

Directory of Open Access Journals (Sweden)

Liviu P Dinu

Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to favor correctness over speed, although some indexing strategies to speed up the aligner are also investigated. The LRD aligner is made faster by storing k-mer positions in a hash table for each read. Another improvement, which produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state-of-the-art alignment tools in several experiments. One set of experiments determines the precision and recall of the proposed aligner in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even at very low base coverage. Thus, the LRD aligner can be considered a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java and is supported on UNIX and MS Windows.
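The speed-up described above stores the positions of each read's k-mers (presumably the usual k) in a hash table and considers only reference positions that look like a good positional match. A generic way to find such positions is seed voting. Every shared k-mer between read position i and reference position p votes for the read starting at p − i, and the most-voted starts become candidates for the full (here, LRD) comparison. This is a sketch of that generic technique only; it does not compute Local Rank Distance. `kmer_index` and `candidate_starts` are hypothetical names.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Hash table mapping every k-mer of seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def candidate_starts(read, reference, k):
    """Vote for reference start positions of the read using shared k-mers.

    A read k-mer at offset i matching the reference at position p votes for start p - i.
    Returns (start, votes) pairs, most-voted first (ties broken by smaller start)."""
    read_index = kmer_index(read, k)  # one small table per read
    votes = defaultdict(int)
    for p in range(len(reference) - k + 1):
        for i in read_index.get(reference[p:p + k], ()):
            votes[p - i] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    reference = "TTTTACGTACGGTTTT"
    read = "ACGTACGG"  # occurs in the reference starting at position 4
    print(candidate_starts(read, reference, k=3)[:3])
```

Only the top few candidate starts would then be scored with the expensive distance, which is where the time saving comes from.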
Subfamily logos: visualization of sequence deviations at alignment positions with high information content

Directory of Open Access Journals (Sweden)

Beitz Eric

2006-06-01

Background: Recognition of relevant sequence deviations can be valuable for elucidating functional differences between protein subfamilies. Interesting residues at highly conserved positions can then be mutated and experimentally analyzed. However, identifying such sites is tedious because automated approaches are scarce. Results: Subfamily logos visualize subfamily-specific sequence deviations. The display is similar to classical sequence logos but extends into the negative range: positive, upright characters correspond to residues that are characteristic of the subfamily, and negative, upside-down characters to residues typical of the remaining sequences. The symbol height is adjusted to the information content of the alignment position. Residues that are conserved throughout do not appear. Conclusion: Subfamily logos provide an intuitive display of relevant sequence deviations. The method was validated on a set of 135 aligned aquaporin sequences, in which established subfamily-specific positions were readily identified by the algorithm.
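Subfamily logos build on the classical sequence-logo convention. There, a column's information content is log2 of the alphabet size minus the Shannon entropy of the column, and each residue's symbol height is its frequency times that information content. The sketch below computes this classical positive-range display for one column. The subfamily-specific negative range is not described in enough detail in the abstract to reproduce. Ignoring gaps and omitting the usual small-sample correction are our simplifying assumptions; `column_information` and `symbol_heights` are hypothetical names.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column: log2(alphabet size) minus Shannon entropy.
    Gaps ('-') are ignored and no small-sample correction is applied."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def symbol_heights(column, alphabet_size=20):
    """Classical logo stack: each residue's height is its frequency times the column information."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return {}
    info = column_information(column, alphabet_size)
    return {aa: info * k / len(residues) for aa, k in Counter(residues).items()}


if __name__ == "__main__":
    print(column_information("NNNNNNNN"))  # fully conserved column: log2(20), about 4.32 bits
    print(symbol_heights("NNNPNNPN"))      # mixed column: shorter stack, split between N and P
```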
Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

Science.gov (United States)

Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

2016-11-23

Comparing microbial sequencing data is critical for understanding the dynamics of microbial communities. Alignment-based tools for analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with a Fixed Order Markov Chain (FOMC), yielding promising results for the comparison of microbial communities. However, in an FOMC the number of parameters grows exponentially with the order of the Markov chain (MC). Under a fixed high order, the parameters might not be accurately estimated owing to limited sequencing depth. In our study, we investigate an alternative to FOMC that models background sequences in metatranscriptomic data with the data-driven Variable Length Markov Chain (VLMC). The VLMC, originally designed for long sequences, was extended to high-throughput sequencing reads, and strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of a high-order MC under limited sequencing depth. Unlike FOMC, where the order is selected manually, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC in modeling the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
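The exponential growth of FOMC parameters is easy to quantify. An order-k chain over an alphabet of size a has a^k contexts, each with a distribution of a − 1 free probabilities, giving a^k(a − 1) parameters in total. The sketch counts these and fits a fixed-order chain by maximum likelihood (transition counts pooled over reads). It illustrates the FOMC baseline only, not the VLMC context-pruning method; `fomc_parameter_count` and `fit_fomc` are hypothetical names.

```python
from collections import Counter, defaultdict


def fomc_parameter_count(order, alphabet_size=4):
    """Free parameters of a fixed-order Markov chain: alphabet_size**order contexts,
    each with (alphabet_size - 1) free transition probabilities."""
    return alphabet_size ** order * (alphabet_size - 1)


def fit_fomc(reads, order):
    """Maximum-likelihood estimates of P(next letter | previous `order` letters), pooled over reads.
    Contexts never observed in the reads are simply absent from the result."""
    counts = defaultdict(Counter)
    for read in reads:
        for i in range(order, len(read)):
            counts[read[i - order:i]][read[i]] += 1
    model = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values())
        model[ctx] = {x: c / total for x, c in nxt.items()}
    return model


if __name__ == "__main__":
    for k in (1, 3, 6, 10):
        print(k, fomc_parameter_count(k))
    print(fit_fomc(["ACGTACGA", "ACGGT"], order=2))
```

At order 10 over DNA the chain already has 3,145,728 free parameters. This is consistent with the abstract's point that high-order parameters may not be estimated accurately at limited sequencing depth.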
Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

Directory of Open Access Journals (Sweden)

von Reumont Björn M

2010-03-01

Background: Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstruction, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstruction of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold whose choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. Results: ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. In tests on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. In addition, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Conclusions: Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment
BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm.

Science.gov (United States)

Loving, Joshua; Hernandez, Yozen; Benson, Gary

2014-11-15

Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, shift and addition. Bit-parallelism has been successfully applied to the longest common subsequence (LCS) and edit-distance problems, producing fast algorithms in practice. We have developed BitPAl, a bit-parallel algorithm for general, integer-scoring global alignment. Integer-scoring schemes assign integer weights for match, mismatch and insertion/deletion. The BitPAl method uses structural properties in the relationship between adjacent scores in the scoring matrix to construct classes of efficient algorithms, each designed for a particular set of weights. In timed tests, we show that BitPAl runs 7-25 times faster than a standard iterative algorithm. Source code is freely available for download at http://lobstah.bu.edu/BitPAl/BitPAl.html. BitPAl is implemented in C and runs on all major operating systems. jloving@bu.edu or yhernand@bu.edu or gbenson@bu.edu Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
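The abstract notes that bit-parallelism was first applied to the longest common subsequence (LCS) problem. As a minimal sketch of that idea only (not BitPAl's general integer-scoring method), the function below keeps one column of the LCS table as a single integer and updates it with AND, OR, addition and subtraction, following the well-known Hyyrö-style LCS recurrence. The function name is hypothetical; Python's arbitrary-precision integers stand in for machine words.

```python
def bit_parallel_lcs(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b, computed bit-parallel.

    Bit i of v is 0 exactly when the LCS value increases between rows i and i+1
    of the current DP column (rows indexed by a), so the LCS length is the
    number of zero bits among the low len(a) bits.
    """
    if not a or not b:
        return 0
    # match[c] has bit i set iff a[i] == c
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    mask = (1 << len(a)) - 1
    v = mask  # all ones: no increments yet (empty prefix of b)
    for ch in b:
        u = v & match.get(ch, 0)
        # Carries of v + u propagate along runs of ones, emulating the
        # max() of the DP recurrence for a whole column at once.
        v = ((v + u) | (v - u)) & mask
    # each zero bit in the len(a)-bit vector contributes 1 to the LCS length
    return len(a) - bin(v).count("1")
```

Each character of `b` costs a constant number of word operations per machine word of `a`, which is where the speed-up over the cell-by-cell dynamic program comes from; the result should match the classic O(mn) LCS table.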
Effect of fiber angle orientation and stacking sequence on mixed mode fracture toughness of carbon fiber reinforced plastics: Numerical and experimental investigations

International Nuclear Information System (INIS)

Naghipour, P.; Bartsch, M.; Chernova, L.; Hausmann, J.; Voggenreiter, H.

2010-01-01

This paper focuses on the effect of fiber orientation and stacking sequence on the progressive mixed mode delamination failure in composite laminates using fracture experiments and finite element (FE) simulations. Every laminate is modelled numerically combining damageable layers with defined fiber orientations and cohesive zone interface elements, subjected to mixed mode bending. The numerical simulations are then calibrated and validated through experiments, conducted following standardized mixed mode delamination tests. The numerical model is able to successfully capture the experimentally observed effects of fiber angle orientations and variable stacking sequences on the global load-displacement response and mixed mode inter-laminar fracture toughness of the various laminates. For better understanding of the failure mechanism, fracture surfaces of laminates with different stacking sequences are also studied using scanning electron microscopy (SEM).
SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

Science.gov (United States)

Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

2010-07-01

We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

Directory of Open Access Journals (Sweden)

Sadreyev Ruslan I

2004-08-01

Full Text Available Abstract Background Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) an MSA position and a set of predicted residue frequencies, and (2) two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. Results For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. Conclusion The proposed computational method is of significant potential value for the analysis of protein families.
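The authors' analytical P-value estimates are not reproduced here. To illustrate the underlying question for problem (2), whether two MSA columns differ more than chance would allow, the sketch below swaps in a plain permutation test. It uses the total variation distance between residue frequency profiles as the statistic. The function names and this choice of statistic are assumptions made for illustration.

```python
import random
from collections import Counter


def column_distance(col1, col2):
    """Total variation distance between the residue frequency profiles of two columns."""
    f1, f2 = Counter(col1), Counter(col2)
    n1, n2 = len(col1), len(col2)
    residues = set(f1) | set(f2)
    return 0.5 * sum(abs(f1[r] / n1 - f2[r] / n2) for r in residues)


def permutation_p_value(col1, col2, n_perm=999, seed=0):
    """Permutation P-value for 'the two columns share one residue distribution'.

    A swapped-in resampling test, not the analytical estimate from the abstract.
    """
    observed = column_distance(col1, col2)
    pooled = list(col1) + list(col2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # re-split the pooled residues into two pseudo-columns of the original sizes
        if column_distance(pooled[:len(col1)], pooled[len(col1):]) >= observed:
            hits += 1
    # add-one correction keeps the estimate away from exactly zero
    return (hits + 1) / (n_perm + 1)
```

Resampling is simple but slow and limited in resolution by `n_perm`. That cost is one reason closed-form estimates, like those the abstract proposes, are attractive when many column pairs must be screened.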
Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS

NARCIS (Netherlands)

Warris, S.; Yalcin, F.; Jackson, K.J.; Nap, J.P.H.

2015-01-01

Motivation To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate
A direct method for computing extreme value (Gumbel) parameters for gapped biological sequence alignments.

Science.gov (United States)

Quinn, Terrance; Sinkala, Zachariah

2014-01-01

We develop a general method for computing extreme value distribution (Gumbel, 1958) parameters for gapped alignments. Our approach uses mixture distribution theory to obtain associated BLOSUM matrices for gapped alignments, which in turn are used for determining significance of gapped alignment scores for pairs of biological sequences. We compare our results with parameters already obtained in the literature.
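Once the Gumbel parameters λ and K are estimated, by the method described above or another, they give the significance of a gapped alignment score through the standard Karlin-Altschul form E = K·m·n·e^(−λS), where m and n are the sequence lengths, and P(score ≥ S) ≈ 1 − e^(−E). The sketch below only applies that formula. The parameter values in the usage note are hypothetical placeholders, not values from the paper.

```python
import math


def gumbel_evalue(score, lam, k, m, n):
    """Expected number of alignments scoring >= score for sequences of lengths m and n."""
    return k * m * n * math.exp(-lam * score)


def gumbel_pvalue(score, lam, k, m, n):
    """P(maximal score >= score) under the Gumbel extreme value distribution.

    Uses -expm1(-E), which equals 1 - exp(-E) but stays accurate when E is tiny.
    """
    return -math.expm1(-gumbel_evalue(score, lam, k, m, n))
```

For example, with hypothetical parameters `lam=0.3` and `k=0.1`, a 300-residue query and a 5000-residue subject, a score of 60 yields an E-value on the order of 10⁻³, while a score of 30 is not significant. Higher scores always give smaller E- and P-values.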

SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

Directory of Open Access Journals (Sweden)

Brejnev Muhizi Muhire

Full Text Available The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
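To show why gap handling matters, the sketch below computes the identity of one aligned pair under two conventions. The first excludes gapped columns from the denominator. The second counts each gapped column as a difference. Columns where both sequences have a gap, which appear when a pair is extracted from a multiple alignment, are skipped. This is an illustrative assumption and does not reproduce SDT's exact formula. The function name and the `count_gaps` flag are hypothetical.

```python
def pairwise_identity(s1, s2, gap="-", count_gaps=False):
    """Fraction of identical residues between two aligned sequences of equal length.

    count_gaps=False: columns with a gap in either sequence are ignored.
    count_gaps=True:  columns with a gap in exactly one sequence count as differences.
    Columns with a gap in both sequences are always skipped.
    """
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gapped = 0
    for a, b in zip(s1, s2):
        if a == gap and b == gap:
            continue  # gap-gap column, e.g. left over from a multiple alignment
        if a == gap or b == gap:
            gapped += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    denom = matches + mismatches + (gapped if count_gaps else 0)
    return matches / denom if denom else 0.0
```

With `"ACGT-A"` against `"ACGTTA"`, the first convention reports 100% identity and the second about 83%. This is exactly the kind of divergence between methods that the abstract describes.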
Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

Science.gov (United States)

Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

2017-07-01

A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, drawn from the four DNA base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have become highly advanced, high-throughput sequencing technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing can produce ambiguous results in which it is difficult to determine whether a base is A, T, G, or C. To address this problem, in this study we introduce another representation of DNA codes, the Quaternion Q = (P_A, P_T, P_G, P_C), where P_A, P_T, P_G and P_C are the probabilities of the A, T, G and C bases that could appear in Q, and P_A + P_T + P_G + P_C = 1. Using Quaternion representations, we construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better, higher-quality match and mismatch scores between two DNA base codes. In our implementation, we applied the Needleman-Wunsch global sequence alignment algorithm in Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representation improves the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
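Below is a minimal sketch of the dot-product idea, written in Python rather than the authors' Octave. Each base is a probability vector Q = (P_A, P_T, P_G, P_C). Two vectors' dot product is the probability that both positions carry the same base. The score then interpolates between a match score and a mismatch score by that probability. Two choices are assumptions: the interpolation rule and the uniform probabilities assigned to the IUPAC ambiguity codes R, Y and N. The table, function names and default scores are hypothetical. Alignment uses the standard Needleman-Wunsch recurrence with a linear gap penalty.

```python
# Probability vectors Q = (P_A, P_T, P_G, P_C). Values for ambiguity codes are
# illustrative assumptions (uniform over the allowed bases); real values would
# come from the sequencing data.
QUATERNION = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "R": (0.5, 0.0, 0.5, 0.0),      # A or G
    "Y": (0.0, 0.5, 0.0, 0.5),      # T or C
    "N": (0.25, 0.25, 0.25, 0.25),  # any base
}


def pair_score(q1, q2, match=1.0, mismatch=-1.0):
    """Interpolate between match and mismatch by P(same base) = q1 . q2 (assumed rule)."""
    same = sum(p * q for p, q in zip(q1, q2))
    return same * match + (1.0 - same) * mismatch


def nw_score(x, y, match=1.0, mismatch=-1.0, gap=-2.0):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    qx = [QUATERNION[c] for c in x.upper()]
    qy = [QUATERNION[c] for c in y.upper()]
    prev = [j * gap for j in range(len(qy) + 1)]
    for i in range(1, len(qx) + 1):
        cur = [i * gap] + [0.0] * len(qy)
        for j in range(1, len(qy) + 1):
            cur[j] = max(
                prev[j - 1] + pair_score(qx[i - 1], qy[j - 1], match, mismatch),
                prev[j] + gap,     # x[i-1] aligned to a gap in y
                cur[j - 1] + gap,  # y[j-1] aligned to a gap in x
            )
        prev = cur
    return prev[-1]
```

Under these assumptions, an ambiguous `N` aligned to `T` scores −0.5, between a full match (1) and a mismatch (−1). Comparing `"ACGT"` with `"ACGN"` therefore scores higher than comparing it with `"ACGA"`.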
Introducing difference recurrence relations for faster semi-global alignment of long sequences.

Science.gov (United States)

Suzuki, Hajime; Kasahara, Masahiro

2018-02-19

The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
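The key observation behind storing differences rather than raw scores can be checked with a small experiment. The sketch uses a plain global alignment with a linear gap penalty, which is a simplification. The paper works with the affine-gap Smith-Waterman-Gotoh recurrences, and libgaba's actual difference recurrences are not reproduced. In this setting, every horizontal difference S[i][j] − S[i][j−1] stays in [gap, match − gap] however long the sequences are, while the absolute cell values grow with length. The function names and scores are hypothetical.

```python
import random


def nw_matrix(x, y, match=2, mismatch=-3, gap=-5):
    """Full global-alignment DP matrix with a linear gap penalty (gap < 0)."""
    rows, cols = len(x) + 1, len(y) + 1
    s = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        s[0][j] = j * gap
    for i in range(1, rows):
        s[i][0] = i * gap
        for j in range(1, cols):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + sub, s[i - 1][j] + gap, s[i][j - 1] + gap)
    return s


def horizontal_differences(s):
    """All differences between horizontally adjacent cells of the DP matrix."""
    return [s[i][j] - s[i][j - 1] for i in range(len(s)) for j in range(1, len(s[0]))]


if __name__ == "__main__":
    rng = random.Random(1)
    x = "".join(rng.choice("ACGT") for _ in range(300))
    y = "".join(rng.choice("ACGT") for _ in range(300))
    s = nw_matrix(x, y)
    diffs = horizontal_differences(s)
    cells = [v for row in s for v in row]
    print("cell value range:", min(cells), max(cells))
    print("difference range:", min(diffs), max(diffs))  # always within [-5, 7]
```

With these scores the differences span only [−5, 7], which fits in a signed 8-bit integer. This is why a difference-based formulation can use the full width of 8-bit SIMD lanes.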
pyPaSWAS : Python-based multi-core CPU and GPU sequence alignment

NARCIS (Netherlands)

Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

2018-01-01

BACKGROUND: Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore less widely adopted than it could be. The OpenCL language is supported more widely and allows use on a variety of
Optical properties and electronic transitions of DNA oligonucleotides as a function of composition and stacking sequence.

Science.gov (United States)

Schimelman, Jacob B; Dryden, Daniel M; Poudel, Lokendra; Krawiec, Katherine E; Ma, Yingfang; Podgornik, Rudolf; Parsegian, V Adrian; Denoyer, Linda K; Ching, Wai-Yim; Steinmetz, Nicole F; French, Roger H

2015-02-14

The role of base pair composition and stacking sequence in the optical properties and electronic transitions of DNA is of fundamental interest. We present and compare the optical properties of DNA oligonucleotides (AT)10, (AT)5(GC)5, and (AT-GC)5 using both ab initio methods and UV-vis molar absorbance measurements. Our data indicate a strong dependence of both the position and intensity of UV absorbance features on oligonucleotide composition and stacking sequence. The partial densities of states for each oligonucleotide indicate that the valence band edge arises from a feature associated with the PO4(3-) complex anion, and the conduction band edge arises from anti-bonding states in DNA base pairs. The results show a strong correspondence between the ab initio and experimentally determined optical properties. These results highlight the benefit of full spectral analysis of DNA, as opposed to reductive methods that consider only the 260 nm absorbance (A260) or simple purity ratios, such as A260/A230 or A260/A280, and suggest that the slope of the absorption edge onset may provide a useful metric for the degree of base pair stacking in DNA. These insights may prove useful for applications in biology, bioelectronics, and mesoscale self-assembly.
BuddySuite: Command-Line Toolkits for Manipulating Sequences, Alignments, and Phylogenetic Trees.

Science.gov (United States)

Bond, Stephen R; Keat, Karl E; Barreira, Sofia N; Baxevanis, Andreas D

2017-06-01

The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.
Revisiting the phylogeny of Zoanthidea (Cnidaria: Anthozoa): Staggered alignment of hypervariable sequences improves species tree inference.

Science.gov (United States)

Swain, Timothy D

2018-01-01

The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.
Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

Directory of Open Access Journals (Sweden)

Hao Ye

2015-11-01

Full Text Available Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008, and users must decide which alignment algorithm to use in their studies. Choosing the right approach means selecting not only the alignment algorithm but also the set of suitable parameters to be used with it. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies, such as seed-and-extend and q-gram filtering. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long read alignment and alignment facilitated by known genetic variants.
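A toy version of the seed-and-extend strategy mentioned above is sketched below. A hash index of the reference's k-mers (q-grams) supplies seeds. Each seed hit votes for a candidate read start position. Candidates are then verified by counting mismatches in an ungapped alignment. This is a simplified illustration under stated assumptions: there are no gaps, no base qualities and no reverse-complement search. It does not describe any particular aligner. The function names and the `max_mismatches` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) of the best ungapped hit, or None if nothing qualifies."""
    # Seed: each shared k-mer votes for the read start it implies on the reference.
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                votes[start] += 1
    # Extend/verify: check candidates (most-voted first) by Hamming distance.
    best = None
    for start, _ in votes.most_common():
        window = reference[start:start + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (start, mm)
    return best
```

The index is built once per reference and reused for every read. A single substitution destroys at most k seeds, so a moderately long read with a few errors still shares many exact k-mers with its true location. That property is what q-gram filters exploit.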
High-speed all-optical DNA local sequence alignment based on a three-dimensional artificial neural network.

Science.gov (United States)

Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra

2017-07-01

This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.
Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

Science.gov (United States)

Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

2004-01-01

Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377
CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

Directory of Open Access Journals (Sweden)

Shi Weisong

2011-06-01

Full Text Available Abstract Background Research in genetics has advanced rapidly in recent years with the aid of next generation sequencing (NGS). However, massively parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has recently been adopted by many organizations, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide primary functions, such as bisulfite and paired-end mapping, that are available in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and are therefore inefficient. Last, it is difficult for the majority of biologists without programming training to use these tools because most were developed on Linux with a command line interface. Results To encourage the use of Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner comes from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http
CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.

Science.gov (United States)

Nguyen, Tung; Shi, Weisong; Ruden, Douglas

2011-06-06

Research in genetics has advanced rapidly in recent years with the aid of next generation sequencing (NGS). However, massively parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has recently been adopted by many organizations, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide primary functions, such as bisulfite and paired-end mapping, that are available in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and are therefore inefficient. Last, it is difficult for the majority of biologists without programming training to use these tools because most were developed on Linux with a command line interface. To encourage the use of Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner comes from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http
MSAViewer: interactive JavaScript visualization of multiple sequence alignments.

Science.gov (United States)

Yachdav, Guy; Wilzbach, Sebastian; Rauscher, Benedikt; Sheridan, Robert; Sillitoe, Ian; Procter, James; Lewis, Suzanna E; Rost, Burkhard; Goldberg, Tatyana

2016-11-15

The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is 'web ready': written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components. The MSAViewer is released as open source software under the Boost Software License 1.0. Documentation, source code and the viewer are available at http://msa.biojs.net/. Supplementary information: Supplementary data are available at Bioinformatics online. msa@bio.sh. © The Author 2016. Published by Oxford University Press.
Influence of Layup Sequence on the Surface Accuracy of Carbon Fiber Composite Space Mirrors

Science.gov (United States)

Yang, Zhiyong; Liu, Qingnian; Zhang, Boming; Xu, Liang; Tang, Zhanwen; Xie, Yongjie

2018-04-01

Layup sequence is directly related to the stiffness and deformation resistance of a composite space mirror, and errors caused by the layup sequence can noticeably affect the surface precision of composite mirrors. Varying the layup sequence while keeping the total thickness of the composite space mirror the same changes the surface form of the mirror, which is the focus of our study. In our research, the influence of varied quasi-isotropic stacking sequences and random angular deviations on the surface accuracy of composite space mirrors was investigated through finite element analyses (FEA). We established a simulation model for the studied concave mirror with a 500 mm diameter and discussed the essential factors of layup sequence and random angular deviation on different plies. Five guiding findings are described in this study. Increasing the total number of plies, optimizing the stacking sequence and keeping ply alignment consistent during ply placement are effective ways to improve the surface accuracy of composite mirrors.
Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

Science.gov (United States)

Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

2013-09-01

Multiple sequence alignments (MSAs) are widely used in bioinformatics to carry out other tasks such as structure prediction, biological function analysis or phylogenetic modeling. However, current tools usually provide only partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, especially when the sequences are less similar. Consequently, researchers and biologists do not agree on the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores that include further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluating more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was assessed on the BAliBASE benchmark, with significance evaluated by the Kruskal-Wallis test (P …). The algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), with the advantage of being able to use fewer structures. Structural information is included within the objective function to evaluate the obtained alignments more accurately. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.
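The STRIKE score depends on 3D structures and cannot be sketched in a few lines. The other two objectives, and the Pareto-dominance test on which non-dominated sorting is built, are simple to compute. The definitions below are assumptions for illustration. Non-gaps percentage is taken as the share of non-gap characters in the alignment. A column is counted as totally conserved when it contains one residue type and no gaps. They may differ in detail from the authors' implementation, and the function names are hypothetical.

```python
def non_gap_percentage(msa, gap="-"):
    """Percentage of characters in the alignment that are not gaps."""
    total = sum(len(row) for row in msa)
    gaps = sum(row.count(gap) for row in msa)
    return 100.0 * (total - gaps) / total


def totally_conserved_columns(msa, gap="-"):
    """Number of columns containing a single residue type and no gaps."""
    return sum(1 for column in zip(*msa) if gap not in column and len(set(column)) == 1)


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
```

For the three-row alignment `["AC-GT", "ACTGT", "AC-GA"]`, 13 of 15 characters are residues and three columns (A, C, G) are totally conserved. A genetic algorithm of the kind described would rank candidate alignments by repeatedly extracting the set that no other candidate dominates.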
DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

Science.gov (United States)

Kelly, Steven; Maini, Philip K

2013-01-01

The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.
DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

Directory of Open Access Journals (Sweden)

Steven Kelly

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

Science.gov (United States)

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme growth of next-generation sequencing data has led to a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing is a crucial technique for accelerating ultra-large sequence analyses (e.g. files larger than 1 GB). Based on HAlign and the Spark distributed computing system, we implement HAlign-II, a highly cost-efficient and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets (files larger than 1 GB) showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.
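The abstract does not describe HAlign-II's internals. One common MSA strategy is center-star alignment, which we assume here for illustration only. It first picks the sequence closest to all others and then aligns every sequence to it.

The all-pairs distance step is embarrassingly parallel, which is the kind of work a distributed engine such as Spark can spread across nodes. The serial sketch below shows only the center-selection step. Names are hypothetical.

```python
from itertools import combinations
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def choose_center(seqs: List[str]) -> int:
    """Index of the sequence with the smallest summed distance to all others."""
    totals = [0] * len(seqs)
    # Each pair is independent, so this loop could be distributed across workers.
    for i, j in combinations(range(len(seqs)), 2):
        d = edit_distance(seqs[i], seqs[j])
        totals[i] += d
        totals[j] += d
    return min(range(len(seqs)), key=totals.__getitem__)
```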
Pattern recognition in complex activity travel patterns : comparison of Euclidean distance, signal-processing theoretical, and multidimensional sequence alignment methods

NARCIS (Netherlands)

Joh, C.H.; Arentze, T.A.; Timmermans, H.J.P.

2001-01-01

The application of a multidimensional sequence alignment method for classifying activity travel patterns is reported. The method was developed as an alternative to the existing classification methods suggested in the transportation literature. The relevance of the multidimensional sequence alignment
Alignment of high-throughput sequencing data inside in-memory databases.

Science.gov (United States)

Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

2014-01-01

In the era of high-throughput DNA sequencing, high-performance analysis of DNA sequences is of great importance. Computer-supported DNA analysis is still an intensive, time-consuming task. In this paper we explore the potential of a new in-memory database technology, SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure comparable results, MySQL was also run in memory, using its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, HANA and MySQL were compared on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL. This indicates high potential in the new in-memory concepts, which may lead to further developments of DNA analysis procedures in the future.
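The exact-search step of BWA-style aligners is backward search over an FM-index, which is built from the Burrows-Wheeler transform (BWT). The minimal in-memory Python sketch below uses a naive suffix sort, so it only suits toy references. It is not the stored-procedure implementation described above, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_fm_index(text: str) -> Tuple[List[int], Dict[str, int], Dict[str, List[int]]]:
    """Suffix array, C table and occurrence table for text + '$'."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix sort
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    counts: Dict[str, int] = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c]: number of characters in the text that sort before c.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, C, occ


def backward_search(pattern: str, sa: List[int], C: Dict[str, int],
                    occ: Dict[str, List[int]]) -> List[int]:
    """Start positions of exact matches of pattern, found right to left."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```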

Interactive software tool to comprehend the calculation of optimal sequence alignments with dynamic programming.

Science.gov (United States)

Ibarra, Ignacio L; Melo, Francisco

2010-07-01

Dynamic programming (DP) is a general optimization strategy that is successfully used across various disciplines of science. In bioinformatics, it is widely used to calculate the optimal alignment between pairs of protein or DNA sequences. These alignments form the basis of new, verifiable biological hypotheses. Despite its importance, there are no interactive tools available for training and education on understanding the DP algorithm. Here, we introduce an interactive computer application with a graphical interface for teaching students about DP. The program displays the DP scoring matrix and the resulting optimal alignment(s), while allowing the user to modify key parameters such as the values in the similarity matrix, the sequence alignment algorithm version and the gap opening/extension penalties. We hope that this software will be useful to teachers and students of bioinformatics courses, as well as to researchers who implement the DP algorithm for diverse applications. The software is freely available at: http://melolab.org/sat. The software is written in Java, so it runs on all major platforms and operating systems, including Windows, Mac OS X and LINUX. All inquiries or comments about this software should be directed to Francisco Melo at fmelo@bio.puc.cl.
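The scoring matrix and traceback that such a tool displays can be sketched in a few lines of Needleman-Wunsch global alignment. For brevity this sketch assumes a single linear gap penalty rather than separate gap opening/extension penalties. The parameter values are illustrative.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment score and one optimal alignment (linear gap penalty)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Fill the DP matrix: each cell takes the best of diagonal, up and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For instance, `needleman_wunsch("ACGT", "AGT")` returns a score of 1 with the alignment `ACGT` over `A-GT`.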
TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.

Science.gov (United States)

Chang, Jia-Ming; Di Tommaso, Paolo; Notredame, Cedric

2014-06-01

Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. For this purpose, we describe a novel lossless alternative to site filtering that involves overweighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. The software is available from www.tcoffee.org/Projects/tcs. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
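TCS checks each aligned residue pair in an MSA against a library of pairwise alignments, counting indirect (transitive) support through third sequences as well. The deliberately simplified sketch below keeps only the direct check. It scores each column by the fraction of its residue pairs that appear in the library and leaves out the transitive step entirely. Names and the library format are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, position in the ungapped sequence)


def column_residues(msa: List[str]) -> List[List[Residue]]:
    """For each MSA column, the residues it aligns, identified by ungapped position."""
    positions = [0] * len(msa)
    columns = []
    for col in zip(*msa):
        residues = []
        for s, ch in enumerate(col):
            if ch != "-":
                residues.append((s, positions[s]))
                positions[s] += 1
        columns.append(residues)
    return columns


def column_support(msa: List[str], library: Set[FrozenSet[Residue]]) -> List[float]:
    """Fraction of residue pairs in each column that the pairwise library also aligns."""
    scores = []
    for residues in column_residues(msa):
        pairs = list(combinations(residues, 2))
        if not pairs:
            scores.append(0.0)
            continue
        supported = sum(frozenset(p) in library for p in pairs)
        scores.append(supported / len(pairs))
    return scores
```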
Aptaligner: automated software for aligning pseudorandom DNA X-aptamers from next-generation sequencing data.

Science.gov (United States)

Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E

2014-06-10

Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought to develop an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and uses Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length-error and noise-level cutoff features. It is parallelized to run on multiple central processing units (cores) and sorts sequences from a single chip into projects and subprojects.
Micropatterning stretched and aligned DNA for sequence-specific nanolithography

Science.gov (United States)

Petit, Cecilia Anna Paulette

Techniques for fabricating nanostructured materials can be categorized as either "top-down" or "bottom-up". Top-down techniques use lithography and contact printing to create patterned surfaces and microfluidic channels that can corral and organize nanoscale structures, such as molecules and nanorods. In contrast, bottom-up techniques use self-assembly or molecular recognition to direct the organization of materials. A central goal in nanotechnology is the integration of bottom-up and top-down assembly strategies for materials development, device design, and process integration. With this goal in mind, we have developed strategies that allow this integration by using DNA as a template for nanofabrication. Two top-down approaches allow the placement of these templates, while the bottom-up technique uses the specific sequence of bases to pattern materials along each strand of DNA. Our first top-down approach, termed combing of molecules in microchannels (COMMIC), produces microscopic patterns of stretched and aligned molecules of DNA on surfaces. This process consists of passing an air-water interface over end-adsorbed molecules inside microfabricated channels. The geometry of the microchannel directs the placement of the DNA molecules, while the geometry of the air-water interface directs the local orientation and curvature of the molecules. We developed another top-down strategy for creating micropatterns of stretched and aligned DNA using surface chemistry. Because DNA stretching occurs on hydrophobic surfaces, this technique uses photolithography to pattern vinyl-terminated silanes on glass. When these surfaces are immersed in DNA solution, molecules adhere preferentially to the silanized areas. This approach has also proven useful in patterning protein for cell adhesion studies. Finally, we describe the use of these stretched and aligned molecules of DNA as templates for the subsequent bottom-up construction of heterostructures through hybridization.
Self-aligned top-gate InGaZnO thin film transistors using SiO2/Al2O3 stack gate dielectric

Energy Technology Data Exchange (ETDEWEB)

Chen, Rongsheng; Zhou, Wei; Zhang, Meng; Wong, Man; Kwok, Hoi Sing

2013-12-02

Self-aligned top-gate amorphous indium–gallium–zinc oxide (a-IGZO) thin film transistors (TFTs) utilizing SiO2/Al2O3 stacked thin films as the gate dielectric are developed in this paper. Owing to the high quality of the high-k Al2O3 and the good interface between the active layer and the gate dielectric, the resulting a-IGZO TFT exhibits good electrical performance. This includes a field-effect mobility of 9 cm²/V·s, a threshold voltage of 2.2 V, a subthreshold swing of 0.2 V/decade, and an on/off current ratio of 1 × 10⁷. As the channel length is scaled down, good characteristics are retained, with only a small shift of the threshold voltage and no degradation of the subthreshold swing. - Highlights: • Self-aligned top-gate indium–gallium–zinc oxide thin-film transistor is proposed. • SiO2/Al2O3 stack gate dielectric is proposed. • The source/drain areas are hydrogen-doped by CHF3 plasma. • The devices show good electrical performance and scaling-down behavior.
Harnessing cross-species alignment to discover SNPs and generate a draft genome sequence of a bighorn sheep (Ovis canadensis).

Science.gov (United States)

Miller, Joshua M; Moore, Stephen S; Stothard, Paul; Liao, Xiaoping; Coltman, David W

2015-05-20

Whole genome sequences (WGS) have proliferated as sequencing technology continues to improve and costs decline. While many WGS of model or domestic organisms have been produced, a growing number of non-model species are also being sequenced. In the absence of a reference, constructing a genome sequence requires de novo assembly. This may be beyond the ability of many labs due to the large volumes of raw sequence data and extensive bioinformatics required. In contrast, the presence of a reference WGS allows alignment, which is more tractable than assembly. Recent work has highlighted that the reference need not come from the same species, potentially enabling WGS for a wide array of species to be constructed using cross-species alignment. Here we report on the creation of a draft WGS from a single bighorn sheep (Ovis canadensis) using alignment to the closely related domestic sheep (Ovis aries). Two sequencing libraries on SOLiD platforms yielded over 865 million reads. Combined alignment to the domestic sheep reference resulted in a nearly complete sequence (95% coverage of the reference) at an average of 12x read depth (104 SD). From this we discovered over 15 million variants and annotated them relative to the domestic sheep reference. We then conducted an enrichment analysis of those SNPs showing fixed differences between the reference and the sequenced individual. We found significant differences in a number of gene ontology (GO) terms, including those associated with reproduction, muscle properties, and bone deposition. Our results demonstrate that cross-species alignment enables the creation of novel WGS for non-model organisms. The bighorn sheep WGS will provide a resource for future resequencing studies or comparative genomics.
Influence of the stacking sequence of layers on the mechanical behavior of polymeric composite cylinders; Influence of the winding configuration on the mechanical behavior of polymeric composite cylinders

Energy Technology Data Exchange (ETDEWEB)

Carvalho, Osni de

2006-07-01

This work experimentally evaluated the influence of symmetric and asymmetric layer stacking sequences on the mechanical behavior of polymeric composite cylinders. To this end, two groups of open-ended cylinders were manufactured by filament winding with different stacking sequences relative to the laminate midplane, yielding symmetric and asymmetric laminates. The composite cylinders were made with an epoxy matrix and carbon fiber reinforcement. To evaluate mechanical strength, the cylinders were tested hydrostatically: they were pressurized internally with a fluid in a hydrostatic device until they burst. The strains and failure modes of the two groups were also compared. Analysis with a finite element program, a tool widely used in design, showed that it fails to identify the stresses in the fiber direction in each composite layer. It also fails to identify the interlaminar shear stress that appears in the cylinders with an asymmetric stacking sequence. The test results showed that the stacking sequence influenced the mechanical behavior of the composite cylinders, favoring the symmetric construction. (author)
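Classical lamination theory gives a standard way to see why symmetry about the midplane matters, though it is not the analysis reported above. The extension-bending coupling matrix B vanishes for a laminate that is symmetric about its midplane. For an asymmetric layup B is non-zero, so in-plane loads such as internal pressure also induce bending and twisting.

The sketch below computes B for a list of ply angles. The ply properties (typical carbon/epoxy values in GPa) and the ply thickness (mm) are illustrative assumptions, not data from this work.

```python
import math
from typing import List, Sequence, Tuple


def reduced_stiffness(E1: float, E2: float, nu12: float, G12: float) -> Tuple[float, float, float, float]:
    """Plane-stress stiffnesses Q11, Q22, Q12, Q66 of one orthotropic ply."""
    nu21 = nu12 * E2 / E1
    d = 1.0 - nu12 * nu21
    return E1 / d, E2 / d, nu12 * E2 / d, G12


def q_bar(theta_deg: float, Q11: float, Q22: float, Q12: float, Q66: float) -> List[List[float]]:
    """Ply stiffness rotated to the laminate axes (standard transformation)."""
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    c2, s2 = c * c, s * s
    q11 = Q11 * c2 * c2 + 2 * (Q12 + 2 * Q66) * s2 * c2 + Q22 * s2 * s2
    q22 = Q11 * s2 * s2 + 2 * (Q12 + 2 * Q66) * s2 * c2 + Q22 * c2 * c2
    q12 = (Q11 + Q22 - 4 * Q66) * s2 * c2 + Q12 * (s2 * s2 + c2 * c2)
    q66 = (Q11 + Q22 - 2 * Q12 - 2 * Q66) * s2 * c2 + Q66 * (s2 * s2 + c2 * c2)
    q16 = (Q11 - Q12 - 2 * Q66) * s * c2 * c + (Q12 - Q22 + 2 * Q66) * s2 * s * c
    q26 = (Q11 - Q12 - 2 * Q66) * s2 * s * c + (Q12 - Q22 + 2 * Q66) * s * c2 * c
    return [[q11, q12, q16], [q12, q22, q26], [q16, q26, q66]]


def coupling_matrix_B(angles_deg: Sequence[float], ply_t: float,
                      props: Tuple[float, float, float, float]) -> List[List[float]]:
    """B = 1/2 * sum_k Qbar_k (z_k^2 - z_{k-1}^2), plies listed from bottom to top."""
    Q = reduced_stiffness(*props)
    h = ply_t * len(angles_deg)
    B = [[0.0] * 3 for _ in range(3)]
    for k, theta in enumerate(angles_deg):
        z0 = -h / 2 + k * ply_t  # bottom surface of ply k
        z1 = z0 + ply_t          # top surface of ply k
        qb = q_bar(theta, *Q)
        for i in range(3):
            for j in range(3):
                B[i][j] += 0.5 * qb[i][j] * (z1 * z1 - z0 * z0)
    return B


CARBON_EPOXY = (140.0, 10.0, 0.3, 5.0)  # assumed E1, E2 (GPa), nu12, G12 (GPa)
```

With these assumptions, `[45, -45, -45, 45]` (symmetric) gives B of essentially zero. `[45, -45, 45, -45]` (asymmetric) gives non-zero B16 and B26, i.e. coupling between extension and twisting.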
Measuring covariation in RNA alignments: Physical realism improves information measures

DEFF Research Database (Denmark)

Lindgreen, Stinus; Gardner, Paul Phillip; Krogh, Anders

2006-01-01

Motivation: The importance of non-coding RNAs is becoming increasingly evident, and the function of these molecules often depends on their structure. It is common to use alignments of related RNA sequences to deduce the consensus secondary structure by detecting patterns of co-evolution. A central part of such an analysis is measuring covariation between two positions in an alignment. Here, we rank various measures ranging from simple mutual information to more advanced covariation measures. Results: Mutual information is still used for secondary structure prediction, but the results of this study indicate which measures are useful. Incorporating more structural information by considering, e.g., indels and stacking improves accuracy, suggesting that physically realistic measures yield improved predictions. This can be used to improve both current and future programs for secondary structure...
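The simplest covariation measure mentioned above is the mutual information between two alignment columns. In standard notation this is $MI_{ij} = \sum_{x,y} p_{ij}(x,y)\,\log_2 \frac{p_{ij}(x,y)}{p_i(x)\,p_j(y)}$. The minimal sketch below computes it. It simply skips sequences with a gap in either column, which is an assumption; as the study notes, treating indels explicitly can improve accuracy. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(col_i: Sequence[str], col_j: Sequence[str]) -> float:
    """Mutual information (bits) between two alignment columns, ignoring gapped rows."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    p_i = Counter(a for a, _ in pairs)
    p_j = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi
```

Two columns that covary perfectly as Watson-Crick pairs, e.g. `"AAGG"` and `"UUCC"`, give 1 bit. A fully conserved column gives 0 bits whatever its partner holds, which is one known weakness of plain mutual information.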
A generalized global alignment algorithm.

Science.gov (United States)

Huang, Xiaoqiu; Chao, Kun-Mao

2003-01-22

Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, that is, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal generalized alignment in time proportional to the product of the sequence lengths and in space proportional to the sum of the sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
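The abstract does not spell out GAP3's generalized scoring model, so the sketch below shows only the standard space-saving idea behind linear-space alignment. An optimal global score needs just two rows of the DP matrix, giving space proportional to the shorter sequence. Recovering the alignment itself in linear space additionally requires a divide-and-conquer step (Hirschberg's method), which is not shown. The scoring parameters and the function name are illustrative assumptions.

```python
def global_score_linear_space(a: str, b: str, match: int = 1, mismatch: int = -1,
                              gap: int = -2) -> int:
    """Optimal global alignment score keeping only two DP rows."""
    if len(b) > len(a):
        a, b = b, a  # rows are indexed by the shorter sequence; symmetric scoring keeps the score
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur  # the previous row is discarded, so memory stays O(min(m, n))
    return prev[-1]
```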
CHROMATOGATE: A TOOL FOR DETECTING BASE MIS-CALLS IN MULTIPLE SEQUENCE ALIGNMENTS BY SEMI-AUTOMATIC CHROMATOGRAM INSPECTION

Directory of Open Access Journals (Sweden)

Nikolaos Alachiotis

2013-03-01

Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translate the chromatograms into molecular sequences of A, C, G, T, or N (undetermined characters). Since chromatogram translation programs frequently introduce errors, manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To give users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.
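The pre-filtering step can be illustrated by scanning an MSA for polymorphic sites and listing which sequences carry the minority characters, since those are the chromatogram peaks a user would inspect. In this sketch the threshold `max_minor` (the maximum number of minority-character sequences for a site to be flagged) is a hypothetical stand-in. ChromatoGate's own threshold is defined on chromatogram peaks, and all names here are assumptions.

```python
from collections import Counter
from typing import Dict, List


def flag_polymorphic_sites(msa: List[str], max_minor: int = 1) -> Dict[int, List[int]]:
    """Map column index -> rows carrying a non-majority base, for low-frequency polymorphisms."""
    flagged: Dict[int, List[int]] = {}
    for col_idx, column in enumerate(zip(*msa)):
        # Ignore gaps and undetermined bases when deciding whether a site is polymorphic.
        counts = Counter(ch for ch in column if ch not in "-N")
        if len(counts) < 2:
            continue
        major, _ = counts.most_common(1)[0]
        minor_rows = [r for r, ch in enumerate(column) if ch not in "-N" and ch != major]
        if len(minor_rows) <= max_minor:
            flagged[col_idx] = minor_rows  # candidate mis-calls to check against chromatograms
    return flagged
```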
Phylo: a citizen science approach for improving multiple sequence alignment.

Directory of Open Access Journals (Sweden)

Alexander Kawrykow

BACKGROUND: Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. In this approach, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. METHODOLOGY/PRINCIPAL FINDINGS: We introduce Phylo, a human-based computing framework applying "crowd sourcing" techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we have received more than 350,000 solutions submitted by more than 12,000 registered users. Our results show that the submitted solutions contributed to improving the accuracy of up to 70% of the alignment blocks considered. CONCLUSIONS/SIGNIFICANCE: We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of "human-brain peta-flops" of computation that are spent every day playing games.
Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

Energy Technology Data Exchange (ETDEWEB)

Ovacik, Meric A. [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Androulakis, Ioannis P., E-mail: yannis@rci.rutgers.edu [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States)

2013-09-15

Pathway-based information has become an important source of information both for establishing evolutionary relationships and for understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: informing evolutionary relationships, and extrapolating species differences for a number of applications, including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex, as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method that accounts for enzyme sequence similarity in addition to reaction alignment. Using nine species, including human, some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing the conservation of the glycolysis and citrate cycle pathways. In addition, we demonstrate how organisms can be compared using the cumulative information retrieved from nine pathways in central metabolism. We also present a more complete study involving 36 pathways common to all nine species. Our results indicate that reaction alignment with enzyme sequence similarity gives a more accurate representation of pathway-specific cross-species similarities and differences based on NCBI taxonomy.
Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

International Nuclear Information System (INIS)

Ovacik, Meric A.; Androulakis, Ioannis P.

2013-01-01

Optimum stacking sequence design of laminated composite circular plates with curvilinear fibres by a layer-wise optimization method

Science.gov (United States)

Guenanou, A.; Houmat, A.

2018-05-01

The optimum stacking sequence design for the maximum fundamental frequency of symmetrically laminated composite circular plates with curvilinear fibres is investigated for the first time using a layer-wise optimization method. The design variables are two fibre orientation angles per layer. The fibre paths are constructed using the method of shifted paths. The first-order shear deformation plate theory and a curved square p-element are used to calculate the objective function. The blending function method is used to model accurately the geometry of the circular plate. The equations of motion are derived using Lagrange's method. The numerical results are validated by means of a convergence test and comparison with published values for symmetrically laminated composite circular plates with rectilinear fibres. The material parameters, boundary conditions, number of layers and thickness are shown to influence the optimum solutions to different extents. The results should serve as a benchmark for optimum stacking sequences of symmetrically laminated composite circular plates with curvilinear fibres.
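The layer-wise idea of improving the design one layer at a time, with the other layers held fixed, can be sketched as a coordinate-wise search over discrete fibre-angle pairs (two angles per layer, as above). This is a simplified stand-in rather than the authors' method. In particular:

- The layer ordering is assumed.
- The number of sweeps is assumed.
- The objective is a placeholder; the real objective is the fundamental frequency from the p-element model.

Names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Layer = Tuple[float, float]  # two fibre orientation angles per layer (curvilinear path)


def layerwise_search(objective: Callable[[List[Layer]], float], n_layers: int,
                     angles: Sequence[float], sweeps: int = 3) -> List[Layer]:
    """Optimise one layer at a time over a discrete angle grid, others held fixed."""
    candidates = list(product(angles, repeat=2))
    design: List[Layer] = [candidates[0]] * n_layers
    for _ in range(sweeps):
        for k in range(n_layers):  # assumed order: layer 0 first
            design[k] = max(
                candidates,
                key=lambda c: objective(design[:k] + [c] + design[k + 1:]),
            )
    return design
```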
TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction.

Science.gov (United States)

Chang, Jia-Ming; Di Tommaso, Paolo; Lefort, Vincent; Gascuel, Olivier; Notredame, Cedric

2015-07-01

This article introduces the Transitive Consistency Score (TCS) web server, a service that makes it possible to estimate the local reliability of protein multiple sequence alignments (MSAs) using the TCS index. The evaluation can be used to identify the aligned positions most likely to contain structurally analogous residues and also most likely to support an accurate phylogenetic reconstruction. The TCS scoring scheme has been shown to be an accurate predictor of structural alignment correctness among commonly used methods. It has also been shown to outperform common filtering schemes like Gblocks or trimAl when doing MSA post-processing prior to phylogenetic tree reconstruction. The web server is available from http://tcoffee.crg.cat/tcs. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
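Once each column has a reliability score, MSA post-processing before tree reconstruction can follow one of two routes:

- Drop low-scoring columns, as filters such as Gblocks or trimAl do.
- Keep every column but give reliable ones more weight, the lossless route mentioned for TCS.

The sketch below shows both. The weighting rule (an integer column multiplicity between 1 and `max_weight`) is a hypothetical illustration, not the TCS server's scheme.

```python
from typing import List, Sequence


def filter_columns(msa: Sequence[str], scores: Sequence[float], threshold: float) -> List[str]:
    """Keep only the columns whose reliability score reaches the threshold."""
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return ["".join(row[i] for i in keep) for row in msa]


def weight_columns(scores: Sequence[float], max_weight: int = 3) -> List[int]:
    """Lossless alternative: map scores in [0, 1] to column weights 1..max_weight (assumed rule)."""
    return [1 + round(s * (max_weight - 1)) for s in scores]
```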
Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads.

Science.gov (United States)

Huson, Daniel H; Tappu, Rewati; Bazinet, Adam L; Xie, Chao; Cummings, Michael P; Nieselt, Kay; Williams, Rohan

2017-01-25

Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way.
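The key idea is that reads aligned to the same reference protein can be ordered by their alignment coordinates, and overlapping reads can then be joined. The greedy sketch below simplifies the method considerably. It assumes reads are already translated and aligned without indels, with a known start position on the reference. It also merges each read into the previous contig only when the two agree over at least `min_overlap` residues. MEGAN's actual method is more elaborate, and all names here are hypothetical.

```python
from typing import List, Tuple


def assemble_by_reference(hits: List[Tuple[int, str]], min_overlap: int = 3) -> List[str]:
    """Greedily merge (reference start, translated read) hits into contigs."""
    contigs: List[Tuple[int, str]] = []
    for start, seq in sorted(hits):  # order reads along the reference protein
        if contigs:
            c_start, c_seq = contigs[-1]
            offset = start - c_start
            overlap = min(c_start + len(c_seq) - start, len(seq))
            if overlap >= min_overlap and c_seq[offset:offset + overlap] == seq[:overlap]:
                # Extend the contig with the part of the read beyond the overlap.
                contigs[-1] = (c_start, c_seq + seq[overlap:])
                continue
        contigs.append((start, seq))
    return [s for _, s in contigs]
```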
Band alignments and improved leakage properties of (La2O3)0.5(SiO2)0.5/SiO2/GaN stacks for high-temperature metal-oxide-semiconductor field-effect transistor applications

Science.gov (United States)

Gao, L. G.; Xu, B.; Guo, H. X.; Xia, Y. D.; Yin, J.; Liu, Z. G.

2009-06-01

The band alignments of (La2O3)0.5(SiO2)0.5(LSO)/GaN and LSO/SiO2/GaN gate dielectric stacks were investigated comparatively using x-ray photoelectron spectroscopy. The valence band offsets for the LSO/GaN and LSO/SiO2/GaN stacks are 0.88 and 1.69 eV, respectively, while the corresponding conduction band offsets are 1.40 and 1.83 eV, respectively. Measurements of the leakage current density as a function of temperature revealed that the LSO/SiO2/GaN stack has a much lower leakage current density than the LSO/GaN stack, especially at high temperature. It is concluded that the presence of a SiO2 buffer layer increases the band offsets and effectively reduces the leakage current density.
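For a single heterojunction, photoelectron spectroscopy studies commonly obtain the conduction band offset from the measured valence band offset and the two band gaps. We assume that relation applies to the LSO/GaN stack; the abstract does not state how the offsets were derived. Under that assumption, the reported values imply a band-gap difference of 2.28 eV. The relation is not applied to the three-layer LSO/SiO2/GaN stack here.

```latex
\Delta E_C = E_g^{\mathrm{LSO}} - E_g^{\mathrm{GaN}} - \Delta E_V
\quad\Longrightarrow\quad
E_g^{\mathrm{LSO}} - E_g^{\mathrm{GaN}} = \Delta E_V + \Delta E_C
= 0.88\ \mathrm{eV} + 1.40\ \mathrm{eV} = 2.28\ \mathrm{eV}
```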
Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

Science.gov (United States)

Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

2016-07-01

Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
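To make the idea of an alignment-free (AF) distance concrete, here is a minimal Python sketch of one generic AF measure: the cosine distance between overlapping k-mer count profiles. It is not claimed to be one of the nine methods assessed in the study. The choice of k = 3 and the toy sequences are illustrative assumptions, and the tree-building and jackknife steps are not shown.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(p: Counter, q: Counter) -> float:
    """Return 1 - cosine similarity of two k-mer count vectors (0 = identical profiles)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def distance_matrix(genomes: dict, k: int = 3) -> dict:
    """Pairwise alignment-free distances between named genomes; no alignment is built."""
    profiles = {name: kmer_profile(seq, k) for name, seq in genomes.items()}
    return {(a, b): cosine_distance(profiles[a], profiles[b])
            for a in profiles for b in profiles}


genomes = {"A": "ACGTACGTAC", "B": "ACGTACGTTT", "C": "TTTTGGGGCC"}
d = distance_matrix(genomes)
print(round(d[("A", "B")], 3), d[("A", "C")])  # 0.134 1.0
```

No alignment is built. Each genome is profiled once, and the pairwise step only compares count tables, which is what lets AF approaches scale to multi-genome data.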
Crystallite size effects in stacking faulted nickel hydroxide and its electrochemical behaviour

International Nuclear Information System (INIS)

Ramesh, T.N.

2009-01-01

β-Nickel hydroxide comprises a long-range periodic arrangement of atoms with the stacking sequence AC AC AC ... and has the ideal composition Ni(OH)2. Variations in the preparative conditions can change the stacking sequence (AC AC BA CB AC AC or AC AC AB AC AC), and such variations can result in the formation of stacking faults in nickel hydroxide. The stability of a stacking fault depends on the free energy content of the sample. Stacking faults in nickel hydroxide are essential for better electrochemical activity, and there are also reports correlating particle size with better electrochemical activity. Here we present the effect of crystallite size on stacking-faulted nickel hydroxide samples. Stacking-faulted nickel hydroxide with small crystallite size exchanges 0.8 e/Ni, while samples with larger crystallite size exchange 0.4 e/Ni. Hence, the right combination of crystallite size and stacking-fault content must be controlled to obtain good electrochemical activity from nickel hydroxide.
3D tissue formation by stacking detachable cell sheets formed on nanofiber mesh.

Science.gov (United States)

Kim, Min Sung; Lee, Byungjun; Kim, Hong Nam; Bang, Seokyoung; Yang, Hee Seok; Kang, Seong Min; Suh, Kahp-Yang; Park, Suk-Hee; Jeon, Noo Li

2017-03-23

We present a novel approach for assembling 3D tissue by layer-by-layer stacking of cell sheets formed on an aligned nanofiber mesh. A rigid frame was used to repeatedly collect aligned electrospun PCL (polycaprolactone) nanofibers to form a mesh with an average interfiber distance of 6.4 µm. When human umbilical vein endothelial cells (HUVECs), human foreskin dermal fibroblasts, and skeletal muscle cells (C2C12) were cultured on the nanofiber mesh, they formed confluent monolayers that could be handled as continuous cell sheets with areas of 3 × 3 cm² or larger. Thicker 3D tissues were formed by stacking multiple cell sheets collected on frames that nest inside one another (like Matryoshka dolls), without any special tools. When cultured on the nanofiber mesh, skeletal muscle C2C12 cells oriented along the direction of the nanofibers and differentiated into uniaxially aligned multinucleated myotubes. Myotube cell sheets were stacked (up to 3 layers) in alternating or aligned directions to form thicker tissue about 50 µm thick. Sandwiching HUVEC cell sheets between two dermal fibroblast cell sheets resulted in vascularized 3D tissue. HUVECs formed extensive networks and expressed CD31, a marker of endothelial cells. Cell sheets formed on nanofiber mesh offer several advantages, including easy manipulation and stacking of multiple sheets to construct 3D tissue, and may find use in a variety of tissue engineering applications.

An alignment-free method to find similarity among protein sequences via the general form of Chou's pseudo amino acid composition.

Science.gov (United States)

Gupta, M K; Niyogi, R; Misra, M

2013-01-01

In this paper, we propose a method to create a 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The feature vector is built from three components: the amino acid contents, the total distance of each amino acid from the first amino acid in the protein sequence, and the distribution of the 20 amino acids. The resulting cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour-joining method. To show the applicability of our approach, we tested it on three datasets: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results agree with known history and with the output of the widely used multiple sequence alignment program ClustalW. We also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our method gives more consistent biological relationships than the others. In addition, its time complexity is linear, and it requires less space than other alignment-free methods that use graphical representations. By contrast, exact multiple sequence alignment has time complexity exponential in the number of sequences.
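The feature construction described above can be sketched for two of its three components. The abstract does not define the "distribution" component precisely, so this hypothetical Python function builds only the 20 amino acid contents and the 20 total distances from the first residue, giving 40 values. The normalization by n(n-1)/2 is an assumption, not the authors' formula.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def partial_pseaac(protein: str) -> list:
    """40 of the 60 features: 20 relative contents + 20 normalized total distances.

    The distance of a residue from the first residue is its 0-based position.
    Dividing by n(n-1)/2 (the sum of all positions) is an assumption made here
    so that both halves of the vector sum to 1 for standard residues.
    """
    protein = protein.upper()
    n = len(protein)
    if n == 0:
        raise ValueError("empty sequence")
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total_distance = dict.fromkeys(AMINO_ACIDS, 0)
    for pos, residue in enumerate(protein):
        if residue in counts:  # non-standard letters (X, B, ...) are ignored
            counts[residue] += 1
            total_distance[residue] += pos
    scale = n * (n - 1) / 2 or 1.0  # avoid dividing by zero for length-1 sequences
    return ([counts[a] / n for a in AMINO_ACIDS]
            + [total_distance[a] / scale for a in AMINO_ACIDS])


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two feature vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


print(cosine_similarity(partial_pseaac("MKTAYIAK"), partial_pseaac("MKTAHIAK")))
```

Each vector is computed in one pass over the sequence, which is consistent with the linear time complexity claimed above. In a full pipeline, 1 minus this similarity would be the distance passed to a neighbour-joining routine; that step is not shown.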
Vertically aligned CNT-Cu nano-composite material for stacked through-silicon-via interconnects.

Science.gov (United States)

Sun, Shuangxi; Mu, Wei; Edwards, Michael; Mencarelli, Davide; Pierantoni, Luca; Fu, Yifeng; Jeppson, Kjell; Liu, Johan

2016-08-19

For future miniaturization of electronic systems using 3D chip stacking, new fine-pitch materials for through-silicon-via (TSV) applications are likely required. In this paper, we propose a novel carbon nanotube (CNT)/copper nanocomposite material consisting of high-aspect-ratio, vertically aligned CNT bundles coated with copper. These bundles, each consisting of hundreds of tiny CNTs, were uniformly coated with copper by electroplating, and aspect ratios as high as 300:1 were obtained. The resistivity of this nanomaterial was found to be as low as ∼10⁻⁸ Ω·m, the same order of magnitude as the resistivity of copper, and its temperature coefficient was only half that of pure copper. The main advantage of the composite TSV nanomaterial is that its coefficient of thermal expansion (CTE) is similar to that of silicon, a key reliability factor. A finite element model was set up to demonstrate the reliability of this composite material, and thermal cycle simulations predicted very promising results. In conclusion, this composite nanomaterial appears to be a very promising material for future 3D TSV applications, offering both low resistivity and a low CTE similar to that of silicon.
Local sequence alignments statistics: deviations from Gumbel statistics in the rare-event tail

Directory of Open Access Journals (Sweden)

Burghardt Bernd

2007-07-01

Background: The optimal score for ungapped local alignments of infinitely long random sequences is known to follow a Gumbel extreme value distribution. Less is known about the important case in which gaps are allowed. For this case, the distribution is known only empirically in the high-probability region, which is biologically less relevant. Results: We provide a method to obtain numerically the biologically relevant rare-event tail of the distribution. The method, outlined in an earlier work, is based on generating the sequences with a parametrized probability distribution, biased with respect to the original biological one, in the framework of Metropolis Coupled Markov Chain Monte Carlo. Here, we first present the approach in detail and evaluate the convergence of the algorithm on a simple test case. Because the earlier work applied the method to only a single example, we consider here a large set of parameters: we study the distributions for protein alignment with different substitution matrices (BLOSUM62 and PAM250) and affine gap costs with different parameter values. In the logarithmic phase (large gap costs), it was previously assumed that the Gumbel form still holds; hence, the Gumbel distribution is usually used when evaluating p-values in database searches. Here we show that for all cases, provided that the sequences are not too long (L > 400), a "modified" Gumbel distribution, i.e. a Gumbel distribution with an additional Gaussian factor, is suitable to describe the data. We also provide a "scaling analysis" of the parameters used in the modified Gumbel distribution. Furthermore, via a comparison with BLAST parameters, we show that significance estimates change considerably when the true distributions presented here are used. Finally, we also study the distribution of the sum statistics of the k best alignments. Conclusion: Our results show that the statistics of gapped and ungapped local
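For reference, the conventional Gumbel form that this study tests against can be computed as a tail probability. The sketch below assumes the standard Karlin-Altschul parameterization with λ (lam) and K. The parameter values in the usage loop are illustrative placeholders. The "modified" Gumbel with an additional Gaussian factor reported above is deliberately not implemented, because the abstract does not give its exact form.

```python
import math


def gumbel_tail_pvalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """P(S >= score) for an optimal local alignment score under the Gumbel form.

    E = K * m * n * exp(-lambda * score) is the expected number of chance hits
    scoring at least `score`; the Gumbel tail probability is 1 - exp(-E).
    """
    e_value = k * m * n * math.exp(-lam * score)
    return -math.expm1(-e_value)  # 1 - exp(-E), accurate even for tiny E


# Illustrative placeholder parameters only; real lambda and K depend on the
# substitution matrix and gap costs and must be estimated for that scheme.
for s in (30, 40, 60):
    print(s, gumbel_tail_pvalue(s, lam=0.267, k=0.041, m=300, n=300))
```

The study's point is that, in the rare-event tail, p-values from this baseline can differ considerably from those of the true gapped-score distribution.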
Cover song identification by sequence alignment algorithms

Science.gov (United States)

Wang, Chih-Li; Zhong, Qian; Wang, Szu-Ying; Roychowdhury, Vwani

2011-10-01

Content-based music analysis has drawn much attention because of the rapidly growing digital music market. This paper describes a method for effectively identifying cover songs. A cover song preserves the essential melody of its reference song but differs in other acoustic properties. Hence, the feature chosen is the beat/chroma-synchronous chromagram, which is insensitive to variations in the timbre or rhythm of songs but sensitive to the melody. Key transposition is handled by cyclically shifting the chromatic dimension of the chromagram. Using a Hidden Markov Model (HMM) to obtain the time sequences of songs makes the system even more robust. Because the Smith-Waterman alignment algorithm is used, a cover song and its reference need not share a similar structure or length.
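The key-transposition step can be illustrated with plain lists of 12-bin chroma vectors. In this minimal Python sketch, choosing the shift that maximizes the dot product of the two songs' average chroma profiles is an assumption (one simple heuristic), not necessarily the authors' procedure. The HMM and Smith-Waterman stages are not shown.

```python
def rotate(vector: list, semitones: int) -> list:
    """Cyclically shift a 12-bin chroma vector up by `semitones` bins."""
    s = semitones % 12
    return vector[-s:] + vector[:-s] if s else list(vector)


def transpose(chroma_frames: list, semitones: int) -> list:
    """Transpose every frame of a chromagram by the same number of semitones."""
    return [rotate(frame, semitones) for frame in chroma_frames]


def mean_profile(chroma_frames: list) -> list:
    """Average chroma vector over all frames (a global key profile)."""
    n = len(chroma_frames)
    return [sum(frame[b] for frame in chroma_frames) / n for b in range(12)]


def best_key_shift(reference: list, cover: list) -> int:
    """Shift (0-11) that best maps the cover's global chroma onto the reference's."""
    ref = mean_profile(reference)
    cov = mean_profile(cover)
    return max(range(12), key=lambda s: sum(x * y for x, y in zip(ref, rotate(cov, s))))


def one_hot(pitch_class: int) -> list:
    return [1.0 if b == pitch_class else 0.0 for b in range(12)]


reference = [one_hot(0), one_hot(4), one_hot(7)]  # C, E, G
cover = transpose(reference, 3)                    # same melody, three semitones up
shift = best_key_shift(reference, cover)
print(shift)  # 9, i.e. shifting down 3 semitones realigns the keys
```

After transposition, the cover's frames could be aligned to the reference with a local alignment such as Smith-Waterman. That local alignment is what frees the method from requiring equal length or structure.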
Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.

Science.gov (United States)

Newberg, Lee A

2008-08-15

A backtrace through a dynamic programming algorithm's intermediate results is a task of fundamental importance in computational biology, whether it searches for an optimal path, samples paths according to an implied probability distribution, or serves as the second stage of a forward-backward algorithm. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache), existing approaches store selected stages of the computation and recompute missing values from these checkpoints as needed. Here we present an optimal checkpointing strategy and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++ code for optimal backtrace is available in the Supplementary Materials. Supplementary data are available at Bioinformatics online.
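The checkpointing idea can be shown on a toy local alignment. This minimal Python sketch stores every k-th row of a Smith-Waterman matrix during the forward pass. During the backtrace, it recomputes one block of rows at a time from the nearest checkpoint. It uses uniform checkpoint spacing with illustrative linear-gap scores, not the optimal strategy presented in the paper.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # illustrative linear-gap scoring


def next_row(prev: list, a_char: str, b: str) -> list:
    """Compute Smith-Waterman row i from row i-1."""
    row = [0] * (len(b) + 1)
    for j in range(1, len(b) + 1):
        diag = prev[j - 1] + (MATCH if a_char == b[j - 1] else MISMATCH)
        row[j] = max(0, diag, prev[j] + GAP, row[j - 1] + GAP)
    return row


def forward(a: str, b: str, every: int):
    """Forward pass keeping only every `every`-th row as a checkpoint."""
    row = [0] * (len(b) + 1)
    checkpoints = {0: row}
    best = (0, 0, 0)  # (score, i, j) of the best cell
    for i in range(1, len(a) + 1):
        row = next_row(row, a[i - 1], b)
        if i % every == 0:
            checkpoints[i] = row
        for j, value in enumerate(row):
            if value > best[0]:
                best = (value, i, j)
    return checkpoints, best


def local_align(a: str, b: str, every: int = 4):
    """Backtrace that recomputes one block of rows at a time from checkpoints."""
    checkpoints, (score, i, j) = forward(a, b, every)
    block = {}

    def get_row(r: int) -> list:
        nonlocal block
        if r in checkpoints:
            return checkpoints[r]
        if r not in block:  # recompute the block of rows containing r
            start = (r // every) * every
            block = {start: checkpoints[start]}
            for k in range(start + 1, min(start + every, len(a) + 1)):
                block[k] = next_row(block[k - 1], a[k - 1], b)
        return block[r]

    top, bottom = [], []
    while i > 0 and j > 0 and get_row(i)[j] > 0:
        cur, up = get_row(i), get_row(i - 1)
        s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if cur[j] == up[j - 1] + s:      # diagonal move
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif cur[j] == up[j] + GAP:      # gap in b
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:                            # gap in a (came from the left)
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("TGTTACGGAC", "GGTTGACTAC", every=3))
```

Take m rows, n columns and spacing k. The sketch holds about m/k checkpoint rows plus at most k recomputed rows, so choosing k near √m keeps memory around O(√m · n). Because the backtrace only moves upward, each block is recomputed at most once, which costs roughly one extra forward pass.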
Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

Science.gov (United States)

Tan, Yen Hock; Huang, He; Kihara, Daisuke

2006-08-15

Aligning distantly related protein sequences is a long-standing problem in bioinformatics and a key to successful protein structure prediction. Its importance has been increasing in the context of structural genomics projects, because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself, thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids is likely one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely the family, superfamily, and fold levels. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in family-level alignments but clearly outperform them in fold-level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin of the amino acid similarity matrix space.
VCFtoTree: a user-friendly tool to construct locus-specific alignments and phylogenies from thousands of anthropologically relevant genome sequences.

Science.gov (United States)

Xu, Duo; Jaber, Yousef; Pavlidis, Pavlos; Gokcumen, Omer

2017-09-26

Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allows novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences into complete sequence alignments and phylogenies. Here, we provide VCFtoTree, a user-friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree-building algorithms with custom data parsing. It generates accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, the Neanderthal and Denisovan genomes, and the reference genomes of chimpanzee and rhesus macaque. It can also be applied to other phased human genomes, as well as to genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in Newick format. VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given locus from thousands of available genomes. Our software provides a user-friendly interface for a wider audience without prerequisite programming knowledge. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0 .
DNAAlignEditor: DNA alignment editor tool

Directory of Open Access Journals (Sweden)

Guill Katherine E

2008-03-01

Background: With advances in DNA re-sequencing methods and next-generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that helps biologists, who often have limited programming experience, to track, edit, display, and export multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor. Results: We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments, with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality scores aid the manual alignment editing process. DNAAlignEditor works as a client/server tool with two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users connected concurrently. Conclusion: We anticipate that this software will be of general interest to biologists and population geneticists for editing DNA sequence alignments and analyzing natural sequence variation regardless of species. It should be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism.
Nucleotide sequence alignment of hdcA from Gram-positive bacteria.

Science.gov (United States)

Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; Del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A

2016-03-01

The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010, doi:10.2174/157340110791233256 [1]; Linares et al. 2016, http://dx.doi.org/10.1016/j.foodchem.2015.11.013 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011, doi:10.1080/10408398.2011.582813 [3]), which is encoded by the gene hdcA. To locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see http://dx.doi.org/10.1016/j.foodcont.2015.11.035 [4].
Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

Science.gov (United States)

Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

2014-01-01

Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. One application of GSP that has not been fully explored is computing the distance between a pair of sequences. In this work we present GAFD, a novel GSP-based, alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on doublet values, which increases the number of possible amplitude values of the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and of using the descriptors to characterize DNA fragments.
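A doublet mapping can be sketched by assigning each of the 16 possible dinucleotides its own amplitude, compared with only 4 levels for a single-nucleotide mapping. In this minimal Python sketch, the specific index scheme (4·x + y), the naive DFT and the Euclidean spectral distance are hypothetical stand-ins. The abstract names neither the exact doublet values nor the three DSP metrics used in GAFD.

```python
import cmath
import math

NUC_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def doublet_signal(seq: str) -> list:
    """Hypothetical doublet mapping: each overlapping dinucleotide -> amplitude 0..15."""
    seq = seq.upper()
    return [4 * NUC_INDEX[x] + NUC_INDEX[y] for x, y in zip(seq, seq[1:])]


def magnitude_spectrum(signal: list) -> list:
    """Naive DFT magnitudes, normalized by length (fine for short fragments)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) / n
            for f in range(n)]


def spectral_distance(frag_a: str, frag_b: str) -> float:
    """Euclidean distance between magnitude spectra of two equal-length fragments."""
    sa = magnitude_spectrum(doublet_signal(frag_a))
    sb = magnitude_spectrum(doublet_signal(frag_b))
    if len(sa) != len(sb):
        raise ValueError("fragments must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))


print(doublet_signal("ACGT"))  # [1, 6, 11]
print(spectral_distance("ACGTACGTAA", "ACGTTCGTAA"))
```

Comparing magnitude spectra makes circularly shifted fragments look identical, which a real method may or may not want. Letters other than A, C, G and T raise a KeyError in this sketch.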
Influence of stacking sequence on scattering characteristics of the fundamental anti-symmetric Lamb wave at through holes in composite laminates.

Science.gov (United States)

Veidt, Martin; Ng, Ching-Tai

2011-03-01

This paper investigates the scattering characteristics of the fundamental anti-symmetric (A0) Lamb wave at through holes in composite laminates. Three-dimensional (3D) finite element (FE) simulations and experimental measurements are used to study the physical phenomenon. Unidirectional, bidirectional, and quasi-isotropic composite laminates are considered. The influence of different ratios of hole diameter to wavelength and of different stacking sequences on the wave scattering characteristics is investigated. The results show that the amplitudes and directivity distribution of the scattered Lamb wave depend on these parameters. For quasi-isotropic composite laminates, the scattering directivity patterns are dominated by the fiber orientation of the outer layers. These patterns differ considerably between laminates with the same number of laminae but different stacking sequences. The study provides improved physical insight into the scattering phenomena at through holes in composite laminates, which is essential to develop, validate, and optimize guided wave damage detection and characterization techniques. © 2011 Acoustical Society of America
Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

Directory of Open Access Journals (Sweden)

Arthur W Pightling

The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied how some of these choices affect our ability to detect true SNPs and the frequencies of errors. These choices were: (i) depth of sequencing coverage, (ii) choice of reference-guided short-read sequence assembler, (iii) choice of reference genome, and (iv) whether to perform read-quality filtering and trimming. We performed benchmarking experiments in which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality. We used four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), reference genomes of varying genetic distances, and runs with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome. Using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers
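The error categories named above (true SNPs, false positives, ambiguously called sites and gaps) can be tallied with a short Python sketch. It assumes a consensus that is already aligned to the reference in the same coordinates and a known truth set, as in a simulation benchmark. The function names and toy sequences are hypothetical, and none of the assemblers named above are invoked.

```python
VALID_BASES = set("ACGT")


def classify_sites(reference: str, consensus: str) -> dict:
    """Classify each aligned position (0-based) of a reference-guided consensus."""
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    sites = {"snp": [], "ambiguous": [], "gap": []}
    for pos, (ref, call) in enumerate(zip(reference.upper(), consensus.upper())):
        if call == "-":
            sites["gap"].append(pos)
        elif call not in VALID_BASES:
            sites["ambiguous"].append(pos)  # e.g. N or other IUPAC codes
        elif call != ref:
            sites["snp"].append(pos)
    return sites


def snp_call_accuracy(called: list, true_snps: set) -> dict:
    """Compare called SNP positions with a known truth set."""
    called = set(called)
    return {"true_positives": len(called & true_snps),
            "false_positives": len(called - true_snps),
            "missed": len(true_snps - called)}


sites = classify_sites("ACGTACGT", "ACGAAN-T")
print(sites)  # {'snp': [3], 'ambiguous': [5], 'gap': [6]}
print(snp_call_accuracy(sites["snp"], {3, 5}))
```

In this toy case, the true SNP at position 5 is missed because the site was called ambiguously. Ambiguous calls are one of the ways these choices lower the SNP detection rate.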
Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

Directory of Open Access Journals (Sweden)

Shade Larry L

2006-06-01

Background: Approximately 11 Mb of finished high-quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however, substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10⁻⁹ change/site/year) was derived from ancestral repetitive sequences. By contrast, the substitution rate in coding sequences (1.1 × 10⁻⁹ change/site/year) was approximately half of the overall rate (1.9–2.0 × 10⁻⁹ change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution than dog, and that this difference is about 6%. Conclusion: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.
Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

Science.gov (United States)

Bastien, Olivier; Maréchal, Eric

2008-08-07

Confidence in pairwise alignments of biological sequences, obtained by methods such as BLAST or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support. We built a model of sequence evolution based on aging, in the sense used in reliability theory. The model uses the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in information theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to its ability to conserve a sufficient functional level at the folded and mature protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information, reflected by the magnitude of their alignment scores, and 2) whose components are the amino acids, which can be independently damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the
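The Z-value route and the TULIP bound can be sketched in a few lines of Python. For brevity, the sketch assumes an ungapped local score with +1/-1 scoring and uses shuffles of the second sequence as the random model. These choices and the function names are illustrative, and the paper's aging-based model is not reproduced here.

```python
import random
import statistics


def ungapped_local_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Best ungapped local alignment score over all diagonals (maximal segment sum)."""
    best = 0
    for offset in range(-len(b) + 1, len(a)):  # offset = i - j along one diagonal
        run = 0
        for i in range(max(0, offset), min(len(a), len(b) + offset)):
            run = max(0, run + (match if a[i] == b[i - offset] else mismatch))
            best = max(best, run)
    return best


def z_value(a: str, b: str, shuffles: int = 200, seed: int = 0) -> float:
    """Lipman-Pearson style Z-value: observed score vs. scores against shuffles of b."""
    rng = random.Random(seed)
    observed = ungapped_local_score(a, b)
    scores = []
    for _ in range(shuffles):
        letters = list(b)
        rng.shuffle(letters)
        scores.append(ungapped_local_score(a, "".join(letters)))
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - statistics.mean(scores)) / sd


def tulip_pvalue_bound(z: float) -> float:
    """Upper bound 1/Z^2 on the P-value, capped at 1."""
    return min(1.0, 1.0 / z ** 2) if z > 0 else 1.0


seq = "ACGTTGCA" * 5
z = z_value(seq, seq)
print(round(z, 1), tulip_pvalue_bound(z))
```

The bound 1/Z² is only informative for Z > 1; below that, the function returns 1.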
Lower Cretaceous paleo-Vertisols and sedimentary interrelationships in stacked alluvial sequences, Utah, USA

Science.gov (United States)

Joeckel, R. M.; Ludvigson, G. A.; Kirkland, J. I.

2017-11-01

The Yellow Cat Member of the Cedar Mountain Formation in Poison Strip, Utah, USA, consists of stacked, erosionally bounded alluvial sequences dominated by massive mudstones (lithofacies Fm) with paleo-Vertisols. Sediment bodies within these sequences grade vertically and laterally into each other at pedogenic boundaries, across which color, texture, and structures (sedimentary vs. pedogenic) change. Slickensides, unfilled (sealed) cracks, carbonate-filled cracks, and deeper cracks filled with sandstone are present; the latter features suggest thorough desiccation during aridification. Thin sandstones (Sms) in some sequences are interpreted as avulsion deposits, typically along with the laminated to massive mudstones (Flm) with which they are sometimes interbedded. The termini of many beds of these lithofacies curve upward, parallel to nearby pedogenic slickensides, forming features we call "turnups." Turnups are overlain or surrounded by paleosols, but strata sheltered underneath beds with turnups retain primary sedimentary fabrics. Turnups were produced by movement along slickensides during pedogenesis, by differential compaction alongside pre-existing gilgai microhighs, or by a combination of both. Palustrine carbonates (lithofacies C) appear only in the highest or next-highest alluvial sequences. They occur along with a deep paleo-Vertisol that exhibits partially preserved microrelief at the base of the overlying Poison Strip Member. The attributes of the Yellow Cat Member suggest comparatively low accommodation, slow accumulation, long hiatuses in clastic sedimentation, and substantial time intervals of subaerial exposure and pedogenesis. In these respects, it appears to be distinct among the members of the Cedar Mountain Formation.
eMatchSite: sequence order-independent structure alignments of ligand binding pockets in protein models.

Directory of Open Access Journals (Sweden)

Michal Brylinski

2014-09-01

Detecting similarities between ligand binding sites in the absence of global homology between target proteins has been recognized as one of the critical components of modern drug discovery. Local binding site alignments can be constructed using sequence order-independent techniques. However, to achieve high accuracy, many current algorithms for binding site comparison require high-quality experimental protein structures, preferably in the bound conformational state. This, in turn, complicates proteome-scale applications, where only structure models of varying quality are available for the majority of gene products. To improve the state of the art, we developed eMatchSite, a new method for constructing sequence order-independent alignments of ligand binding sites in protein models. Large-scale benchmarking calculations using adenine-binding pockets in crystal structures demonstrate that eMatchSite generates accurate alignments for almost three times more protein pairs than SOIPPA. More importantly, eMatchSite offers a high tolerance to structural distortions in ligand binding regions in protein models. For example, the percentage of correctly aligned pairs of adenine-binding sites in weakly homologous protein models is only 4–9% lower than for those aligned using crystal structures. This represents a significant improvement over other algorithms. For example, eMatchSite recognizes similar binding sites 6% and 13% better than SiteEngine when using high- and moderate-quality protein models, respectively. Constructing biologically correct alignments using predicted ligand binding sites in protein models makes it possible to investigate drug-protein interaction networks for complete proteomes, with prospective systems-level applications in polypharmacology and rational drug repositioning.
eMatchSite is freely available to the academic community as a web-server and a stand-alone software distribution at http://www.brylinski.org/ematchsite.
An intuitive graphical webserver for multiple-choice protein sequence search.

Science.gov (United States)

Banky, Daniel; Szalkai, Balazs; Grolmusz, Vince

2014-04-10

Every day, tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" has become a verb describing the act of performing sequence search and alignment. However, suppose one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions. Forming such a query on the most frequently used webservers is difficult. Some servers support queries written as regular expressions, but most users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server. It allows users to form multiple-choice queries by simply drawing the residues at their positions. If more than one residue is drawn at the same position, the residues are stacked on the user interface, indicating the multiple choice at that position. This computer-game-like interface is natural and intuitive. The coloring of the residues makes it possible to form queries that require not just specific amino acids at the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org. Copyright © 2014 Elsevier B.V. All rights reserved.
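A multiple-choice query of the kind described above corresponds to a regular expression with one character class per position. This minimal Python sketch builds that expression for the user, which is the syntax burden the webserver removes. The residue groupings are hypothetical conventions, and this is not the server's actual implementation.

```python
import re

# Hypothetical residue groupings for illustration; conventions differ between tools.
RESIDUE_CLASSES = {
    "hydrophobic": "AVILMFW",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "small_nonpolar": "AG",
}


def build_query(positions: list) -> "re.Pattern":
    """One entry per position: a class name or a string of allowed residues."""
    parts = []
    for choice in positions:
        residues = RESIDUE_CLASSES.get(choice, choice).upper()
        parts.append("[" + "".join(sorted(set(residues))) + "]")
    # Wrapping the pattern in a lookahead also reports overlapping matches.
    return re.compile("(?=" + "".join(parts) + ")")


def search(query, sequences: dict) -> dict:
    """Start positions (0-based) of every match in each named sequence."""
    return {name: [m.start() for m in query.finditer(seq)] for name, seq in sequences.items()}


query = build_query(["hydrophobic", "hydrophobic", "polar", "polar", "polar"])
print(search(query, {"p1": "GGLVSTNGG"}))  # {'p1': [2]}
```

Each stacked column of residues in the interface maps to one bracketed character class.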
CBESW: sequence alignment on the Playstation 3.

Science.gov (United States)

Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil

2008-09-17

The exponential growth of available biological data is rapidly turning bioinformatics into a data-intensive computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. Recent accelerator technologies have made it possible to greatly reduce the execution time of many bioinformatics applications compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. For large datasets, our implementation on the PlayStation 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS (million cell updates per second). The results from our experiments demonstrate that the PlayStation 3 console can be used as an efficient, low-cost computational platform for high-performance sequence alignment applications.
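MCUPS counts dynamic programming cells updated per second, in millions. This minimal Python sketch pairs that metric with a score-only Smith-Waterman using affine gaps (Gotoh's recurrence, kept to two rows). The scoring parameters are illustrative assumptions, and a scalar Python loop runs orders of magnitude below the accelerated figure quoted above.

```python
import time


def sw_affine_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                    gap_open: int = -3, gap_extend: int = -1) -> int:
    """Score-only Smith-Waterman with affine gaps (Gotoh), keeping two rows.

    A gap of length L costs gap_open + (L - 1) * gap_extend.
    """
    neg = float("-inf")
    n = len(b)
    h_prev = [0] * (n + 1)      # H: best local score ending at (i-1, j)
    f_prev = [neg] * (n + 1)    # F: best score ending with a gap in b (vertical move)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [neg] * (n + 1)
        e = neg                 # E: best score ending with a gap in a (horizontal move)
        for j in range(1, n + 1):
            e = max(e + gap_extend, h_cur[j - 1] + gap_open)
            f_cur[j] = max(f_prev[j] + gap_extend, h_prev[j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def mcups(len_a: int, len_b: int, seconds: float) -> float:
    """Million cell updates per second: DP cells computed / (seconds * 1e6)."""
    return len_a * len_b / (seconds * 1e6)


query, target = "ACGT" * 50, "AGCT" * 50
start = time.perf_counter()
score = sw_affine_score(query, target)
elapsed = time.perf_counter() - start
print(score, f"{mcups(len(query), len(target), elapsed):.2f} MCUPS")
```

For a database search, the cell count is the query length times the total number of database residues.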
Aligning the unalignable: bacteriophage whole genome alignments.

Science.gov (United States)

Bérard, Sèverine; Chateau, Annie; Pompidor, Nicolas; Guertin, Paul; Bergeron, Anne; Swenson, Krister M

2016-01-13

In recent years, many studies have focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot-plot diagrams give a mostly qualitative assessment of the similarity or dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. In this paper, we propose a multiple alignment strategy that exploits the functional collinearity shared by related strains of bacteriophages and uses partial orders to capture the mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressive Mauve aligner (which implements a partial order strategy, but whose alignments are linearized) shows a greatly improved interactive graphic display, while avoiding misalignments. Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains.
A python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on bitbucket (https://bitbucket.org/thekswenson/alpha).
Importance of the alignment of polar π conjugated molecules inside carbon nanotubes in determining second-order non-linear optical properties.

Science.gov (United States)

Yumura, Takashi; Yamamoto, Wataru

2017-09-20

We employed density functional theory (DFT) calculations with dispersion corrections to investigate energetically preferred alignments of p,p'-dimethylaminonitrostilbene (DANS) molecules inside an armchair (m,m) carbon nanotube (n × DANS@(m,m)), where the number of inner molecules (n) is no greater than 3. Here, three types of alignment of DANS are considered: a linear alignment in a parallel fashion, and stacking alignments in parallel and antiparallel fashions. According to the DFT calculations, the threshold tube diameter for containing DANS molecules in linear or stacking alignments is approximately 1.0 nm. Nanotubes with diameters smaller than 1.0 nm result in the selective formation of linearly aligned DANS molecules due to strong confinement effects within the nanotubes. By contrast, larger-diameter nanotubes allow DANS molecules to align in either a stacking or a linear fashion. The type of alignment adopted by the DANS molecules inside a nanotube is responsible for their second-order non-linear optical properties, represented by their static hyperpolarizability (β₀ values). Specifically, we computed β₀ values of DANS assemblies taken from optimized n × DANS@(m,m) structures and compared them with that of a single DANS molecule. The DFT calculations showed that the β₀ values of DANS molecules depend on their alignment, decreasing in the following order: linear alignment > parallel stacking alignment > antiparallel stacking alignment. In particular, a linear alignment has a larger β₀ value than that of the same number of isolated molecules. Therefore, the linear alignment of DANS molecules, which is only allowed inside smaller-diameter nanotubes, can strongly enhance their second-order non-linear optical properties. Since the nanotube confinement determines the alignment of DANS molecules, a restricted nanospace can be utilized to control their second-order non-linear optical properties. These DFT findings can assist in the

PriFi - Using a Multiple Alignment of Related Sequences to Find Primers for Amplification of Homologs

DEFF Research Database (Denmark)

Fredslund, Jakob; Schauser, Leif; Madsen, Lene Heegaard

2005-01-01

Using a comparative approach, the web program PriFi (http://cgi-www.daimi.au.dk/cgi-chili/PriFi/main) designs pairs of primers useful for PCR amplification of genomic DNA in species where prior sequence information is not available. The program works with an alignment of DNA sequences from phylog...
An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids.

Science.gov (United States)

Li, Yushuang; Song, Tian; Yang, Jiasheng; Zhang, Yi; Yang, Jialiang

2016-01-01

In this paper, we propose a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440-dimensional feature vector consisting of a 400-dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20-dimensional content ratio vector, and a 20-dimensional position ratio vector of the amino acids in the sequence. We then compare the similarity of protein sequences by evaluating the Euclidean distances among the representing vectors. We apply this method to the ND5 dataset, consisting of the ND5 protein sequences of 9 species, and to the F10 and G11 datasets, representing two xylanase-containing glycoside hydrolase families (families 10 and 11). Our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW on the ND5 dataset, much higher than those of five other popular alignment-free methods. In addition, we successfully separate the xylanase sequences of the F10 family from those of the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we mathematically prove an identity involving the Pseudo-Markov transition probability vector and the amino acid content ratio vector.
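The 400 + 20 + 20 = 440 layout described above can be illustrated with a short Python sketch. The abstract does not give exact formulas, so the three estimators here are assumptions: transitions are taken as P(b | a) = count(ab) / count(a followed by any residue), content as count(a) / length, and position ratio as the mean 1-based position of a divided by length. The authors' pseudo-Markov definition may differ.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def feature_vector(sequence):
    """Map a protein sequence to 400 transition + 20 content + 20 position features."""
    seq = [c for c in sequence.upper() if c in AMINO_ACIDS]
    n = len(seq)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    pair_counts = Counter(zip(seq, seq[1:]))  # adjacent residue pairs (a, b)
    predecessor_counts = Counter(seq[:-1])    # residues that have a successor
    # Assumed estimator: P(b | a) = count(ab) / count(a followed by anything).
    transitions = [
        pair_counts[(a, b)] / predecessor_counts[a] if predecessor_counts[a] else 0.0
        for a in AMINO_ACIDS
        for b in AMINO_ACIDS
    ]
    counts = Counter(seq)
    content = [counts[a] / n for a in AMINO_ACIDS]
    # Assumed position ratio: mean 1-based position of residue a, divided by length.
    position_sums = Counter()
    for pos, residue in enumerate(seq, start=1):
        position_sums[residue] += pos
    position = [
        position_sums[a] / (counts[a] * n) if counts[a] else 0.0 for a in AMINO_ACIDS
    ]
    return transitions + content + position


def distance(seq1, seq2):
    """Euclidean distance between feature vectors (smaller means more similar)."""
    return math.dist(feature_vector(seq1), feature_vector(seq2))
```

A pairwise distance matrix built with `distance` could then be compared with alignment-based distances, which is the kind of correlation the abstract reports against ClustalW.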
Gapped sequence alignment using artificial neural networks: application to the MHC class I system

DEFF Research Database (Denmark)

Andreatta, Massimo; Nielsen, Morten

2016-01-01

... On this relatively simple system, we developed a sequence alignment method based on artificial neural networks that allows insertions and deletions in the alignment. Results: We show that prediction methods based on alignments that include insertions and deletions have significantly higher performance than methods ... trained on peptides of single lengths. Also, we illustrate how the location of deletions can aid the interpretation of the modes of binding of the peptide-MHC, as in the case of long peptides bulging out of the MHC groove or protruding at either terminus. Finally, we demonstrate that the method can learn ... the length profile of different MHC molecules, and quantified the reduction of the experimental effort required to identify potential epitopes using our prediction algorithm. Availability and implementation: The NetMHC-4.0 method for the prediction of peptide-MHC class I binding affinity using gapped...
Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores

Directory of Open Access Journals (Sweden)

Maréchal Eric

2008-08-01

Abstract Background Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound on the P-value (1/Z², following the TULIP theorem). Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support. Results We built a model of the evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and matured protein level (positive selection pressure). Homologous sequences were considered as systems (1) having a high redundancy of information, reflected by the magnitude of their alignment scores, and (2) whose components are the amino acids, which can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a
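The Lipman-Pearson Z-value and the 1/Z² bound mentioned above amount to a short calculation: shuffle one sequence many times, score each shuffle, and express the real score in standard deviations above the random mean. In the Python sketch below, `identity_score` is a hypothetical toy scorer standing in for a real aligner, and the sample standard deviation is an assumed choice. The random scores must not all be equal, or the division fails.

```python
import random
import statistics


def z_value(observed_score, random_scores):
    """Z-value: distance of the real score from the random mean, in standard deviations."""
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)  # sample SD; assumes non-constant scores
    return (observed_score - mean) / sd


def tulip_upper_bound(z):
    """Upper bound on the P-value, 1/Z^2 (trivially 1 when Z <= 1)."""
    if z <= 1:
        return 1.0
    return 1.0 / z ** 2


def shuffled_scores(score_fn, a, b, n=100, seed=0):
    """Monte-Carlo random score distribution: score a against n shuffles of b."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        letters = list(b)
        rng.shuffle(letters)
        scores.append(score_fn(a, "".join(letters)))
    return scores


def identity_score(a, b):
    """Hypothetical toy ungapped score: number of identical positions."""
    return sum(x == y for x, y in zip(a, b))


z = z_value(8, [1, 2, 3, 4, 5])  # mean 3, sample SD sqrt(2.5), so Z^2 = 10
print(round(tulip_upper_bound(z), 6))  # 0.1
```

Because the bound holds for any score distribution, it is conservative; the Gumbel fit discussed in the abstract gives much smaller P-values for the same Z.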
Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

Science.gov (United States)

Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

2016-01-01

Understanding of G-protein-coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that are rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs
Monte Carlo simulation of a statistical mechanical model of multiple protein sequence alignment.

Science.gov (United States)

Kinjo, Akira R

2017-01-01

A grand canonical Monte Carlo (MC) algorithm is presented for studying the lattice gas model (LGM) of multiple protein sequence alignment, which coherently combines long-range interactions and variable-length insertions. MC simulations are used for both parameter optimization of the model and production runs to explore the sequence subspace around a given protein family. In this Note, I describe the details of the MC algorithm as well as some preliminary results of MC simulations with various temperatures and chemical potentials, and compare them with the mean-field approximation. The existence of a two-state transition in the sequence space is suggested for the SH3 domain family, and inappropriateness of the mean-field approximation for the LGM is demonstrated.
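The grand canonical ingredient (particles inserted or removed with a Boltzmann weight exp(-(E - μN)/T)) can be shown on a deliberately simple model. The Python sketch below is a toy one-dimensional lattice gas with periodic boundaries and nearest-neighbour attraction `eps`. It is not Kinjo's LGM of sequence alignment, which adds long-range interactions and variable-length insertions. It only illustrates the Metropolis acceptance rule for single-site insertion and deletion moves at chemical potential `mu` and temperature `temperature`.

```python
import math
import random


def gcmc_lattice_gas(n_sites, mu, eps, temperature, sweeps, seed=0):
    """Grand canonical Metropolis MC for a toy 1D lattice gas; returns mean occupancy.

    Energy E = -eps * sum_i n_i * n_{i+1} (periodic), occupancies n_i in {0, 1},
    sampled with weight exp(-(E - mu * N) / temperature). Needs n_sites >= 3.
    """
    rng = random.Random(seed)
    occ = [0] * n_sites
    total = 0.0
    for _ in range(sweeps):
        for _ in range(n_sites):
            i = rng.randrange(n_sites)
            neighbours = occ[i - 1] + occ[(i + 1) % n_sites]
            dn = 1 - 2 * occ[i]           # +1 = insertion, -1 = deletion
            d_energy = -eps * dn * neighbours
            log_ratio = -(d_energy - mu * dn) / temperature
            # Metropolis rule: accept with probability min(1, exp(log_ratio)).
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                occ[i] += dn
        total += sum(occ) / n_sites       # occupancy measured once per sweep
    return total / sweeps
```

With `eps=0` the sites are independent and the occupancy should approach exp(μ/T) / (1 + exp(μ/T)), which gives a quick check of the acceptance rule. Scanning `mu` and `temperature` is the toy analogue of the parameter scans mentioned above.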
Vertically aligned carbon nanotube field-effect transistors

KAUST Repository

Li, Jingqi; Zhao, Chao; Wang, Qingxiao; Zhang, Qiang; Wang, Zhihong; Zhang, Xixiang; Abutaha, Anas I.; Alshareef, Husam N.

2012-01-01

Vertically aligned carbon nanotube field-effect transistors (CNTFETs) have been developed using pure semiconducting carbon nanotubes. The source and drain were vertically stacked, separated by a dielectric, and the carbon nanotubes were placed
Multiple amino acid sequence alignment nitrogenase component 1: insights into phylogenetics and structure-function relationships.

Directory of Open Access Journals (Sweden)

James B Howard

Amino acid residues critical for a protein's structure-function are retained by natural selection, and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as "core" for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a selenocysteine that replaces one cysteinyl ligand of the 8Fe:7S P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf), yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three-dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification
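Flagging invariant and single variant positions, as described above, is a column-wise scan of the alignment. The Python sketch below assumes that "single variant" means a column containing exactly two distinct residues and that gaps (`-`) are ignored. The study's precise definitions may differ.

```python
def classify_columns(alignment, gap="-"):
    """Label each alignment column by how many distinct residues it contains."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for col in range(length):
        residues = {seq[col] for seq in alignment if seq[col] != gap}
        if not residues:
            labels.append("gap")
        elif len(residues) == 1:
            labels.append("invariant")
        elif len(residues) == 2:
            labels.append("single variant")  # assumed meaning: one alternative residue
        else:
            labels.append("variable")
    return labels


print(classify_columns(["MCAK-", "MCVK-", "MSLK-"]))
# ['invariant', 'single variant', 'variable', 'invariant', 'gap']
```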
JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures.

Science.gov (United States)

Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio

2012-02-15

We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions, as well as those with a family-dependent conservation pattern, in multiple sequence alignments. Among other things, the program allows users to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.
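A family-dependent conservation pattern, in its simplest form, is a column that is invariant within each subfamily but has a different residue in different subfamilies. The Python sketch below implements only that strict criterion. It is a simplified illustration and not the Xdet or S3Det methods bundled with JDet.

```python
def family_dependent_positions(families, gap="-"):
    """Columns invariant within every family but not identical across all families.

    families: dict mapping a family name to its aligned sequences (equal lengths).
    """
    length = len(next(iter(families.values()))[0])
    positions = []
    for col in range(length):
        per_family = []
        for seqs in families.values():
            residues = {s[col] for s in seqs}
            if len(residues) != 1 or gap in residues:
                break  # not conserved inside this family
            per_family.append(residues.pop())
        else:  # every family conserved; keep the column if families disagree
            if len(set(per_family)) > 1:
                positions.append(col)
    return positions


print(family_dependent_positions({"A": ["MKL", "MKI"], "B": ["MRL", "MRV"]}))  # [1]
```

Column 0 is conserved everywhere (a fully conserved position), column 1 is K in one family and R in the other, and column 2 varies within families, so only column 1 is reported.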
Fine-tuning structural RNA alignments in the twilight zone

Directory of Open Access Journals (Sweden)

Schirmer Stefanie

2010-04-01

Abstract Background A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually but consistently with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc. This cycle may be iterated as long as structural conservation improves, but normally one step suffices. Conclusions Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.
FocusStack and StimServer: a new open source MATLAB toolchain for visual stimulation and analysis of two-photon calcium neuronal imaging data.

Science.gov (United States)

Muir, Dylan R; Kampa, Björn M

2014-01-01

Two-photon calcium imaging of neuronal responses is an increasingly accessible technology for probing population responses in cortex at single-cell resolution, and with reasonable and improving temporal resolution. However, analysis of two-photon data is usually performed using ad hoc solutions. To date, no publicly available software exists for straightforward analysis of stimulus-triggered two-photon imaging experiments. In addition, the increasing data rates of two-photon acquisition systems imply increasing cost of the computing hardware required for in-memory analysis. Here we present a MATLAB toolbox, FocusStack, for simple and efficient analysis of two-photon calcium imaging stacks on consumer-level hardware, with minimal memory footprint. We also present a MATLAB toolbox, StimServer, for generation and sequencing of visual stimuli, designed to be triggered over a network link from a two-photon acquisition system. FocusStack is compatible out of the box with several existing two-photon acquisition systems, and is simple to adapt to arbitrary binary file formats. Analysis tools such as stack alignment for movement correction, automated cell detection and peri-stimulus time histograms are already provided, and further tools can be easily incorporated. Both packages are available as publicly accessible source-code repositories.
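A peri-stimulus time histogram, one of the analysis tools listed above, bins event times relative to each stimulus onset and pools the counts across trials. The Python sketch below assumes events are discrete times, for example detected calcium transients, given in seconds. It is not FocusStack's MATLAB implementation, and the window and bin width are arbitrary defaults.

```python
def psth(event_times, stimulus_onsets, window=(-1.0, 2.0), bin_width=0.5):
    """Return (counts, rates) per bin, aligned to stimulus onset and pooled over trials."""
    start, stop = window
    n_bins = int(round((stop - start) / bin_width))
    counts = [0] * n_bins
    for onset in stimulus_onsets:
        for t in event_times:
            rel = t - onset  # event time relative to this stimulus
            if start <= rel < stop:
                idx = int((rel - start) // bin_width)
                if idx < n_bins:  # guards against float rounding at the upper edge
                    counts[idx] += 1
    n_trials = len(stimulus_onsets)
    rates = [c / (n_trials * bin_width) for c in counts]  # events per second per trial
    return counts, rates


counts, rates = psth([0.1, 0.6, 5.2, 5.3], stimulus_onsets=[0.0, 5.0])
print(counts)  # [0, 0, 3, 1, 0, 0]
```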
Alignment of whole genomes.

Science.gov (United States)

Delcher, A L; Kasif, S; Fleischmann, R D; Peterson, J; White, O; Salzberg, S L

1999-01-01

A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications. PMID:10325427
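Suffix-tree whole-genome aligners of this kind are usually anchored on maximal unique matches (MUMs): exact matches that occur once in each genome and cannot be extended in either direction. The Python sketch below finds MUMs with a naively built suffix array and LCP array instead of a suffix tree. This is a swapped-in, quadratic-time technique suitable only for small inputs, and the separator characters are assumed to be absent from the sequences.

```python
def find_mums(a, b, min_len=3):
    """Return sorted (pos_in_a, pos_in_b, length) maximal unique matches (0-based)."""
    s = a + "\x01" + b + "\x00"  # unique separators, assumed absent from a and b
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])  # naive suffix array

    def common_prefix(i, j):
        k = 0
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        return k

    # lcp[r] = longest common prefix of the suffixes ranked r-1 and r.
    lcp = [0] + [common_prefix(sa[r - 1], sa[r]) for r in range(1, n)]
    split = len(a)  # index of the first separator
    mums = []
    for r in range(1, n):
        length = lcp[r]
        if length < min_len:
            continue
        # Uniqueness: no neighbouring suffix may share the same prefix.
        if lcp[r - 1] >= length or (r + 1 < n and lcp[r + 1] >= length):
            continue
        p, q = sorted((sa[r - 1], sa[r]))
        if not (p < split < q):
            continue  # both occurrences lie in the same sequence
        # Left-maximality: the match must not extend one character to the left.
        if p > 0 and q > split + 1 and s[p - 1] == s[q - 1]:
            continue
        mums.append((p, q - split - 1, length))
    return sorted(mums)


print(find_mums("GATTACA", "TTGATTACG"))  # [(0, 2, 6)]
```

Right-maximality comes for free because the LCP stops at the first mismatch. Shorter shared substrings such as `ATTAC` are rejected because they extend to the left into the longer match.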
Electronic States of High-k Oxides in Gate Stack Structures

Science.gov (United States)

Zhu, Chiyu

In this dissertation, in-situ X-ray and ultraviolet photoemission spectroscopy (XPS and UPS) have been employed to study the interface chemistry and electronic structure of potential high-k gate stack materials. In these gate stack materials, HfO2 and La2O3 are selected as high-k dielectrics, and VO2 and ZnO serve as potential channel layer materials. The gate stack structures were prepared using a reactive electron beam system and a plasma-enhanced atomic layer deposition system. Three interrelated issues represent the central themes of the research: 1) the interface band alignment, 2) candidate high-k materials, and 3) band bending, internal electric fields, and charge transfer. 1) The most prominent issue is the band alignment of specific high-k structures. Band alignment relationships were deduced by analysis of XPS and UPS spectra for three different structures: a) HfO2/VO2/SiO2/Si, b) HfO2-La2O3/ZnO/SiO2/Si, and c) HfO2/VO2/HfO2/SiO2/Si. The valence band offsets of HfO2/VO2, ZnO/SiO2 and HfO2/SiO2 were determined to be 3.4 ± 0.1, 1.5 ± 0.1, and 0.7 ± 0.1 eV, respectively. The valence band offset between HfO2-La2O3 and ZnO was almost negligible. Two band alignment models, the electron affinity model and the charge neutrality level model, are discussed. The results show that the charge neutrality level model is preferred to describe these structures. 2) High-k candidate materials were studied through comparison of pure Hf oxide, pure La oxide, and alloyed Hf-La oxide films. An issue with the application of pure HfO2 is crystallization, which may increase the leakage current in gate stack structures. An issue with the application of pure La2O3 is the presence of carbon contamination in the film. Our study shows that the alloyed Hf-La oxide films exhibit an amorphous structure along with reduced carbon contamination. 3) Band bending and internal electric fields in the gate stack structure were observed by XPS and UPS and indicate charge transfer during growth and processing. The oxygen
CBESW: Sequence Alignment on the Playstation 3

Directory of Open Access Journals (Sweden)

Hieu Nim

2008-09-01

Abstract Background The exponential growth of available biological data has caused bioinformatics to move rapidly toward becoming a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to substantially reduce execution time for many bioinformatics applications compared with current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared with other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS (million cell updates per second). Conclusion The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient, low-cost computational platform for high-performance sequence alignment applications.
Alignment efficiency and discomfort of three orthodontic archwire sequences: a randomized clinical trial.

Science.gov (United States)

Ong, Emily; Ho, Christopher; Miles, Peter

2011-03-01

To compare the efficiency of orthodontic archwire sequences produced by three manufacturers. Prospective, randomized clinical trial with three parallel groups. Private orthodontic practice in Caloundra, QLD, Australia. One hundred and thirty-two consecutive patients were randomized to one of three archwire sequence groups: (i) 3M Unitek, 0·014 inch Nitinol, 0·017 inch × 0·017 inch heat-activated Ni-Ti; (ii) GAC International, 0·014 inch Sentalloy, 0·016 × 0·022 inch Bioforce; and (iii) Ormco Corporation, 0·014 inch Damon Copper Ni-Ti, 0·014 × 0·025 inch Damon Copper Ni-Ti. All patients received 0·018 × 0·025 inch slot Victory Series™ brackets. Mandibular impressions were taken before the insertion of each archwire. Patients completed discomfort surveys according to a seven-point Likert Scale at 4 h, 24 h, 3 days and 7 days after the insertion of each archwire. Efficiency was measured by time required to reach the working archwire, mandibular anterior alignment and level of discomfort. No significant differences were found in the reduction of irregularity between the archwire sequences at any time-point (T1: P = 0·12; T2: P = 0·06; T3: P = 0·21) or in the time to reach the working archwire (P = 0·28). No significant differences were found in the overall discomfort scores between the archwire sequences (4 h: P = 0·30; 24 h: P = 0·18; 3 days: P = 0·53; 7 days: P = 0·47). When the time-points were analysed individually, the 3M Unitek archwire sequence induced significantly less discomfort than GAC and Ormco archwires 24 h after the insertion of the third archwire (P = 0·02). This could possibly be attributed to the progression in archwire material and archform. The archwire sequences were similar in alignment efficiency and overall discomfort. Progression in archwire dimension and archform may contribute to discomfort levels. This study provides clinical justification for three common archwire sequences in 0·018 × 0·025 inch slot brackets.
MaxAlign: maximizing usable data in an alignment

DEFF Research Database (Denmark)

Oliveira, Rodrigo Gouveia; Sackett, Peter Wad; Pedersen, Anders Gorm

2007-01-01

Align. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. CONCLUSION: We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also...
A mathematical model of an automatic assembler to stack fuel pellets

International Nuclear Information System (INIS)

Jarvis, R.G.; Joynes, R.; Bretzlaff, C.I.

1980-11-01

Fuel elements for CANDU reactors are assembled from stacks of cylindrical UO2 pellets, with close tolerances on lengths and diameters. Present stacking techniques involve extensive manual operations; they can be sped up and reduced in cost by an automated device. If gamma-active fuel is handled, such a device is essential. An automatic fuel pellet assembly process was modelled mathematically. The model indicated a suitable sequence of pellet manipulations to arrive at a stack length that was always within tolerance. This sequence was used as the initial input for the design of mechanical hardware. The mechanical design and the refinement of the mathematical model proceeded simultaneously. Mechanical constraints were allowed for in the model, and its optimized sequence of operations was incorporated in a microcomputer program to control the mechanical hardware. (auth)
SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment

Directory of Open Access Journals (Sweden)

Scott Barlowe

2017-06-01

Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix that is input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to the assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know whether a given matrix family, or any individual matrix within a family, is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform, which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed using JavaScript, which provides interactive visualizations for revealing both high-level and low-level alignment
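The core idea above, running the same pair of sequences through several substitution schemes and comparing the outcomes, can be sketched in a few lines. The Python example below uses a score-only Needleman-Wunsch global alignment with a linear gap penalty. The two scorers are hypothetical toy schemes, not real BLOSUM or PAM values, and none of this is SubVis's R interface.

```python
def global_alignment_score(a, b, score, gap=-4):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            )
        prev = cur
    return prev[-1]


# Hypothetical toy scoring schemes.
def identity_scorer(x, y):
    return 5 if x == y else -4


HYDROPHOBIC = set("ILVM")


def tolerant_scorer(x, y):
    """Like identity_scorer, but hydrophobic-to-hydrophobic swaps score +2."""
    if x == y:
        return 5
    if x in HYDROPHOBIC and y in HYDROPHOBIC:
        return 2
    return -4


def compare_matrices(a, b, scorers, gap=-4):
    """Score one sequence pair under several substitution schemes."""
    return {name: global_alignment_score(a, b, fn, gap) for name, fn in scorers.items()}


print(compare_matrices("LIV", "LLV", {"identity": identity_scorer, "tolerant": tolerant_scorer}))
# {'identity': 6, 'tolerant': 12}
```

Here the I/L substitution is penalized by one scheme and rewarded by the other, so the same pair gets a very different score. This is the kind of matrix-dependent effect the package is designed to expose.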
Tablet—next generation sequence assembly visualization

Science.gov (United States)

Milne, Iain; Bayer, Micha; Cardle, Linda; Shaw, Paul; Stephen, Gordon; Wright, Frank; Marshall, David

2010-01-01

Summary: Tablet is a lightweight, high-performance graphical viewer for next-generation sequence assemblies and alignments. Supporting a range of input assembly formats, Tablet provides high-quality visualizations showing data in packed or stacked views, allowing instant access and navigation to any region of interest, and whole contig overviews and data summaries. Tablet is both multi-core aware and memory efficient, allowing it to handle assemblies containing millions of reads, even on a 32-bit desktop machine. Availability: Tablet is freely available for Microsoft Windows, Apple Mac OS X, Linux and Solaris. Fully bundled installers can be downloaded from http://bioinf.scri.ac.uk/tablet in 32- and 64-bit versions. Contact: tablet@scri.ac.uk PMID:19965881
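A packed view, as opposed to a stacked view with one read per row, places several non-overlapping reads on the same row. The abstract does not say how Tablet does this. The Python sketch below uses a common greedy approach: sort reads by start and put each one on the first row whose last read ends before it (with a `min_gap` spacer). It is an assumed illustration, not Tablet's code.

```python
def pack_reads(reads, min_gap=1):
    """Greedily pack (start, end) half-open read intervals into display rows."""
    rows = []       # reads placed on each row
    row_ends = []   # end coordinate of the last read on each row
    for start, end in sorted(reads):
        for r, last_end in enumerate(row_ends):
            if start >= last_end + min_gap:  # fits after the last read on row r
                rows[r].append((start, end))
                row_ends[r] = end
                break
        else:  # no existing row has room: open a new one
            rows.append([(start, end)])
            row_ends.append(end)
    return rows


print(pack_reads([(0, 5), (2, 8), (6, 10), (9, 12)]))
# [[(0, 5), (6, 10)], [(2, 8), (9, 12)]]
```

Four reads fit on two rows here, while a stacked view would need four rows.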
HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

Science.gov (United States)

O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

2015-04-01

The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data", the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprising distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large database sizes, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.
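Partitioning both the database and the query set, as described above, yields one independent search task per (query partition, database partition) pair, which maps directly onto parallel map tasks. In the Python sketch below, the "virtual" aspect is interpreted as partitions that are only lists of sequence names pointing into one shared database. That interpretation, the greedy residue-balancing rule and the function names are all assumptions, not HBlast's implementation.

```python
def split_balanced(sequences, n_parts):
    """Greedily split {name: sequence} into n_parts name lists with similar residue totals."""
    parts = [[] for _ in range(n_parts)]
    sizes = [0] * n_parts
    # Largest sequences first, each placed into the currently lightest partition.
    for name, seq in sorted(sequences.items(), key=lambda item: len(item[1]), reverse=True):
        lightest = sizes.index(min(sizes))
        parts[lightest].append(name)
        sizes[lightest] += len(seq)
    return parts, sizes


def make_tasks(database, queries, n_db_parts, n_query_parts):
    """One task per (query partition, database partition) pair.

    Partitions hold only sequence names, so the database is never physically
    re-split; each task records which names it should cover.
    """
    db_parts, _ = split_balanced(database, n_db_parts)
    query_parts, _ = split_balanced(queries, n_query_parts)
    return [
        (qi, di, q_names, d_names)
        for qi, q_names in enumerate(query_parts)
        for di, d_names in enumerate(db_parts)
    ]


db = {"s1": "A" * 10, "s2": "A" * 6, "s3": "A" * 5, "s4": "A"}
print(split_balanced(db, 2))  # ([['s1', 's4'], ['s2', 's3']], [11, 11])
```

Balancing by residue count rather than by sequence count is one way to keep per-task work even. Changing the number of partitions only changes the name lists, not the stored database.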

CSA: An efficient algorithm to improve circular DNA multiple alignment

Directory of Open Access Journals (Sweden)

Pereira Luísa

2009-07-01

BACKGROUND: The comparison of homologous sequences from different species is an essential approach to reconstructing the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis, and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot directly handle circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process before they can use multiple sequence alignment tools. RESULTS: In this paper we propose an efficient algorithm that identifies the most interesting region at which to cut circular genomes in order to improve phylogenetic analysis when using standard multiple sequence alignment algorithms. This algorithm identifies the largest chain of non-repeated longest subsequences common to a set of circular mitochondrial DNA sequences. All the sequences are then rotated and made linear for multiple alignment purposes. To evaluate the effectiveness of this new tool, three different sets of mitochondrial DNA sequences were considered. Other tests considering randomly rotated sequences were also performed. The software package Arlequin was used to evaluate the standard genetic measures of the alignments obtained with and without the CSA algorithm, using two well-known multiple alignment algorithms, the CLUSTALW and MAVID tools, and also the visualization tool SinicView. CONCLUSION: The results show that a circularization and rotation pre-processing step significantly improves the efficiency of publicly available multiple sequence alignment
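A much simplified sketch of the rotation step: choose an anchor substring that occurs exactly once in every circular sequence and rotate each sequence so that it starts there. This is an assumed stand-in for illustration, not CSA's chain-of-longest-common-subsequences method; `rotate_to_anchor` and the example sequences are hypothetical.

```python
# Simplified stand-in for CSA's rotation step (hypothetical helper, toy data).

def rotate_to_anchor(seq: str, anchor: str) -> str:
    """Rotate a circular sequence so it begins at the unique occurrence of anchor."""
    doubled = seq + seq  # lets an anchor that wraps past the origin be found
    hits = [i for i in range(len(seq)) if doubled.startswith(anchor, i)]
    if len(hits) != 1:
        raise ValueError(f"anchor must occur exactly once, found {len(hits)}")
    start = hits[0]
    return seq[start:] + seq[:start]

# Two arbitrary linearizations of the same circular molecule...
a = rotate_to_anchor("GATTACACCG", "ACCG")
b = rotate_to_anchor("CCGGATTACA", "ACCG")  # here the anchor wraps around the origin
# ...become identical once both start at the shared anchor.
print(a, b)
```

After rotation every sequence has a common starting point, so any standard linear multiple aligner can be applied unchanged.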
Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

Science.gov (United States)

Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R

2009-07-01

The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/
BinAligner: a heuristic method to align biological networks.

Science.gov (United States)

Yang, Jialiang; Li, Jun; Grünewald, Stefan; Wan, Xiu-Feng

2013-01-01

The advances in high-throughput omics technologies have made it possible to characterize molecular interactions within and across various species. Alignment and comparison of molecular networks across species will help detect orthologs and conserved functional modules and provide insights into the evolutionary relationships of the compared species. However, such analyses are not trivial due to the complexity of networks and high computational cost. Here we develop BinAligner, an algorithm that mixes global and local strategies for network alignment. Based on the hypotheses that the similarity between two vertices across networks is context dependent and that information from the edges and the structures of subnetworks can be more informative than vertices alone, two scoring schemes, 1-neighborhood subnetwork and graphlet, were introduced to derive the scoring matrices between networks, in addition to the commonly used vertex-based scoring scheme. The alignment problem is then formulated as an assignment problem, which is solved by a combinatorial optimization algorithm such as the Hungarian method. The proposed algorithm was applied and validated in aligning the protein-protein interaction network of Kaposi's sarcoma associated herpesvirus (KSHV) with that of varicella zoster virus (VZV). Interestingly, we identified several putative functional orthologous proteins with similar functions but very low sequence similarity between the two viruses. For example, KSHV open reading frame 56 (ORF56) and VZV ORF55 are helicase-primase subunits with sequence identity 14.6%, and KSHV ORF75 and VZV ORF44 are tegument proteins with sequence identity 15.3%. These functional pairs cannot be identified if one restricts the alignment to orthologous protein pairs. In addition, BinAligner identified a conserved pathway between the two viruses, consisting of 7 orthologous protein pairs connected by conserved links. This pathway might be crucial for virus packing and
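The assignment formulation can be illustrated with a compact O(n³) Hungarian (Kuhn-Munkres) solver applied to a vertex-similarity matrix, maximizing total similarity by minimizing negated scores. This is a sketch under assumptions: the similarity values are made up, `hungarian_min` and `align_vertices` are hypothetical helpers, and BinAligner's own scoring matrices and code are not reproduced.

```python
# Hypothetical helpers; the similarity matrix below is illustrative, not from the paper.

def hungarian_min(cost):
    """Kuhn-Munkres with row/column potentials for an n x m cost matrix (n <= m).
    Returns assignment[i] = column matched to row i, minimizing total cost."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials; column 0 is a sentinel
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # previous column on the alternating path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

def align_vertices(similarity):
    """Map vertex i of network 1 to vertex assignment[i] of network 2, maximizing similarity."""
    return hungarian_min([[-s for s in row] for row in similarity])

sim = [[4, 1, 3],
       [2, 0, 5],
       [3, 2, 2]]
print(align_vertices(sim))
```

Negating similarities is the usual way to reuse a minimum-cost solver for maximization; a rectangular matrix (fewer vertices in the first network) works as long as rows do not outnumber columns.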
MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

Science.gov (United States)

González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

2016-12-15

MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C ++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net CONTACT: jgonzalezd@udc.esSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Stochastic sampling of the RNA structural alignment space.

Science.gov (United States)

Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H

2009-07-01

A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the 'structural alignment' space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The 'best' centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.
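Sampling from a (pseudo-)Boltzmann distribution means drawing each candidate s with probability proportional to exp(-ΔG(s)/RT). The minimal sketch below does this over an explicitly enumerated toy set of candidates with made-up pseudo-free energies and an assumed temperature of 37 °C; explicit enumeration is only feasible for a toy example, and the helper names are hypothetical rather than taken from the published method.

```python
import math
import random

# Illustrative sketch: enumerates a toy candidate set explicitly; names are hypothetical.
RT = 0.61632  # kcal/mol, approximately R * 310.15 K (assumed 37 °C)

def boltzmann_probabilities(energies, rt=RT):
    """P(s) = exp(-dG(s)/RT) / Z for each candidate energy dG(s)."""
    lowest = min(energies)
    # Shifting by the lowest energy cancels in the ratio and avoids overflow.
    weights = [math.exp(-(e - lowest) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def sample(candidates, energies, n, seed=0):
    """Draw n candidates with replacement according to their Boltzmann probabilities."""
    rng = random.Random(seed)
    return rng.choices(candidates, weights=boltzmann_probabilities(energies), k=n)

candidates = ["A", "B", "C"]     # stand-ins for structure/alignment pairs
energies = [-12.0, -11.5, -9.0]  # made-up pseudo-free energies, kcal/mol
probs = boltzmann_probabilities(energies)
print([round(p, 3) for p in probs])
print(sample(candidates, energies, n=10, seed=1))
```

Clustering such samples and reporting cluster centroids, as the abstract describes, would be a separate post-processing step not shown here.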
The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies.

Directory of Open Access Journals (Sweden)

Patrick D Schloss

Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of beta-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. Taken together, these results
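To make the gap-treatment question concrete, the sketch below computes a pairwise distance from two aligned sequences under three conventions: ignoring columns with a gap, counting every gap column as a difference, and counting each run of consecutive gap columns as a single difference. The convention labels and the `pairwise_distance` helper are assumptions for illustration, not the exact definitions used in the study.

```python
# Hypothetical helper; the three conventions are illustrative definitions.

def pairwise_distance(a: str, b: str, gaps: str = "ignore") -> float:
    """Distance between two equal-length aligned sequences ('-' marks a gap).

    gaps="ignore": skip columns where exactly one sequence has a gap.
    gaps="each":   every such gap column counts as one difference.
    gaps="one":    a run of consecutive gap columns counts as one difference.
    Columns where both sequences have a gap are always skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if gaps not in ("ignore", "each", "one"):
        raise ValueError("gaps must be 'ignore', 'each' or 'one'")
    diffs = compared = 0
    in_gap_run = False
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap in both sequences: uninformative column
        if x == "-" or y == "-":
            if gaps == "each" or (gaps == "one" and not in_gap_run):
                diffs += 1
                compared += 1
            in_gap_run = True
            continue
        in_gap_run = False
        compared += 1
        diffs += x != y  # True counts as 1
    return diffs / compared if compared else 0.0

a, b = "ACGT--ACGT", "ACGTTTACCT"
for mode in ("ignore", "each", "one"):
    print(mode, round(pairwise_distance(a, b, mode), 4))
```

The same pair of sequences yields three different distances, which is why the choice of gap treatment matters most when sequence lengths within a region vary.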
Alignment methods: strategies, challenges, benchmarking, and comparative overview.

Science.gov (United States)

Löytynoja, Ari

2012-01-01

Comparative evolutionary analyses of molecular sequences are solely based on the identities and differences detected between homologous characters. Errors in this homology statement, that is errors in the alignment of the sequences, are likely to lead to errors in the downstream analyses. Sequence alignment and phylogenetic inference are tightly connected and many popular alignment programs use the phylogeny to divide the alignment problem into smaller tasks. They then neglect the phylogenetic tree, however, and produce alignments that are not evolutionarily meaningful. The use of phylogeny-aware methods reduces the error but the resulting alignments, with evolutionarily correct representation of homology, can challenge the existing practices and methods for viewing and visualising the sequences. The inter-dependency of alignment and phylogeny can be resolved by joint estimation of the two; methods based on statistical models allow for inferring the alignment parameters from the data and correctly take into account the uncertainty of the solution but remain computationally challenging. Widely used alignment methods are based on heuristic algorithms and unlikely to find globally optimal solutions. The whole concept of one correct alignment for the sequences is questionable, however, as there typically exist vast numbers of alternative, roughly equally good alignments that should also be considered. This uncertainty is hidden by many popular alignment programs and is rarely correctly taken into account in the downstream analyses. The quest for finding and improving the alignment solution is complicated by the lack of suitable measures of alignment goodness. The difficulty of comparing alternative solutions also affects benchmarks of alignment methods and the results strongly depend on the measure used. As the effects of alignment error cannot be predicted, comparing the alignments' performance in downstream analyses is recommended.
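Benchmark measures such as the sum-of-pairs score (SPS) show how alignment "goodness" is commonly quantified: the fraction of residue pairs aligned in a reference alignment that a test alignment also aligns. A minimal sketch, assuming both alignments are lists of equal-length gapped rows given in the same sequence order; `aligned_pairs` and `sum_of_pairs_score` are hypothetical names.

```python
from itertools import combinations

# Hypothetical helpers illustrating a sum-of-pairs style benchmark score.

def aligned_pairs(alignment):
    """Set of homology statements ((seq_i, pos_i), (seq_j, pos_j)) implied by an MSA."""
    pairs = set()
    counters = [0] * len(alignment)  # ungapped residue index within each sequence
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(residues, 2))  # residues sharing a column are homologous
    return pairs

def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 1.0

reference = ["AC-GT", "ACTGT"]
test      = ["ACG-T", "ACTGT"]
print(sum_of_pairs_score(test, reference))
```

As the abstract cautions, scores like this depend entirely on the chosen reference, so a high SPS does not by itself guarantee better downstream inference.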
Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment.

Science.gov (United States)

Baichoo, Shakuntala; Ouzounis, Christos A

A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.
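As a small example of the time/space trade-offs such a review covers: a global (Needleman-Wunsch) alignment score needs O(mn) time, but if only the score is required, keeping two rows of the dynamic-programming table reduces memory from O(mn) to O(n). A minimal sketch with an assumed linear gap penalty and simple match/mismatch scores; recovering the alignment itself in linear space would need a divide-and-conquer method such as Hirschberg's, which is not shown.

```python
# Score-only global alignment in linear space; scoring parameters are assumptions.

def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score in O(len(a) * len(b)) time and O(len(b)) space."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr  # only the previous row is ever needed
    return prev[-1]

print(nw_score("ACGT", "AGT"))
```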
GenNon-h: Generating multiple sequence alignments on nonhomogeneous phylogenetic trees

Directory of Open Access Journals (Sweden)

Kedzierska Anna M

2012-08-01

BACKGROUND: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. However, no methods exist for simulating DNA MSAs directly from the transition matrices. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. data with distinct substitution rates on different lineages). RESULTS: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Based on the input model and a phylogenetic tree in Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of the desired length. GenNon-h is publicly available for download. CONCLUSION: The software presented here is an efficient tool for generating DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses and for exploring the limits of reconstruction algorithms and their robustness to such models.
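The core idea of simulating an alignment directly from per-branch transition matrices on a rooted tree can be sketched as follows. Assumptions: the tree is hand-coded as nested dicts rather than read from Newick, each branch carries its own row-stochastic 4x4 matrix (so different lineages can evolve under different processes), the root distribution is fixed, and there are no indels, so the leaf sequences are aligned by construction. `simulate_msa` and `jc_like` are hypothetical names, not GenNon-h's code.

```python
import random

# Illustrative sketch of discrete-time Markov simulation on a tree; names are hypothetical.
BASES = "ACGT"

def evolve(seq, matrix, rng):
    """Apply one discrete-time Markov step per site using a row-stochastic matrix."""
    return "".join(rng.choices(BASES, weights=matrix[BASES.index(base)], k=1)[0]
                   for base in seq)

def simulate_msa(tree, length, root_dist, seed=0):
    """tree: {"name": str, "children": [(branch_matrix, subtree), ...]}.
    Returns {leaf_name: sequence}; all sequences share the same length."""
    rng = random.Random(seed)
    root_seq = "".join(rng.choices(BASES, weights=root_dist, k=length))
    leaves = {}

    def walk(node, seq):
        children = node.get("children", [])
        if not children:
            leaves[node["name"]] = seq
        for matrix, child in children:
            walk(child, evolve(seq, matrix, rng))

    walk(tree, root_seq)
    return leaves

def jc_like(p):
    """Symmetric example matrix: keep the base with prob 1 - p, else switch uniformly."""
    return [[1 - p if i == j else p / 3 for j in range(4)] for i in range(4)]

# Different matrices on different branches make the process nonhomogeneous.
tree = {"name": "root", "children": [
    (jc_like(0.05), {"name": "A"}),
    (jc_like(0.30), {"name": "inner", "children": [
        (jc_like(0.10), {"name": "B"}),
        (jc_like(0.02), {"name": "C"})]})]}
msa = simulate_msa(tree, length=50, root_dist=[0.25] * 4, seed=1)
for name, seq in msa.items():
    print(name, seq)
```

Any row-stochastic matrix can replace `jc_like`, including non-symmetric ones, which is what allows nonstationary, non-time-reversible data to be generated.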
STELLAR: fast and exact local alignments

Directory of Open Access Journals (Sweden)

Weese David

2011-10-01

BACKGROUND: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast but heuristic, and hence sometimes miss significant matches. RESULTS: We present here the local pairwise aligner STELLAR, which has full sensitivity for ε-alignments, i.e. it is guaranteed to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. CONCLUSIONS: STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de.
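Lossless q-gram filters of the kind SWIFT belongs to rest on the q-gram lemma: two strings of length n with at most k differences share at least t = n + 1 - (k + 1)q q-grams, so any region sharing fewer than t q-grams can be discarded without losing a match. For an ε-match one would take k = ⌊εn⌋. The sketch below computes that threshold and checks it on a substitution-only example; the helper names are hypothetical, and SWIFT's parallelogram-based bookkeeping is not reproduced.

```python
# Hypothetical helpers illustrating the q-gram lemma behind lossless filtering.

def qgram_threshold(n: int, q: int, k: int) -> int:
    """Minimum number of shared q-grams for two length-n strings with <= k differences."""
    return max(0, n + 1 - (k + 1) * q)

def shared_qgrams_same_offset(a: str, b: str, q: int) -> int:
    """Count offsets where equal-length strings a and b carry the same q-gram."""
    return sum(a[i:i + q] == b[i:i + q] for i in range(len(a) - q + 1))

a = "ACGTACGTACGTACGTACGT"  # n = 20
b = "ACGTACGAACGTACGTACCT"  # same string with 2 substitutions
t = qgram_threshold(20, 4, 2)
shared = shared_qgrams_same_offset(a, b, 4)
print(t, shared, shared >= t)  # the true match passes the filter
```

Each difference can destroy at most q overlapping q-grams, which is why the count never drops below the threshold; candidates that pass still have to be checked by an exact verification step.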
PriFi - Using a Multiple Alignment of Related Sequences to Find Primers for Amplification of Homologs

DEFF Research Database (Denmark)

Fredslund, Jakob; Schauser, Leif; Madsen, Lene Heegaard

2005-01-01

Using a comparative approach, the web program PriFi (http://cgi-www.daimi.au.dk/cgi-chili/PriFi/main) designs pairs of primers useful for PCR amplification of genomic DNA in species where prior sequence information is not available. The program works with an alignment of DNA sequences from...... of a procedure for developing general markers serving as common anchor loci across species. To accommodate users with special preferences, configuration settings and criteria can be customized....
Rigid-body motion correction of the liver in image reconstruction for golden-angle stack-of-stars DCE MRI.

Science.gov (United States)

Johansson, Adam; Balter, James; Cao, Yue

2018-03-01

Respiratory motion can affect pharmacokinetic perfusion parameters quantified from liver dynamic contrast-enhanced MRI. Image registration can be used to align dynamic images after reconstruction. However, intra-image motion blur remains after alignment and can alter the shape of contrast-agent uptake curves. We introduce a method to correct for inter- and intra-image motion during image reconstruction. Sixteen liver dynamic contrast-enhanced MRI examinations of nine subjects were performed using a golden-angle stack-of-stars sequence. For each examination, an image time series with high temporal resolution but severe streak artifacts was reconstructed. Images were aligned using region-limited rigid image registration within a region of interest covering the liver. The transformations resulting from alignment were used to correct raw data for motion by modulating and rotating acquired lines in k-space. The corrected data were then reconstructed using view sharing. Portal-venous input functions extracted from motion-corrected images had significantly greater peak signal enhancements (mean increase: 16%, t-test, P < 0.001) than those from images aligned using image registration after reconstruction. In addition, portal-venous perfusion maps estimated from motion-corrected images showed fewer artifacts close to the edge of the liver. Motion-corrected image reconstruction restores uptake curves distorted by motion. Motion correction also reduces motion artifacts in estimated perfusion parameter maps. Magn Reson Med 79:1345-1353, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
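Correcting raw data "by modulating and rotating acquired lines in k-space" relies on the Fourier shift theorem: translating an object multiplies its k-space samples by a linear phase ramp, and rotating the object rotates k-space by the same angle. The toy 1-D check below, using a naive DFT, illustrates only the translation part; it is a demonstration of the principle, not the authors' reconstruction pipeline, and the function names are hypothetical.

```python
import cmath

# Toy 1-D demonstration of the Fourier shift theorem; names are hypothetical.

def dft(x):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def shift_in_kspace(spectrum, d):
    """Linear phase ramp: same effect as circularly shifting the signal by d samples."""
    n = len(spectrum)
    return [spectrum[k] * cmath.exp(-2j * cmath.pi * k * d / n) for k in range(n)]

signal = [0, 1, 3, 7, 3, 1, 0, 0]
d = 3
shifted = signal[-d:] + signal[:-d]  # signal moved d samples to the right (circularly)
lhs = dft(shifted)
rhs = shift_in_kspace(dft(signal), d)
print(max(abs(x - y) for x, y in zip(lhs, rhs)))  # zero up to rounding error
```

To undo a measured translation, the acquired data would be multiplied by the opposite ramp (shift by -d) before reconstruction.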
Multi-Stacked Supported Lipid Bilayer Micropatterning through Polymer Stencil Lift-Off

Directory of Open Access Journals (Sweden)

Yujie Zhu

2015-08-01

Complex multi-lamellar structures play a critical role in biological systems, where they are present as lamellar bodies and as part of biological assemblies that control energy transduction processes. Multi-lamellar lipid layers not only provide interesting systems for fundamental research on membrane structure and bilayer-associated polypeptides, but can also serve as components in bioinspired materials or devices. Although the ability to pattern stacked lipid bilayers at the micron scale is important for these purposes, limited work has been done in developing such patterning techniques. Here, we present a simple and direct approach to pattern stacked supported lipid bilayers (SLBs) using polymer stencil lift-off and the electrostatic interactions between cationic and anionic lipids. Both homogeneous and phase-segregated stacked SLB patterns were produced, demonstrating that the stacked lipid bilayers retain lateral diffusivity. We demonstrate patterned SLB stacks of up to four bilayers, where fluorescence resonance energy transfer (FRET) and quenching were used to probe the interactions between lipid bilayers. Furthermore, the study of lipid phase behaviour showed that gel phase domains align between adjacent layers. The proposed stacked SLB pattern platform provides a robust model for studying lipid behaviour with a controlled number of bilayers, and an attractive means towards building functional bioinspired materials or devices.
Mango: multiple alignment with N gapped oligos.

Science.gov (United States)

Zhang, Zefeng; Lin, Hao; Li, Ming

2008-06-01

Multiple sequence alignment is a classical and challenging task. The problem is NP-hard, and full dynamic programming is too time-consuming. The progressive alignment heuristics adopted by most state-of-the-art works suffer from the "once a gap, always a gap" phenomenon. Is there a radically new way to do multiple sequence alignment? In this paper, we introduce a novel and orthogonal multiple sequence alignment method, using both multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole and tries to build the alignment vertically, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds have proved significantly more sensitive than consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks, showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0, and Kalign 2.0. We have further demonstrated the scalability of MANGO on very large datasets of repeat elements. MANGO can be downloaded at http://www.bioinfo.org.cn/mango/ and is free for academic usage.
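Spaced seeds generalize k-mers by requiring matches only at the '1' positions of a binary mask; a consecutive k-mer is the all-ones special case. The minimal sketch below finds seed hits between two sequences, assuming a toy mask "11011"; the mask and the function names are illustrative assumptions, not MANGO's optimized seeds.

```python
# Hypothetical helpers; the mask "11011" is a toy example, not an optimized seed.

def seed_key(seq: str, start: int, mask: str) -> str:
    """Characters of seq at the '1' positions of mask, starting at start."""
    return "".join(seq[start + i] for i, c in enumerate(mask) if c == "1")

def seed_hits(a: str, b: str, mask: str):
    """All (i, j) such that a and b agree at every '1' position of the mask."""
    span = len(mask)
    index = {}  # seed key -> start positions in a
    for i in range(len(a) - span + 1):
        index.setdefault(seed_key(a, i, mask), []).append(i)
    return [(i, j)
            for j in range(len(b) - span + 1)
            for i in index.get(seed_key(b, j, mask), [])]

a = "ACGTTGCA"
b = "ACCTTGCA"  # one mismatch at position 2
print(seed_hits(a, b, "11011"))  # the spaced seed skips the mismatch position
print(seed_hits(a, b, "11111"))  # a contiguous 5-mer finds fewer hits
```

Because the don't-care positions can absorb a mismatch, a spaced seed of the same weight tends to hit homologous regions that a contiguous k-mer misses, which is the sensitivity argument the abstract relies on.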
Improving your target-template alignment with MODalign.

KAUST Repository

Barbato, Alessandro

2012-02-04

SUMMARY: MODalign is an interactive web-based tool aimed at helping protein structure modelers to inspect and manually modify the alignment between the sequences of a target protein and of its template(s). It interactively computes, displays and, upon modification of the target-template alignment, updates the multiple sequence alignments of the two protein families, their conservation score, secondary structure and solvent accessibility values, and local quality scores of the implied three-dimensional model(s). Although it has been designed to simplify the target-template alignment step in modeling, it is suitable for all cases where a sequence alignment needs to be inspected in the context of other biological information. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modalign. Website implemented in HTML and JavaScript with all major browsers supported. CONTACT: jan.kosinski@uniroma1.it.
Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

Directory of Open Access Journals (Sweden)

Perry Evans

Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. Motif usage on HIV proteins differs sufficiently in amino acid sequence from that on human proteins to allow HIV-specific targeting with small-molecule drugs.
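Docking motifs like these are usually written as regular expressions. As an illustration only, the sketch below encodes a commonly cited D-site-like layout (a cluster of basic residues K/R, a short linker, then hydrophobic-X-hydrophobic with L/I/V) and a relaxed variant requiring only one basic residue, mirroring the kind of revision described above. Both patterns, their linker lengths, and the test peptides are assumptions; they are not the expressions published in this study.

```python
import re

# Assumed D-site-like layout: 2-3 basic residues, a 1-6 residue linker,
# then hydrophobic-X-hydrophobic (L/I/V). Not the paper's published patterns.
STRICT = re.compile(r"[KR]{2,3}.{1,6}[LIV].[LIV]")
# Assumed relaxed variant: only one basic residue required.
RELAXED = re.compile(r"[KR].{1,6}[LIV].[LIV]")

def motif_hits(pattern, protein):
    """Start positions and matched text of non-overlapping pattern hits."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]

peptide = "MSAKPTEELAVQQGS"  # made-up sequence with a single basic residue
print(motif_hits(STRICT, peptide))   # no hit: needs two adjacent K/R
print(motif_hits(RELAXED, peptide))  # hit once one basic residue suffices
```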
Long Read Alignment with Parallel MapReduce Cloud Platform

Science.gov (United States)

Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki

2015-01-01

Genomic sequence alignment is an important technique for decoding genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data with longer reads. Cloud platforms are adopted to address the problems arising from the storage and analysis of large genomic data. Existing gene sequencing tools for cloud platforms predominantly consider short-read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of the map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR), a cloud platform for long sequence alignment. The proposed cloud platform adopts the widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy for the MapReduce phases and optimization of the Smith-Waterman algorithm are considered. Performance evaluation results show an average speed-up of 6.7 for BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman reduces execution time by 91.8%. The experimental study demonstrates the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms. PMID:26839887
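Because much of the reported gain comes from optimizing Smith-Waterman, a plain reference version of the local-alignment recurrence is useful for orientation. A minimal sketch with assumed linear gap penalties and match/mismatch scores; production aligners typically use affine gaps and heavy optimization (for example banding or SIMD), none of which is shown here.

```python
# Reference Smith-Waterman scoring; parameters are assumptions for illustration.

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment score and its end cell (1-based i, j), O(mn) time."""
    best, best_cell = 0, (0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at zero lets a local alignment start anywhere.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best, best_cell = curr[j], (i, j)
        prev = curr
    return best, best_cell

print(smith_waterman("TTTACGTAAA", "GGACGTGG"))  # local match on the shared "ACGT"
```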
DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.

Science.gov (United States)

Ma, Wenxiu; Yang, Lin; Rohs, Remo; Noble, William Stafford

2017-10-01

Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites. We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values. The software is available at https://bitbucket.org/wenxiu/sequence-shape.git. CONTACT: rohs@usc.edu or william-noble@uw.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
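The classical k-spectrum kernel that these models build on is an inner product of k-mer count vectors. A minimal sketch of the sequence-only part; shape features and the di-mismatch variant are not reproduced, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical helpers for the plain k-spectrum kernel (no shape features).

def spectrum(seq: str, k: int) -> Counter:
    """Counts of every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x: str, y: str, k: int) -> int:
    """K(x, y) = sum over k-mers u of count_x(u) * count_y(u)."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    return sum(count * sy[kmer] for kmer, count in sx.items())  # missing k-mers count 0

print(spectrum_kernel("ACGTACGT", "CGTACG", 3))
```

Because the kernel only compares k-mer counts, the input sequences do not need to be aligned, which is the property the abstract highlights; in practice the kernel is often normalized by sqrt(K(x, x) K(y, y)).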
AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework.

Directory of Open Access Journals (Sweden)

Qi Zheng

2016-10-01

Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework.

Science.gov (United States)

Zheng, Qi; Grice, Elizabeth A

2016-10-01

Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical, yet it is completely lacking at present. Here we developed a generalized software toolkit, "AlignerBoost", which uses a Bayesian-based framework to accurately estimate the mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising sensitivity, even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even at extreme thresholds. AlignerBoost is also SNP-aware, and higher-quality alignments can be achieved if known SNPs are provided. AlignerBoost's algorithm is computationally efficient and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
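The general idea of a Bayesian mapping quality can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AlignerBoost's actual model: it assumes a uniform prior over the candidate loci of one read and a likelihood proportional to exp(scale × alignment score), where `scale` and `max_q` are hypothetical parameters. The posterior probability that the best hit is wrong is converted to a Phred-scaled MAPQ.

```python
import math


def mapping_quality(scores, scale=1.0, max_q=250):
    """Phred-scaled MAPQ for the best of several candidate hits of one read.

    Hypothetical model (not AlignerBoost's exact one): uniform prior over the
    candidate loci, likelihood of each hit proportional to exp(scale * score).
    """
    if not scores:
        raise ValueError("need at least one candidate hit")
    best = max(scores)
    # Subtract the best score before exponentiating to avoid overflow.
    weights = [math.exp(scale * (s - best)) for s in scores]
    # The best hit has weight exp(0) = 1, so its posterior is 1 / sum(weights).
    p_wrong = 1.0 - 1.0 / sum(weights)
    if p_wrong <= 0.0:
        return max_q  # a unique hit: cap instead of taking log10(0)
    return min(max_q, round(-10.0 * math.log10(p_wrong)))
```

For example, `mapping_quality([60, 60])` gives 3 (a coin flip between two equally good loci), while `mapping_quality([60, 50])` gives 43.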

A cross-species alignment tool (CAT)

DEFF Research Database (Denmark)

Li, Heng; Guan, Liang; Liu, Tao

2007-01-01

BACKGROUND: The two main kinds of automatic gene annotation frameworks are ab initio and alignment-based, the latter splitting into two sub-groups. The first group is used for intra-species alignments and includes successful methods with high specificity and speed. The other group contains more ... sensitive methods, which are usually applied to aligning inter-species sequences. RESULTS: Here we present a new algorithm called CAT (for Cross-species Alignment Tool). It is designed to align mRNA sequences to mammalian-sized genomes. CAT is implemented in C and is freely available on the web ... at http://xat.sourceforge.net/. CONCLUSIONS: Examined from different angles, CAT outperforms other extant alignment tools. Tested against all available mouse-human and zebrafish-human orthologs, we demonstrate that CAT combines the specificity and speed of the best intra-species algorithms, like BLAT...
Capillary self-alignment dynamics for R2R manufacturing of mesoscopic system-in-foil devices

NARCIS (Netherlands)

Arutinov, G.; Quintero, A.V.; Smits, E.C.P.; Remoortere, B. van; Brand, J. van den; Schoo, H.F.M.; Briand, D.; Rooij, N.F. de; Dietzel, A.H.

2012-01-01

This paper reports a study on the dynamics of foil-based functional component self-alignment onto patterned test substrates and its demonstration when integrating a flexible sensor onto a printed circuitry. We investigate the dependence of alignment time and final precision of stacking of mm- and
Image stack alignment in full-field X-ray absorption spectroscopy using SIFT_PyOCL.

Science.gov (United States)

Paleo, Pierre; Pouyet, Emeline; Kieffer, Jérôme

2014-03-01

Full-field X-ray absorption spectroscopy experiments allow the acquisition of millions of spectra within minutes. However, constructing the hyperspectral image requires an image alignment procedure with sub-pixel precision. While image correlation was originally used to re-align images by translation, the Scale Invariant Feature Transform (SIFT) algorithm (which is by design robust to rotation, illumination change, translation and scaling) offers an additional advantage: the alignment can be limited to a region of interest of any arbitrary shape. In this context, a Python module named SIFT_PyOCL has been developed. It implements a parallel version of the SIFT algorithm in OpenCL, providing high-speed image registration and alignment on both processors and graphics cards. The performance of the algorithm allows online processing of large datasets.
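Translation-only re-alignment, the approach the abstract contrasts with SIFT, can be illustrated with a brute-force search. This is an assumption-laden stand-in, not SIFT_PyOCL code: it tests every integer shift up to a hypothetical `max_shift` and keeps the one with the lowest mean squared difference over the overlapping region. It does not reach the sub-pixel precision the abstract calls for; that would need, for example, interpolation around the best integer shift.

```python
def estimate_shift(ref, img, max_shift=3):
    """Integer (dy, dx) such that img[y][x] best matches ref[y - dy][x - dx].

    ref and img are equally sized 2D lists of gray values; the score is the
    mean squared difference over the pixels where both images overlap.
    """
    h, w = len(ref), len(ref[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, count = 0.0, 0
            # Keep y and y - dy (and x and x - dx) inside the image.
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    d = img[y][x] - ref[y - dy][x - dx]
                    err += d * d
                    count += 1
            if count and err / count < best_err:
                best, best_err = (dy, dx), err / count
    return best
```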
GATA: A graphic alignment tool for comparative sequence analysis

Energy Technology Data Exchange (ETDEWEB)

Nix, David A.; Eisen, Michael B.

2005-01-01

Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein-coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences, this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot-plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
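A basic filtered dot plot, of the kind the abstract critiques, takes only a few lines. In this sketch the window size and match threshold are hypothetical parameters: a cell (i, j) is marked when the windows starting at those positions share enough identical bases. The output is a set of matching cells, not an alignment, which is exactly the limitation the abstract points out.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (ACGT only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def dot_plot(seq_a, seq_b, window=5, min_matches=4):
    """Cells (i, j) whose windows seq_a[i:i+window] and seq_b[j:j+window]
    share at least min_matches identical positions."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.append((i, j))
    return hits
```

Calling `dot_plot(a, reverse_complement(b))` marks segments of `b` that appear inverted relative to `a`, one of the rearrangements that collinear dynamic programming cannot represent.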
DIDA: Distributed Indexing Dispatched Alignment.

Directory of Open Access Journals (Sweden)

Hamid Mohamadi

One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA) and is free for academic use.
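The distribute, index and dispatch steps can be sketched with plain k-mer sets. This is a hypothetical single-machine illustration, not DIDA's implementation: target sequences are split round-robin into partitions, each partition gets its own index, and a query is sent only to partitions that share at least one k-mer with it. A real distributed system would place each partition on its own node, would likely use a compact probabilistic structure (such as a Bloom filter) instead of exact sets, and would still need the alignment and merge steps omitted here.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distribute_and_index(targets, n_parts, k):
    """Split targets (name -> sequence) round-robin into n_parts partitions
    and build one k-mer set per partition."""
    parts = [{} for _ in range(n_parts)]
    for idx, (name, seq) in enumerate(targets.items()):
        parts[idx % n_parts][name] = seq
    indexes = []
    for part in parts:
        index = set()
        for seq in part.values():
            index |= kmers(seq, k)
        indexes.append(index)
    return parts, indexes


def dispatch(queries, indexes, k):
    """Map each query to the partitions that share at least one k-mer with it."""
    return {
        name: [p for p, index in enumerate(indexes) if kmers(seq, k) & index]
        for name, seq in queries.items()
    }
```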
Improving your target-template alignment with MODalign

OpenAIRE

Barbato, Alessandro; Benkert, Pascal; Schwede, Torsten; Tramontano, Anna; Kosinski, Jan

2012-01-01

Summary: MODalign is an interactive web-based tool aimed at helping protein structure modelers to inspect and manually modify the alignment between the sequences of a target protein and of its template(s). It interactively computes, displays and, upon modification of the target-template alignment, updates the multiple sequence alignments of the two protein families, their conservation score, secondary structure and solvent accessibility values, and local quality scores of the implied three-di...
Evaluation of flow fields on bubble removal and system performance in an ammonium bicarbonate reverse electrodialysis stack

KAUST Repository

Hatzell, Marta C.

2013-11-01

Ammonium bicarbonate has recently been demonstrated to be an excellent thermolytic solution for energy generation in reverse electrodialysis (RED) stacks. However, operating RED stacks at room temperature can promote gaseous bubble (CO2, NH3) accumulation within the stack, reducing overall system performance. The management and minimization of bubbles formed in RED flow fields is an important operational issue that has yet to be addressed. Flow fields with and without spacers in RED stacks were analyzed to determine how both fluid flow and the buildup and removal of bubbles affected performance. In the presence of a spacer, the membrane resistance increased by ~50 Ω, resulting in a decrease in power density by 30%, from 0.140 W m⁻² to 0.093 W m⁻². Shorter channels reduced concentration polarization effects and resulted in 3-23% higher limiting current density. Gas accumulation was minimized through the use of short, vertically aligned channels; consequently, the membrane area covered by bubbles was reduced from ~20% to 7%, which caused a 12% increase in power density. As ammonium bicarbonate RED systems are scaled up, attention to channel aspect ratio, length, and alignment will enable more stable performance. © 2013 Elsevier B.V.
SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

Science.gov (United States)

Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

2016-06-15

Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After sequence reads are obtained, the short reads are mapped to a reference genome, and specific mapping patterns, called read mapping profiles, can be detected; these are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods in correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns, and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work are available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. Supplementary data are available
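Aligning two read mapping profiles can be illustrated with a Needleman-Wunsch-style dynamic program over read depths. This is a simplified stand-in, not SHARAKU's method: it normalizes each profile to sum to 1, scores an aligned pair of positions by the negative absolute difference of their depths, and charges a hypothetical constant `gap` penalty. SHARAKU additionally incorporates primary and secondary structure, which this sketch ignores.

```python
def normalize(profile):
    """Scale read depths to sum to 1 so profiles of different coverage compare."""
    total = sum(profile)
    return [v / total for v in profile] if total else list(profile)


def align_profiles(p, q, gap=-0.05):
    """Global alignment score of two read-depth profiles (higher is more similar).

    Aligned positions score -|p_i - q_j|; each gap costs `gap` (hypothetical value).
    """
    p, q = normalize(p), normalize(q)
    n, m = len(p), len(q)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] - abs(p[i - 1] - q[j - 1]),  # align i with j
                score[i - 1][j] + gap,  # gap in q
                score[i][j - 1] + gap,  # gap in p
            )
    return score[n][m]
```

For instance, the profiles [0, 0, 5, 5, 0] and [0, 5, 5, 0, 0], which differ by a one-position shift, score -0.1 (two gaps, otherwise identical) rather than the -1.0 of a gap-free alignment.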
SFESA: a web server for pairwise alignment refinement by secondary structure shifts.

Science.gov (United States)

Tong, Jing; Pei, Jimin; Grishin, Nick V

2015-09-03

Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. We developed the SFESA web server to refine pairwise protein sequence alignments. Whereas the previous version of SFESA required a set of 3D coordinates for a protein, the new server searches a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used to evaluate alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
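The local-shift step can be sketched as follows. Assumptions: the block is an ungapped segment of the query aligned to the template starting at `start`, and `identity_score` is a hypothetical placeholder. SFESA itself scores variants by combining profile-profile and template-derived contact-energy terms, and defines blocks from the template's secondary structure elements.

```python
def identity_score(a, b):
    """Hypothetical placeholder substitution score (not SFESA's scoring)."""
    return 2 if a == b else -1


def refine_block(query_block, template, start, max_shift=2, score=identity_score):
    """Try shifting an ungapped aligned block by up to max_shift positions.

    Returns (best_start, best_score); ties favour the smallest shift.
    Returns None if no shifted placement fits inside the template.
    """
    best = None
    # Visit shifts as 0, -1, 1, -2, 2, ... so smaller shifts win ties.
    for d in sorted(range(-max_shift, max_shift + 1), key=abs):
        s0 = start + d
        if s0 < 0 or s0 + len(query_block) > len(template):
            continue  # shifted block would run off the template
        window = template[s0:s0 + len(query_block)]
        total = sum(score(q, t) for q, t in zip(query_block, window))
        if best is None or total > best[1]:
            best = (s0, total)
    return best
```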
Establishing a framework for comparative analysis of genome sequences

Energy Technology Data Exchange (ETDEWEB)

Bansal, A.K.

1995-06-01

This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignments. The framework integrates the information derived from multiple sequence alignment and a phylogenetic tree (a hypothetical tree of evolution) to derive new properties of sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation-related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
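The notion of a constrained column can be illustrated with a short sketch. It is a simplification under stated assumptions: "mutation" is approximated by the column containing more than one residue type, rather than by the parsimony analysis over a phylogenetic tree that the paper describes, and hydrophobicity (the set `HYDROPHOBIC`, a hypothetical choice) stands in for the conserved property. The original toolkit is written in Prolog; Python is used here only for illustration.

```python
HYDROPHOBIC = set("AVILMFWC")  # hypothetical property class


def constrained_columns(msa, prop=HYDROPHOBIC):
    """Indices of columns that vary in residue type yet keep the property.

    msa is a list of equal-length aligned sequences; '-' marks a gap.
    """
    result = []
    for c in range(len(msa[0])):
        column = [row[c] for row in msa if row[c] != "-"]
        if len(set(column)) > 1 and all(r in prop for r in column):
            result.append(c)
    return result
```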
GuiTope: an application for mapping random-sequence peptides to protein sequences.

Science.gov (United States)

Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

2012-01-03

Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
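The significance estimate can be sketched as an empirical p-value. Assumptions: peptides are scored against the protein by their best ungapped window, `identity` is a hypothetical placeholder scoring function, and random peptides are drawn residue by residue from user-supplied library frequencies (echoing GuiTope's option to specify them) rather than sampled from an actual library. The add-one correction keeps the estimate from being exactly zero.

```python
import random


def identity(a, b):
    """Hypothetical placeholder score: 1 for an identical residue, else 0."""
    return 1 if a == b else 0


def best_ungapped_score(peptide, protein, score=identity):
    """Best score of the peptide over all ungapped windows of the protein."""
    best = float("-inf")
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        best = max(best, sum(score(a, b) for a, b in zip(peptide, window)))
    return best


def empirical_p_value(peptide, protein, library_freqs, n_random=1000, seed=0,
                      score=identity):
    """Fraction of random peptides scoring at least as well (add-one corrected)."""
    rng = random.Random(seed)
    residues = list(library_freqs)
    weights = [library_freqs[r] for r in residues]
    observed = best_ungapped_score(peptide, protein, score)
    at_least = 0
    for _ in range(n_random):
        decoy = "".join(rng.choices(residues, weights=weights, k=len(peptide)))
        if best_ungapped_score(decoy, protein, score) >= observed:
            at_least += 1
    return (at_least + 1) / (n_random + 1)
```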
GuiTope: an application for mapping random-sequence peptides to protein sequences

Directory of Open Access Journals (Sweden)

Halperin Rebecca F

2012-01-01

Background: Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. Results: GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user, including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. Conclusions: GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
Moving object detection in top-view aerial videos improved by image stacking

Science.gov (United States)

Teutsch, Michael; Krüger, Wolfgang; Beyerer, Jürgen

2017-08-01

Image stacking is a well-known method that is used to improve the quality of images in video data. A set of consecutive images is aligned by applying image registration and warping. In the resulting image stack, each pixel has redundant information about its intensity value. This redundant information can be used to suppress image noise, resharpen blurry images, or even enhance the spatial image resolution, as done in super-resolution. Small moving objects in the videos usually get blurred or distorted by image stacking and thus need to be handled explicitly. We use image stacking in an innovative way: image registration is applied to small moving objects only, and image warping blurs the stationary background that surrounds the moving objects. Our video data come from a small fixed-wing unmanned aerial vehicle (UAV) that acquires top-view gray-value images of urban scenes. Moving objects are mainly cars but also other vehicles such as motorcycles. The resulting images, after applying our proposed image stacking approach, are used to improve baseline algorithms for vehicle detection and segmentation. We improve precision and recall by up to 0.011, which corresponds to a reduction of the number of false positive and false negative detections by more than 3 per second. Furthermore, we show how our proposed image stacking approach can be implemented efficiently.
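The noise-suppression step of conventional image stacking can be sketched as a pixel-wise median over already-registered frames. Registration and warping are assumed to be done, and the frames are hypothetical nested lists of gray values. The median discards a transient value at a pixel, which also illustrates why small moving objects are distorted by conventional stacking and need explicit handling.

```python
from statistics import median


def median_stack(frames):
    """Pixel-wise median of equally sized, already-aligned gray-value frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]
```

For example, stacking three one-row frames `[[10, 10]]`, `[[12, 200]]` and `[[11, 9]]` yields `[[11, 10]]`: the transient value 200 (for instance a passing car) is suppressed.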
Long Read Alignment with Parallel MapReduce Cloud Platform

Directory of Open Access Journals (Sweden)

Ahmed Abdulhakim Al-Absi

2015-01-01

Genomic sequence alignment is an important technique for decoding genome sequences in bioinformatics. Next-generation sequencing technologies produce genomic data with longer reads. Cloud platforms are adopted to address the problems arising from the storage and analysis of large genomic data. Existing gene sequencing tools for cloud platforms predominantly consider short read sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of the map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce the Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts the widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy for the MapReduce phases and an optimization of the Smith-Waterman algorithm are considered. Performance evaluation results show an average speed-up of 6.7 for BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimizing Smith-Waterman reduces the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.
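For reference, the Smith-Waterman local alignment recurrence can be written with a single rolling row of the score matrix. This is the textbook score-only version with a linear gap penalty and hypothetical default scores; it is not the optimization reported for BWASW-PMR, and production aligners typically use affine gap penalties and an index so that they do not align against the whole reference.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty.

    O(len(a) * len(b)) time, O(len(b)) memory (only one previous row is kept).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so alignments can restart.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```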
Vertically aligned carbon nanotube field-effect transistors

KAUST Repository

Li, Jingqi

2012-10-01

Vertically aligned carbon nanotube field-effect transistors (CNTFETs) have been developed using pure semiconducting carbon nanotubes. The source and drain were vertically stacked, separated by a dielectric, and the carbon nanotubes were placed on the sidewall of the stack to bridge the source and drain. Both the effective gate dielectric and gate electrode were normal to the substrate surface. The channel length is determined by the dielectric thickness between source and drain electrodes, making it easier to fabricate sub-micrometer transistors without using time-consuming electron beam lithography. The transistor area is much smaller than the planar CNTFET due to the vertical arrangement of source and drain and the reduced channel area. © 2012 Elsevier Ltd. All rights reserved.
Iridium catalyzed growth of vertically aligned CNTs by APCVD

International Nuclear Information System (INIS)

Sahoo, R.K.; Jacob, C.

2014-01-01

Highlights: • Growth of uniform-diameter vertically-aligned multi-walled CNTs by APCVD. • Use of high melting point low carbon solubility iridium nanoparticles as catalyst. • Optimization of growth time for uniform sized, uniformly aligned CNTs. • Growth model for the various features in the vertically aligned CNTs is proposed. - Abstract: Vertically aligned carbon nanotubes (VA-CNTs) have been synthesized using high temperature catalyst nanoparticles of iridium. The catalyst layer was prepared by DC sputtering. Particle density, circularity and average particle size of the catalyst were analyzed using field emission scanning electron microscopy. The alignment, morphology and the length of the as-grown CNTs were analyzed using field-emission scanning electron microscopy. High resolution transmission electron microscopy was carried out to observe the layers of graphitic stacking which form the carbon nanotubes. Micro Raman measurement was used for the analysis of the graphitic crystallinity of the as-grown carbon nano structures. Effects of growth time variation on growth morphology and alignment have been studied. The alignment has been explained on the basis of the crowding effect of the neighboring nanoparticles.
Iridium catalyzed growth of vertically aligned CNTs by APCVD

Energy Technology Data Exchange (ETDEWEB)

Sahoo, R.K.; Jacob, C.

2014-07-01

Highlights: • Growth of uniform-diameter vertically-aligned multi-walled CNTs by APCVD. • Use of high melting point low carbon solubility iridium nanoparticles as catalyst. • Optimization of growth time for uniform sized, uniformly aligned CNTs. • Growth model for the various features in the vertically aligned CNTs is proposed. - Abstract: Vertically aligned carbon nanotubes (VA-CNTs) have been synthesized using high temperature catalyst nanoparticles of iridium. The catalyst layer was prepared by DC sputtering. Particle density, circularity and average particle size of the catalyst were analyzed using field emission scanning electron microscopy. The alignment, morphology and the length of the as-grown CNTs were analyzed using field-emission scanning electron microscopy. High resolution transmission electron microscopy was carried out to observe the layers of graphitic stacking which form the carbon nanotubes. Micro Raman measurement was used for the analysis of the graphitic crystallinity of the as-grown carbon nano structures. Effects of growth time variation on growth morphology and alignment have been studied. The alignment has been explained on the basis of the crowding effect of the neighboring nanoparticles.
Automated whole-genome multiple alignment of rat, mouse, and human

Energy Technology Data Exchange (ETDEWEB)

Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

2004-07-04

We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human, and 97% of all alignments with human sequence > 100 kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level, <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment, and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.
Relative role of transfer zones in controlling sequence stacking patterns and facies distribution: insights from the Fushan Depression, South China Sea

Science.gov (United States)

Liu, Entao; Wang, Hua; Li, Yuan; Huang, Chuanyan

2015-04-01

In sedimentary basins, a transfer zone can be defined as a coordinated system of deformational features which has good prospects for hydrocarbon exploration. Although the term 'transfer zone' has been widely applied to the study of extensional basins, little attention has been paid to its controlling effect on sequence stacking patterns and depositional facies distribution. The Fushan Depression is a half-graben rift sub-basin located in the southeast of the Beibuwan Basin, South China Sea. In this study, comparative analysis of seismic reflection, palaeogeomorphology, fault activity and depositional facies distribution on the southern slope indicates that three different types of sequence stacking patterns (i.e. a multi-level step-fault belt in the western area, a flexure slope belt in the central area, and a gentle slope belt in the eastern area) were developed along the southern slope, together with a large-scale transfer zone in the central area, at the intersection of the western and eastern fault systems. Further analysis shows that the transfer zone played an important role in the diversity of sequence stacking patterns on the southern slope by dividing the Fushan Depression into two non-interfering tectonic systems that formed different sequence patterns, and by leading to the formation of the flexure slope belt in the central area. The transfer zone had an important controlling effect not only on the diversity of sequence stacking patterns, but also on the facies distribution on the relay ramp. During the high-stand stage, under the controlling effect of the transfer zone, sediments containing a significant proportion of coarser material accumulated and were distributed along the ramp axis. By contrast, during the low-stand stage, the transfer zone did not seem to contribute significantly to the low-stand fan distribution, which was mainly controlled by the slope gradient (palaeogeomorphology). Therefore, analysis of the transfer zone can provide a new perspective for basin analysis.
Dependence of intermetallic compound formation on the sublayer stacking sequence in Ag–Sn bilayer thin films

International Nuclear Information System (INIS)

Rossi, P.J.; Zotov, N.; Bischoff, E.; Mittemeijer, E.J.

2016-01-01

Intermetallic compound (IMC) formation in thermally evaporated Ag–Sn bilayer thin films has been investigated employing especially X-ray diffraction (XRD) and (S)TEM methods. The specific IMCs that are present in the as-deposited state depend sensitively on the stacking sequence of the sublayers. In the case of Sn on top of Ag, predominantly Ag₃Sn is formed, whereas Ag₄Sn is predominantly present in the as-deposited state for Ag on top of Sn. In the latter case this is accompanied by an extremely fast uptake of a large amount of Sn by the Ag sublayer, leaving behind macroscopic voids in the Sn sublayer. The results are discussed on the basis of the thermodynamics and kinetics of IMC product-layer growth in thin films. It is shown that both thermodynamic and kinetic arguments explain the contrasting phenomena observed.

Computation of the lamina stacking sequence effect on elastic moduli of a plain-weave Nicalon/SiC laminated composite with a [0/30/60] lay-up

International Nuclear Information System (INIS)

Zhao Wei; Yu Niann-i

1998-01-01

Estimation of the elastic modulus is important in engineering design. One difference between CFCCs (continuous fiber-reinforced ceramic-matrix composites) and CMCs (whisker, particulate, or short fiber-reinforced ceramic-matrix composites) is that the anisotropic behavior of CFCCs plays an important role in affecting their mechanical behavior. This feature may also contribute to the variation of elastic properties and strengths of CFCCs. In this paper, a Fortran program is developed to quantify the lamina stacking sequence effect on the effective elastic moduli of laminated CFCCs. The modeled material is a plain-weave Nicalon fiber-reinforced silicon carbide (Nicalon/SiC) CFCC. Results show that different stacking sequences within the CFCC (a [0/30/60] lay-up) give different effective elastic moduli. This leads to a variation in the slope of the linear portion of the flexural stress-strain curve, i.e., it changes the position of the starting point of the non-linear portion and the shape of the whole curve, which gives a different value of the peak stress in the curve. (orig.)
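The stacking-sequence effect can be illustrated with classical lamination theory. This is a sketch under stated assumptions, not the paper's Fortran program: each ply is treated as an orthotropic lamina with hypothetical properties (E1 = E2 = 200 GPa, ν12 = 0.15, G12 = 60 GPa, 0.25 mm plies, loosely mimicking a balanced plain weave), only the x-direction terms A11 and D11 are computed, and extension-bending coupling (nonzero for an unsymmetric lay-up such as [0/30/60]) is ignored. The in-plane stiffness A11 depends only on which ply angles are present, whereas the bending stiffness D11 weights each ply by its distance from the mid-plane, so reordering the same plies changes the flexural response.

```python
import math


def reduced_stiffness(E1, E2, nu12, G12):
    """Plane-stress reduced stiffnesses (Q11, Q22, Q12, Q66) of an orthotropic ply."""
    nu21 = nu12 * E2 / E1
    denom = 1.0 - nu12 * nu21
    return E1 / denom, E2 / denom, nu12 * E2 / denom, G12


def qbar11(Q, theta_deg):
    """Transformed stiffness Qbar11 of a ply rotated by theta degrees."""
    Q11, Q22, Q12, Q66 = Q
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    return Q11 * c**4 + 2 * (Q12 + 2 * Q66) * s**2 * c**2 + Q22 * s**4


def a11_d11(angles, ply_t, Q):
    """In-plane A11 and bending D11 for plies listed bottom to top."""
    h = ply_t * len(angles)
    z = [-h / 2 + k * ply_t for k in range(len(angles) + 1)]  # ply interfaces
    A11 = sum(qbar11(Q, th) * (z[k + 1] - z[k]) for k, th in enumerate(angles))
    D11 = sum(qbar11(Q, th) * (z[k + 1] ** 3 - z[k] ** 3) / 3
              for k, th in enumerate(angles))
    return A11, D11
```

With these inputs (GPa and mm), `a11_d11([0, 30, 60], 0.25, Q)` and `a11_d11([30, 0, 60], 0.25, Q)` return the same A11 but different D11: the stiffer 0° ply contributes more to bending when it sits on the outside.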
MUMmer4: A fast and versatile genome alignment system.

Directory of Open Access Journals (Sweden)

Guillaume Marçais

2018-01-01

The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems, including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.
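The maximal unique matches (MUMs) that give MUMmer its name can be found with a suffix array over the two sequences joined by separators. This is a deliberately naive sketch (quadratic-time suffix sorting, hypothetical `min_len` parameter), not MUMmer4's 48-bit suffix array code: a match is reported when two adjacent suffixes come from different sequences, share a prefix of at least `min_len` that no neighboring suffix shares (so it occurs exactly once in each sequence), and cannot be extended to the left.

```python
def common_prefix_len(text, i, j):
    """Length of the longest common prefix of text[i:] and text[j:]."""
    n, length = len(text), 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


def find_mums(a, b, min_len):
    """Maximal unique matches between a and b as (pos_in_a, pos_in_b, length)."""
    text = a + "\x01" + b + "\x00"  # separators occur once, so matches cannot cross them
    b_start = len(a) + 1
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])  # naive O(n^2 log n) suffix array
    # lcp[r] = common prefix length of the suffixes at ranks r - 1 and r.
    lcp = [0] + [common_prefix_len(text, sa[r - 1], sa[r]) for r in range(1, n)]
    mums = []
    for r in range(1, n):
        length = lcp[r]
        if length < min_len:
            continue
        pa, pb = sorted((sa[r - 1], sa[r]))
        if not (pa < len(a) and pb >= b_start):
            continue  # both suffixes come from the same sequence
        # Unique in both sequences: neither neighboring suffix shares the prefix.
        if lcp[r - 1] >= length or (r + 1 < n and lcp[r + 1] >= length):
            continue
        # Left-maximal: the match cannot be extended one base to the left.
        if pa > 0 and pb > b_start and text[pa - 1] == text[pb - 1]:
            continue
        mums.append((pa, pb - b_start, length))
    return sorted(mums)
```

For example, `find_mums("GATTACA", "TTGATTACAGG", 4)` reports the single match `(0, 2, 7)`; shorter shared substrings such as "ATTACA" are rejected because they extend to the left.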
The effects of stacking sequence and thermal cycling on the flexural properties of laminate composites of aluminium-epoxy/basalt-glass fibres

Science.gov (United States)

Abdollahi Azghan, Mehdi; Eslami-Farsani, Reza

2018-02-01

The current study investigated the effects of different stacking sequences and thermal cycling on the flexural properties of fibre metal laminates (FMLs). The FMLs were composed of two aluminium alloy 2024-T3 sheets and an epoxy polymer-matrix composite with four layers of basalt and/or glass fibres in five different stacking sequences. Each thermal cycle, from 25 °C to 115 °C, lasted about 6 min. The flexural properties of the samples were evaluated after 55 thermal cycles and compared with those of non-exposed samples. The aluminium surface was modified by electrochemical treatment (anodizing), and the treated surfaces were examined by scanning electron microscopy (SEM). The flexural failure mechanisms were also investigated by optical microscopy of the fractured surfaces. SEM images indicated that the porosity of the aluminium surface increased after anodizing. The flexural modulus was highest for the basalt-fibre-based FML and lowest for the glass-fibre-based FML, with the basalt/glass-fibre-based FML lying between them. Because the failure mechanism changes in basalt/glass-fibre-based FMLs that have glass fibres in the outer layer of the polymer composite, the flexural strength of this FML is lower than that of both the glass- and the basalt-fibre-based FMLs. After thermal cycling, the flexural properties of the basalt-fibre-based FMLs decreased less than those of the other composites, owing to the good thermal properties of basalt fibres.
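The abstract does not state the test geometry. Assuming a standard three-point bending test on a rectangular specimen (the usual homogeneous-beam formulas, as in ASTM D790), the apparent flexural strength and modulus, and the retention after thermal cycling, could be computed as in this Python sketch. The function names are hypothetical, and for a laminate these formulas give apparent values that ignore the through-thickness stiffness variation.

```python
def flexural_strength(load_n: float, span_mm: float, width_mm: float,
                      thickness_mm: float) -> float:
    """Three-point bending outer-fibre stress, sigma = 3PL / (2bd^2), in MPa."""
    return 3.0 * load_n * span_mm / (2.0 * width_mm * thickness_mm**2)


def flexural_modulus(span_mm: float, width_mm: float, thickness_mm: float,
                     slope_n_per_mm: float) -> float:
    """Three-point bending modulus E = L^3 m / (4bd^3), in MPa.

    m is the initial slope of the load-deflection curve.
    """
    return span_mm**3 * slope_n_per_mm / (4.0 * width_mm * thickness_mm**3)


def retention_percent(after_cycling: float, before_cycling: float) -> float:
    """Property retained after thermal cycling, as a percentage of the unexposed value."""
    return 100.0 * after_cycling / before_cycling
```

With loads in N and lengths in mm, both results come out in N/mm², that is, MPa.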
HIV Sequence Compendium 2010

Energy Technology Data Exchange (ETDEWEB)

Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2010-12-31

This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (groups O and N, and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.
HIV Sequence Compendium 2015

Energy Technology Data Exchange (ETDEWEB)

Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2015-10-05

This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.
libgapmis: extending short-read alignments.

Science.gov (United States)

Alachiotis, Nikolaos; Berger, Simon; Flouri, Tomáš; Pissis, Solon P; Stamatakis, Alexandros

2013-01-01

A wide variety of short-read alignment programmes have been published recently to tackle the problem of mapping millions of short reads to a reference genome, focusing on different aspects of the procedure such as time and memory efficiency, sensitivity, and accuracy. These tools allow for a small number of mismatches in the alignment; however, their ability to allow for gaps varies greatly, with many performing poorly or not allowing gaps at all. The seed-and-extend strategy is applied in most short-read alignment programmes. After a substring of the reference sequence has been aligned against the high-quality prefix of a short read (the seed), an important problem is to find the best possible alignment between the succeeding substring of the reference sequence and the remaining low-quality suffix of the read (the extension). Because the reads are rather short and the gap occurrence frequency observed in various studies is rather low, aligning (parts of) those reads with a single gap is in fact desirable. In this article, we present libgapmis, a library for extending pairwise short-read alignments. Apart from the standard CPU version, it includes ultrafast SSE- and GPU-based implementations. libgapmis is based on an algorithm computing a modified version of the traditional dynamic-programming matrix for sequence alignment. Extensive experimental results demonstrate that the functions of the CPU version provided in this library accelerate the computations by a factor of 20 compared to other programmes. The analogous SSE- and GPU-based implementations accelerate the computations by a factor of 6 and 11, respectively, compared to the CPU version. The library also gives the user the flexibility to split the read into fragments, based on the observed gap occurrence frequency and the length of the read, thereby allowing a variable, but bounded, number of gaps in the alignment.
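The abstract describes extending a seed alignment with at most a single gap but does not spell out libgapmis's modified dynamic-programming recurrence or its API. The Python sketch below is therefore a brute-force stand-in, not the library's algorithm: it tries every internal gap position and length up to a hypothetical `max_gap`, with hypothetical match, mismatch, and affine gap scores, and keeps the best-scoring extension.

```python
def _score(x: str, y: str, match: int, mismatch: int) -> int:
    """Ungapped score of two equal-length strings."""
    return sum(match if c1 == c2 else mismatch for c1, c2 in zip(x, y))


def extend_with_single_gap(read: str, ref: str, max_gap: int = 3, match: int = 1,
                           mismatch: int = -1, gap_open: int = -3,
                           gap_ext: int = -1):
    """Best extension of `read` along `ref` (both starting right after the seed)
    with at most one internal gap.

    Returns (score, gap_pos, gap_len, gap_side), where gap_side is "none",
    "read" (gap placed in the read, i.e. reference bases skipped) or "ref"
    (read bases skipped). Returns None if ref is too short for any candidate.
    """
    m = len(read)
    best = None
    if len(ref) >= m:
        best = (_score(read, ref[:m], match, mismatch), -1, 0, "none")
    for pos in range(1, m):
        head = _score(read[:pos], ref[:pos], match, mismatch)
        for g in range(1, max_gap + 1):
            penalty = gap_open + gap_ext * (g - 1)  # affine gap cost
            candidates = []
            if len(ref) >= m + g:  # skip g reference bases
                tail = _score(read[pos:], ref[pos + g:m + g], match, mismatch)
                candidates.append((head + penalty + tail, pos, g, "read"))
            if pos + g < m and len(ref) >= m - g:  # skip g read bases
                tail = _score(read[pos + g:], ref[pos:m - g], match, mismatch)
                candidates.append((head + penalty + tail, pos, g, "ref"))
            for cand in candidates:
                if best is None or cand[0] > best[0]:  # ties keep the earlier candidate
                    best = cand
    return best
```

With read `ACGTACGT` and reference window `ACGTTACGTGG`, the best extension skips one extra reference `T` and scores 5; ties are resolved in favour of the earliest gap position. Enumerating gaps explicitly costs roughly O(m² · max_gap) for a read suffix of length m, which is acceptable for short read suffixes.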
In silico site-directed mutagenesis informs species-specific predictions of chemical susceptibility derived from the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool

Science.gov (United States)

The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to address needs for rapid, cost effective methods of species extrapolation of chemical susceptibility. Specifically, the SeqAPASS tool compares the primary sequence (Level 1), functiona...
AB stacked few layer graphene growth by chemical vapor deposition on single crystal Rh(1 1 1) and electronic structure characterization

International Nuclear Information System (INIS)

Kordatos, Apostolis; Kelaidis, Nikolaos; Giamini, Sigiava Aminalragia; Marquez-Velasco, Jose; Xenogiannopoulou, Evangelia; Tsipas, Polychronis; Kordas, George; Dimoulas, Athanasios

2016-01-01

Highlights: • Growth of non-defective few-layer graphene on Rh(1 1 1) substrates using an ambient-pressure CVD method. • Control of graphene stacking order via the cool-down rate. • Graphene is grown with a mainly AB-stacking geometry on single-crystalline rhodium for a slow cool-down rate and non-AB for a very fast cool-down. • Good epitaxial orientation of the surface is shown by the RHEED data and confirmed by ARPES characterization for the lower cool-down rate, where graphene's ΓK direction is perfectly aligned with the ΓK direction of the Rh(1 1 1) single crystal. - Abstract: Graphene synthesis on single-crystal Rh(1 1 1) catalytic substrates is performed by chemical vapor deposition (CVD) at 1000 °C and atmospheric pressure. Raman analysis shows full substrate coverage with few-layer graphene. The cool-down rate is found to strongly affect the graphene stacking order. When it is lowered, the percentage of AB (Bernal)-stacked regions increases, leading to an almost fully AB stacking order. When it is increased, the percentage of AB-stacked graphene regions decreases to the point where almost entirely non-AB-stacked graphene is grown. For a slow cool-down rate, graphene with AB stacking order and good epitaxial orientation with the substrate is achieved. This is indicated mainly by Raman characterization and confirmed by reflection high-energy electron diffraction (RHEED) imaging. Additional scanning tunneling microscopy (STM) topography data confirm that the grown graphene is mainly AB-stacked. The electronic structure of the graphene/Rh(1 1 1) system is examined by angle-resolved photoemission spectroscopy (ARPES), where the σ and π bands of graphene are observed. Graphene's ΓK direction is aligned with the ΓK direction of the substrate, indicating no significant contribution from rotated domains.
AB stacked few layer graphene growth by chemical vapor deposition on single crystal Rh(1 1 1) and electronic structure characterization

Energy Technology Data Exchange (ETDEWEB)

Kordatos, Apostolis [National Center for Scientific Research “Demokritos”, Athens, 15310 (Greece); Kelaidis, Nikolaos, E-mail: n.kelaidis@inn.demokritos.gr [National Center for Scientific Research “Demokritos”, Athens, 15310 (Greece); Giamini, Sigiava Aminalragia [National Center for Scientific Research “Demokritos”, Athens, 15310 (Greece); University of Athens, Department of Physics, Section of Solid State Physics, Athens, 15684 Greece (Greece); Marquez-Velasco, Jose [National Center for Scientific Research “Demokritos”, Athens, 15310 (Greece); National Technical University of Athens, Department of Physics, Athens, 15784 Greece (Greece); Xenogiannopoulou, Evangelia; Tsipas, Polychronis; Kordas, George; Dimoulas, Athanasios [National Center for Scientific Research “Demokritos”, Athens, 15310 (Greece)

2016-04-30

Highlights: • Growth of non-defective few-layer graphene on Rh(1 1 1) substrates using an ambient-pressure CVD method. • Control of graphene stacking order via the cool-down rate. • Graphene is grown with a mainly AB-stacking geometry on single-crystalline rhodium for a slow cool-down rate and non-AB for a very fast cool-down. • Good epitaxial orientation of the surface is shown by the RHEED data and confirmed by ARPES characterization for the lower cool-down rate, where graphene's ΓK direction is perfectly aligned with the ΓK direction of the Rh(1 1 1) single crystal. - Abstract: Graphene synthesis on single-crystal Rh(1 1 1) catalytic substrates is performed by chemical vapor deposition (CVD) at 1000 °C and atmospheric pressure. Raman analysis shows full substrate coverage with few-layer graphene. The cool-down rate is found to strongly affect the graphene stacking order. When it is lowered, the percentage of AB (Bernal)-stacked regions increases, leading to an almost fully AB stacking order. When it is increased, the percentage of AB-stacked graphene regions decreases to the point where almost entirely non-AB-stacked graphene is grown. For a slow cool-down rate, graphene with AB stacking order and good epitaxial orientation with the substrate is achieved. This is indicated mainly by Raman characterization and confirmed by reflection high-energy electron diffraction (RHEED) imaging. Additional scanning tunneling microscopy (STM) topography data confirm that the grown graphene is mainly AB-stacked. The electronic structure of the graphene/Rh(1 1 1) system is examined by angle-resolved photoemission spectroscopy (ARPES), where the σ and π bands of graphene are observed. Graphene's ΓK direction is aligned with the ΓK direction of the substrate, indicating no significant contribution from rotated domains.
Predicting Consensus Structures for RNA Alignments Via Pseudo-Energy Minimization

Directory of Open Access Journals (Sweden)

Junilda Spirollari

2009-01-01

Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, single-sequence methods have only moderate accuracy, and more information, such as covariance scores obtained from multiple sequence alignments, is needed to improve RNA secondary structure prediction. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets, including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome, reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict.
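RSpredict reports the consensus structure both in Vienna dot-bracket notation and in Connectivity Table (CT) format. A minimal Python sketch of converting one into the other is shown below. It assumes only `(`, `)` and `.` appear (no pseudoknot brackets) and writes a simplified CT header line (`length title`); the function names are hypothetical, not RSpredict's code. Each CT row lists the base index, the base, the previous and next indices, the paired partner (0 if unpaired), and the index again.

```python
def dot_bracket_pairs(structure: str) -> dict[int, int]:
    """Map each paired position (0-based) to its partner; '.' means unpaired."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def to_ct(sequence: str, structure: str, title: str = "consensus") -> str:
    """Render a sequence and its dot-bracket structure as Connectivity Table text."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    pairs = dot_bracket_pairs(structure)
    n = len(sequence)
    rows = [f"{n} {title}"]  # simplified header line
    for i, base in enumerate(sequence, start=1):
        partner = pairs.get(i - 1)
        paired_with = partner + 1 if partner is not None else 0
        next_index = i + 1 if i < n else 0
        rows.append(f"{i} {base} {i - 1} {next_index} {paired_with} {i}")
    return "\n".join(rows)
```

The stack-based matching is the usual way to parse nested brackets; supporting pseudoknots would need one stack per bracket type.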
A new prosthetic alignment device to read and record prosthesis alignment data.

Science.gov (United States)

Pirouzi, Gholamhossein; Abu Osman, Noor Azuan; Ali, Sadeeq; Davoodi Makinejad, Majid

2017-12-01

Prosthetic alignment is an essential process in rehabilitating patients with amputations. This study presents, for the first time, a newly invented device to read and record prosthesis alignment data. The digital device consists of seven main parts: the trigger, internal shaft, shell, sensor adjustment button, digital display, sliding shell, and tip. The alignment data were read and recorded by the user or a computer, either to replicate a prosthesis adjustment for future use or to examine the sequence of alignment changes and their effect on the patient's posture. Alignment data were recorded at the anterior/posterior and medial/lateral positions for five patients. The results show that the device records alignment data and replicates adjustments with a high level of confidence. The device therefore helps patients readjust their prostheses by themselves, and helps prosthetists adjust prostheses for patients and analyze the effects of malalignment.
Charge transfer in pi-stacked systems including DNA

International Nuclear Information System (INIS)

Siebbeles, L.D.A.

2003-01-01

Charge migration in DNA is a subject of intense current study motivated by long-range detection of DNA damage and the potential application of DNA as a molecular wire in nanoscale electronic devices. A key structural element, which makes DNA a medium for long-range charge transfer, is the array of stacked base pairs in the interior of the double helix. The overlapping pi-orbitals of the nucleobases provide a pathway for motion of charge carriers generated on the stack. This 'pi-pathway' resembles the columnarly stacked macrocyclic cores in discotic materials such as triphenylenes. The structure of these pi-stacked systems is highly disordered, with dynamic fluctuations occurring on picosecond to nanosecond time scales. Theoretical calculations of the effects of structural disorder and nucleobase sequence on the dynamics of charge carriers in DNA are presented. Electronic couplings and localization energies of charge carriers were calculated using density functional theory (DFT). Results for columnarly stacked triphenylenes and DNA nucleobases are compared. The results are used to provide insight into the factors that control the mobility of charge carriers. Further, experimental results on the site-selective oxidation of guanine nucleobases in DNA (hot spots for DNA damage) are analyzed on the basis of the theoretical results.
Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

Science.gov (United States)

Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

2018-02-01

Sequence classification is crucial for predicting the function of newly discovered sequences. In recent years, classifying the rapidly growing and increasingly diverse body of sequences has relied heavily on machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting informative features. In this work, we propose a feature-enhanced protein classification approach that combines the rich output of multiple sequence alignment algorithms, an N-gram probabilistic language model, and deep learning. The idea behind the proposed method is that if each group of sequences can be represented by one feature sequence composed of homologous sites, then adding a more closely related sequence to the group should cause less loss when that feature sequence is rebuilt. On this basis, deciding whether a query sequence belongs to a group of sequences can be recast as calculating the probability that the new feature sequence evolved from the original one. The proposed work focuses on the hierarchical classification of G-protein coupled receptors (GPCRs). It begins by extracting feature sequences from the multiple sequence alignments of the GPCR sub-subfamilies; the N-gram model is then applied to construct the input vectors, and these vectors are finally fed into a convolutional neural network to make a prediction. The experimental results show that the proposed method provides significant performance improvements: compared with current state-of-the-art methods, its classification error rate is reduced by at least 4.67% (family level I) and 5.75% (family level II). The implementation of the proposed work is freely available at: https://github.com/alanFchina/CNN .
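The abstract says an N-gram model turns the extracted feature sequences into input vectors for the CNN but does not give the encoding. One plausible minimal version, assumed here purely for illustration, is a vector of normalized N-gram frequencies over the 20 standard amino acids; N-grams containing alignment gaps (`-`) or other symbols are simply dropped. The function name and defaults are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_vector(seq: str, n: int = 2, alphabet: str = AMINO_ACIDS) -> list[float]:
    """Normalized N-gram frequency vector over all len(alphabet)**n N-grams.

    N-grams containing symbols outside the alphabet (e.g. gaps) are ignored.
    """
    vocab = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts[g] for g in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[g] / total for g in vocab]
```

For N = 2 the vector has 20² = 400 entries; for example, `ngram_vector("ACAC")` gives AC a frequency of 2/3 and CA a frequency of 1/3.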
Ab initio study on stacking sequences, free energy, dynamical stability and potential energy surfaces of graphite structures

International Nuclear Information System (INIS)

Anees, P; Valsakumar, M C; Chandra, Sharat; Panigrahi, B K

2014-01-01

Ab initio simulations have been performed to study the structure, energetics and stability of several plausible stacking sequences in graphite. These calculations suggest that in addition to the standard structures, graphite can also exist in AA-simple hexagonal, AB-orthorhombic and ABC-hexagonal type stacking. The free energy difference between these structures is very small (∼1 meV/atom), and hence all the structures can coexist from purely energetic considerations. Calculated x-ray diffraction patterns are similar to those of the standard structures for 2θ ⩽ 70°. Shear elastic constant C44 is negative in AA-simple hexagonal, AB-orthorhombic and ABC-hexagonal structures, suggesting that these structures are mechanically unstable. Phonon dispersions show that the frequencies of some modes along the Γ–A direction in the Brillouin zone are imaginary in all of the new structures, implying that these structures are dynamically unstable. Incorporation of zero point vibrational energy via the quasi-harmonic approximation does not result in the restoration of dynamical stability. Potential energy surfaces for the unstable normal modes are seen to have the topography of a potential hill for all the new structures, confirming that all of the new structures are inherently unstable. The fact that the potential energy surface is not in the form of a double well implies that the structures are linearly as well as globally unstable. (paper)
Multiple Whole Genome Alignments Without a Reference Organism

Energy Technology Data Exchange (ETDEWEB)

Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

2009-01-16

Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes either rely on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, which prevents the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

Directory of Open Access Journals (Sweden)

Kierzynka Michal

2011-05-01

Background: Pairwise sequence alignment methods are widely used in biological research. The growing number of sequences is seen as one of the upcoming challenges for sequence alignment methods in the near future. To meet this challenge, several GPU (Graphics Processing Unit) computing approaches have recently been proposed. These solutions show the great potential of the GPU platform, but most of them address sequence database scanning and compute only the alignment score, omitting the alignment itself. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and the Smith-Waterman algorithms with the backtracking procedure needed to construct the alignment. Results: In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Tests show that the implementation, with performance of up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient compared with other CPU- and GPU-based solutions. Moreover, support for multiple GPUs with load balancing makes the application highly scalable. Conclusions: The article shows that the backtracking procedure of sequence alignment algorithms can be designed to fit the GPU architecture. Therefore, our algorithm is able to compute pairwise alignments, not only scores. This opens a wide range of new possibilities, allowing other methods in molecular biology to take advantage of the new computational architecture. Tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms increases almost linearly when more than one graphics card is used.
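To make the distinction between score-only computation and full alignment concrete, here is a plain CPU sketch in Python of global Needleman-Wunsch alignment with a backtracking step. It is not the authors' GPU implementation: it uses a linear gap penalty rather than the affine penalties they benchmark, and the scoring values are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the full (n+1) x (m+1) score matrix; backtracking needs all of it.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Backtrack from the bottom-right corner, re-deriving each step's origin.
    i, j = n, m
    row_a, row_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```

Aligning `GATTACA` with `GATACA` scores 4 (six matches and one gap). The matrix has n × m cells, the quantity counted in GCUPS (billions of cell updates per second). Computing only the score needs just the previous row, whereas this backtracking pass needs the whole matrix (or stored traceback directions), which makes the full alignment more demanding on memory.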
Structure and stacking faults in layered Mg-Zn-Y alloys: A first-principles study

International Nuclear Information System (INIS)

Datta, Aditi; Waghmare, U.V.; Ramamurty, U.

2008-01-01

We use first-principles density functional theory total-energy calculations based on pseudopotentials and a plane-wave basis to assess the stability of periodic structures with different stacking sequences in Mg-Zn-Y alloys. For pure Mg, we find that the 6-layer (6l) structure with ABACAB stacking is the most stable after the lowest-energy hcp (2l) structure with ABAB stacking. Adding 2 at.% Y stabilizes the 6l stacking sequence, whereas adding 2 at.% Zn makes the energy of the 6l structure comparable to that of the hcp structure. The stacking fault (SF) energy on the basal plane of the 6l structure is higher than that of hcp 2l Mg; it increases further upon Y doping and decreases significantly with Zn doping. The SF energy surface for prismatic slip indicates activation of non-basal slip in alloys with a 6l structure. Charge density analysis shows that the 2l and 6l structures are electronically similar, which might explain the better stability of the 6l structure over a 4l sequence or other periodic structures. Thus, in an Mg-Zn-Y alloy, Y stabilizes the long periodicity, while Zn doping further improves the mechanical properties.
Through-Silicon-Via Underfill Dispensing for 3D Die/Interposer Stacking

Science.gov (United States)

Le, Fuliang

Next-generation packaging keeps up with increasing demands for functionality by using the third dimension. 3D chip stacking with through-silicon vias (TSVs) has been identified as one of the major technologies for achieving higher silicon density and shorter interconnections. To protect solder interconnections from hostile environments and to redistribute the thermal stress caused by coefficient of thermal expansion (CTE) mismatch, underfill should be applied to the spaces under the chips. In this study, TSV underfill dispensing is introduced to address the underfill challenge for 3D chip stacks. The material properties are first measured; the general trend is that viscosity and contact angle drop significantly as temperature increases, while surface tension falls slightly. Underfill should ensure complete encapsulation while avoiding excessive filling times that can cause substantial manufacturing delays. Typically, the inflow for TSV underfill can be either free droplets or a constant flow rate. For a constant inflow, the underfill flow is driven by a pressure difference, and the filling time is governed by the flow radius, the gap clearance, and the constant flow rate. For an inflow of free droplets, the underfill flow is driven by capillary action, and the filling time depends on viscosity, flow radius, gap clearance, surface tension, contact angle, and TSV size. In general, TSV underfill dispensing with a constant inflow has a much shorter filling time than dispensing with free droplets. TSV underfill dispensing on a 3D chip stack may create a risk of edge-flood failure. To avoid an edge flood, the fluid pressure around the sidewalls of a 3D chip stack must not exceed the limit equilibrium pressure. For TSV dispensing with free droplets, there is no risk of forming an edge flood; for a constant inflow, however, dispensing should be carefully controlled to avoid excessive pressure. In addition, it is suggested that the TSVs in stacked chips be aligned in the vertical
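For the constant-inflow case, the abstract states that filling time is governed by flow radius, gap clearance, and flow rate. Under the simplifying assumption that the underfill spreads radially into a disc-shaped gap of radius R and height h, with the volume taken up by solder bumps and the TSVs themselves neglected, the filling time is just volume over flow rate, t = πR²h/Q. A Python sketch with a hypothetical function name and SI units:

```python
import math


def constant_inflow_fill_time(flow_radius_m: float, gap_m: float,
                              flow_rate_m3_s: float) -> float:
    """Filling time t = pi * R**2 * h / Q for a disc-shaped gap at constant inflow Q.

    Assumes an unobstructed disc-shaped gap (bump and TSV volume neglected).
    """
    if flow_radius_m <= 0 or gap_m <= 0 or flow_rate_m3_s <= 0:
        raise ValueError("radius, gap and flow rate must be positive")
    return math.pi * flow_radius_m**2 * gap_m / flow_rate_m3_s
```

For R = 5 mm, h = 50 µm and Q = 1 µL/s (1e-9 m³/s), this gives about 3.9 s. The free-droplet, capillary-driven case depends on viscosity, surface tension, and contact angle and is not modeled here.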
Using ESTs for phylogenomics: Can one accurately infer a phylogenetic tree from a gappy alignment?

Directory of Open Access Journals (Sweden)

Hartmann Stefanie

2008-03-01

Background: While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. Results: We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. Conclusion: These results demonstrate that partial gene
Using ESTs for phylogenomics: can one accurately infer a phylogenetic tree from a gappy alignment?

Science.gov (United States)

Hartmann, Stefanie; Vision, Todd J

2008-03-26

While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. These results demonstrate that partial gene sequences and gappy multiple sequence alignments can pose a
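The abstract describes alignment masking as excluding potentially problematic columns and input sequences but does not give the criteria. A minimal Python sketch under assumed, hypothetical thresholds: drop columns whose gap fraction exceeds `max_col_gap`, then drop sequences whose remaining positions are mostly gaps. It also illustrates the trade-off the abstract reports, since every sequence removed is lost from the phylogenetic analysis.

```python
def mask_alignment(alignment: dict[str, str], max_col_gap: float = 0.5,
                   min_seq_coverage: float = 0.5) -> dict[str, str]:
    """Drop gappy columns, then drop sequences left with too little data."""
    if not alignment:
        return {}
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    n = len(alignment)
    # Keep a column only if its fraction of gap characters is small enough.
    keep_cols = [c for c in range(length)
                 if sum(s[c] == "-" for s in alignment.values()) / n <= max_col_gap]
    masked = {name: "".join(s[c] for c in keep_cols) for name, s in alignment.items()}
    # Keep a sequence only if enough of its remaining positions are real residues.
    kept = {}
    for name, s in masked.items():
        if s and 1 - s.count("-") / len(s) >= min_seq_coverage:
            kept[name] = s
    return kept
```

For example, with `{"a": "ACGT--", "b": "ACGTAC", "c": "--GTAC", "d": "----A-"}` and `max_col_gap=0.25`, three columns survive and sequence `d` is dropped because only one of its three remaining positions holds a residue.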

Probing Temperature Inside Planar SOFC Short Stack, Modules, and Stack Series

Science.gov (United States)

Yu, Rong; Guan, Wanbing; Zhou, Xiao-Dong

2017-02-01

Probing temperature inside a solid oxide fuel cell (SOFC) stack lies at the heart of the development of high-performance and stable SOFC systems. In this article, we report our recent work on direct measurement of the temperature in three types of SOFC systems: a 5-cell short stack, a 30-cell stack module, and a stack series consisting of two 30-cell stack modules. The dependence of temperature on the gas flow rate and current density was studied under a current sweep or steady-state operation. During the current sweep, the temperature inside the 5-cell stack decreased with increasing current, while it increased significantly at the bottom and top of the 30-cell stack. During steady-state operation, the temperature of the 5-cell stack was stable, whereas it increased in the 30-cell stack. In the stack series, the maximum temperature gradient reached 190°C when the gas was not preheated. With gas preheating and segmented temperature control, the temperature gradient in the stack series was reduced to 23°C, which resulted in a low degradation rate.
The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search

DEFF Research Database (Denmark)

Havgaard, Jakob Hull; Lyngsø, Rune B.; Gorodkin, Jan

2005-01-01

FOLDALIGN is a Sankoff-based algorithm for making structural alignments of RNA sequences. Here, we present a web server for making pairwise alignments between two RNA sequences, using the recently updated version of FOLDALIGN. The server can be used to scan two sequences for a common structural RNA...... motif of limited size, or the entire sequences can be aligned locally or globally. The web server offers a graphical interface, which makes it simple to make alignments and manually browse the results. The web server can be accessed at http://foldalign.kvl.dk...
Macroscopic alignment of graphene stacks by Langmuir-Blodgett deposition of amphiphilic hexabenzocoronenes

DEFF Research Database (Denmark)

Laursen, B.W.; Nørgaard, K.; Reitzel, N.

2004-01-01

... Grazing-incidence X-ray diffraction (GIXD) and X-ray reflectivity, both utilizing synchrotron radiation, show that these amphiphilic HBCs form well-defined Langmuir monolayers at the air-water interface, with pi-stacked columnar structure where the HBC cores are rotated around the surface normal...... and tilted relative to the water surface. The intercolumnar distance is 20 Å. The HBCs are confined to a layer lying on top of the layer of polar groups that are in contact with the water subphase. Efficient transfer of the monolayer of the anthraquinone-substituted HBC derivative to hydrophobic quartz...
elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

Directory of Open Access Journals (Sweden)

Charlotte Herzeel

elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture, which allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up execution. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1 hour 40 minutes, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23 GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can save several hundred hours of computing time, and thus substantially reduce analysis time and cost.
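The single-pass architecture can be illustrated in miniature by composing per-record steps, so that each record is read once and passes through every step before the next record is read. This is a toy sketch, not elPrep's code or API: the record fields (`name`, `mapped`, `mapq`, `contig`) and the step functions are hypothetical, and steps that need global state (sorting, duplicate marking) require more machinery than shown.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence

Record = dict
Step = Callable[[Record], Optional[Record]]


def run_single_pass(records: Iterable[Record], steps: Sequence[Step]) -> Iterator[Record]:
    """Apply every step to each record during one traversal of the input.

    A step returns the (possibly modified) record, or None to drop it.
    """
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:
                break  # record filtered out; skip the remaining steps
        else:
            yield rec  # survived every step


# Hypothetical steps operating on a simplified record format.
def drop_unmapped(rec: Record) -> Optional[Record]:
    return rec if rec["mapped"] else None


def min_mapq(threshold: int) -> Step:
    def step(rec: Record) -> Optional[Record]:
        return rec if rec["mapq"] >= threshold else None
    return step


def rename_contig(mapping: dict) -> Step:
    def step(rec: Record) -> Record:
        return {**rec, "contig": mapping.get(rec["contig"], rec["contig"])}
    return step
```

Because `run_single_pass` is a generator, the input can itself be a stream (for example, records parsed lazily from a file), and adding a step adds work per record but no extra pass over the data.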
Infernal 1.0: inference of RNA alignments

OpenAIRE

Nawrocki, Eric P.; Kolbe, Diana L.; Eddy, Sean R.

2009-01-01

Summary: infernal builds consensus RNA secondary structure profiles called covariance models (CMs), and uses them to search nucleic acid sequence databases for homologous RNAs, or to create new sequence- and structure-based multiple sequence alignments.
Improving your target-template alignment with MODalign.

KAUST Repository

Barbato, Alessandro; Benkert, Pascal; Schwede, Torsten; Tramontano, Anna; Kosinski, Jan

2012-01-01

, upon modification of the target-template alignment, updates the multiple sequence alignments of the two protein families, their conservation score, secondary structure and solvent accessibility values, and local quality scores of the implied three
AIC-based diffraction stacking for local earthquake locations at the Sumatran Fault (Indonesia)

Science.gov (United States)

Hendriyana, Andri; Bauer, Klaus; Muksin, Umar; Weber, Michael

2018-05-01

We present a new workflow for the localization of seismic events based on a diffraction stacking approach. To address the effects of complex source radiation patterns, we propose computing the diffraction stack from a characteristic function (CF) instead of stacking the original waveform data. We propose a new CF, hereafter called mAIC (modified from the Akaike Information Criterion), and demonstrate that both P- and S-wave onsets can be detected accurately. To avoid cross-talk between P and S waves due to inaccurate velocity models, we separate the P and S waves from the mAIC function by making use of polarization attributes. The final image function is then represented by the largest eigenvalue resulting from the covariance analysis between the P- and S-image functions. Results from synthetic experiments show that the proposed diffraction stacking provides reliable results. The workflow was finally applied to local earthquake data from Sumatra, Indonesia. Recordings from a temporary network of 42 stations deployed for nine months around the Tarutung pull-apart basin were analysed. The seismic event locations resulting from the diffraction stacking method align along a segment of the Sumatran Fault. A more complex distribution of seismicity is imaged within and around the Tarutung basin. Two lineaments striking N-S were found in the centre of the Tarutung basin, which support independent results from structural geology.
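The mAIC function is described as a modification of the Akaike Information Criterion. For orientation, the sketch below implements the classic two-segment AIC onset function that such characteristic functions start from, not the authors' mAIC: the trace is split at each sample k, and AIC(k) = k·log(var(x[:k])) + (n − k − 1)·log(var(x[k:])) is minimal where the split best separates two stationary segments (noise, then signal). Function names are hypothetical.

```python
import math


def _variance(seg):
    """Population variance of a non-empty list of numbers."""
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg) / len(seg)


def aic_curve(x):
    """Classic two-segment AIC function of a trace x (list of floats).

    AIC(k) = k*log(var(x[:k])) + (n-k-1)*log(var(x[k:])); its minimum marks
    the most likely switch from one stationary segment (noise) to another (signal).
    """
    n = len(x)
    curve = [math.inf] * n
    for k in range(2, n - 1):  # keep at least two samples on each side
        v1, v2 = _variance(x[:k]), _variance(x[k:])
        if v1 > 0 and v2 > 0:  # log is undefined for zero variance
            curve[k] = k * math.log(v1) + (n - k - 1) * math.log(v2)
    return curve


def pick_onset(x):
    """Index of the first sample after the estimated onset (AIC minimum)."""
    curve = aic_curve(x)
    return min(range(len(curve)), key=curve.__getitem__)
```

For a synthetic trace that alternates ±0.1 for 200 samples and then ±1 for another 200, the AIC minimum falls at sample 200. In a diffraction-stacking workflow, a function of this kind would be evaluated per station and its values summed at the predicted arrival times for each trial source location.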
Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

Directory of Open Access Journals (Sweden)

Xin Yi Ng

2015-01-01

This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs), which are important to the immune system. Researchers have recently become interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process, as experiments to extract AMPs from protein sequences are costly and require a long set-up time. A computational tool for AMP prediction is therefore needed to address this problem. In this study, a new integrated algorithm is introduced that predicts AMPs by combining sequence alignment with a support vector machine (SVM)-LZ complexity pairwise algorithm. When all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in the jackknife test and 87.59% in the independent test; when only sequences with less than 70% similarity are used, the sensitivity is 88.74% and 78.70%, respectively. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity.
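The "LZ complexity" in the SVM-LZ pairwise algorithm refers to Lempel-Ziv complexity. A minimal sketch, assuming the LZ76 exhaustive-history parsing and one commonly used normalized pairwise distance built from it; the exact feature construction and kernel used in the study are not given here, and the function names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of s."""
    i, c, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the current phrase while it can still be copied from earlier
        # text (the search window stops one symbol before the phrase ends).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def lz_distance(a: str, b: str) -> float:
    """Normalized LZ-based distance in [0, 1] (one common variant).

    d(a, b) = max(c(ab) - c(a), c(ba) - c(b)) / max(c(ab), c(ba)).
    Both sequences must be non-empty.
    """
    cab = lz76_complexity(a + b)
    cba = lz76_complexity(b + a)
    num = max(cab - lz76_complexity(a), cba - lz76_complexity(b))
    return num / max(cab, cba)
```

The textbook string `0001101001000101` parses as 0·001·10·100·1000·101, giving complexity 6. Intuitively, if `b` is similar to `a`, appending it to `a` adds few new phrases, so the distance is small; pairwise values like these can then serve as SVM features.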
Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

Science.gov (United States)

Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

2008-09-01

A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities ≥ 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. © 2008 Wiley-Liss, Inc.
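The 0.87 figure is an area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC directly from the scores of true pairs (same Pfam family) and false pairs, using the Mann-Whitney identity: AUC is the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. The scores are hypothetical, and this O(P·N) form is chosen for clarity; a sort-based version scales better.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney identity).

    pos_scores: scores of true pairs; neg_scores: scores of false pairs.
    Ties between a positive and a negative count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For positives `[0.9, 0.3]` and negatives `[0.5, 0.1]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75. A score that separated the two classes perfectly would give 1.0, and random scoring about 0.5.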
Finding the most significant common sequence and structure motifs in a set of RNA sequences

DEFF Research Database (Denmark)

Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

1997-01-01

We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified...
Analysis and Visualization of ChIP-Seq and RNA-Seq Sequence Alignments Using ngs.plot.

Science.gov (United States)

Loh, Yong-Hwee Eddie; Shen, Li

2016-01-01

The continual maturation and increasing applications of next-generation sequencing technology in scientific research have yielded ever-increasing amounts of data that need to be effectively and efficiently analyzed and innovatively mined for new biological insights. We have developed ngs.plot, a quick and easy-to-use bioinformatics tool that visualizes the spatial relationships between sequencing alignment enrichment and specific genomic features or regions. More importantly, ngs.plot is customizable beyond the use of standard genomic feature databases, allowing the analysis and visualization of user-specified regions of interest generated by the user's own hypotheses. In this protocol, we demonstrate and explain the use of ngs.plot using command line executions, as well as a web-based workflow on the Galaxy framework. We replicate the underlying commands used in the analysis of a true biological dataset that we reported and published earlier and demonstrate how ngs.plot can easily generate publication-ready figures. With ngs.plot, users can efficiently and innovatively mine their own datasets without having to be involved in the technical aspects of sequence coverage calculations and genomic databases.
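The core computation behind an enrichment-around-features plot is an average coverage profile: for each feature, take the read depth in a fixed window around it, then average the windows position by position. This is a simplified sketch and not ngs.plot's implementation or options. It assumes a single chromosome, per-base coverage already computed, and 0-based feature centers. It also ignores strand and normalization, and it skips features whose window runs off the chromosome.

```python
def average_profile(coverage, centers, flank):
    """Mean coverage at each offset in [-flank, +flank] around feature centers.

    coverage: per-base read depth for one chromosome (list of numbers).
    centers:  0-based feature positions (e.g. TSSs) on that chromosome.
    Features whose window falls off either end are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for c in centers:
        start, end = c - flank, c + flank
        if start < 0 or end >= len(coverage):
            continue  # window incomplete; skip this feature
        for k in range(width):
            totals[k] += coverage[start + k]
        used += 1
    if used == 0:
        return [0.0] * width
    return [t / used for t in totals]
```

A real tool would also flip windows for minus-strand features, so that upstream is always on the left, and normalize by library size before averaging across samples.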
Efficient alignment-free DNA barcode analytics.

Science.gov (United States)

Kuksa, Pavel; Pavlovic, Vladimir

2009-11-10

In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations that model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) of barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens the possibility of accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. The new alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species, with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only a few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding.
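A spectrum representation maps each sequence to the counts of its length-k substrings (k-mers), so sequences of any length can be compared without alignment. This is a minimal sketch, assuming k = 3, cosine similarity between spectra, and nearest-reference classification. The authors' actual similarity measures, choice of k and feature selection may differ, and the function names and reference data are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum(seq: str, k: int = 3) -> Counter:
    """Counts of every length-k substring (the k-mer spectrum) of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(query: str, references: dict, k: int = 3) -> str:
    """Label of the reference barcode whose spectrum is most similar to the query's."""
    q = spectrum(query, k)
    return max(references,
               key=lambda name: cosine_similarity(q, spectrum(references[name], k)))
```

Because the spectrum has a fixed set of possible k-mers (4^k for DNA), every barcode becomes a vector of the same dimension. That fixed-length property is what lets standard classifiers and clustering methods be applied directly.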
Axially alignable nuclear fuel pellets

International Nuclear Information System (INIS)

Johansson, E.B.; Klahn, D.H.; Marlowe, M.O.

1978-01-01

An axially alignable nuclear fuel pellet of the type stacked in end-to-end relationship within a tubular cladding is described. Fuel cladding failures can occur at pellet interface locations due to mechanical interaction between misaligned fuel pellets and the cladding. Mechanical interaction between the cladding and the fuel pellets loads the cladding and causes increased cladding stresses. Nuclear fuel pellets are provided with an end structure that increases plastic deformation of the pellets at the interface between pellets so that lower alignment forces are required to straighten axially misaligned pellets. Plastic deformation of the pellet ends results in fewer interactions between the cladding and the fuel pellets and significantly lowers cladding stresses. The geometry of pellets constructed according to the invention also reduces the alignment forces required to straighten fuel pellets that are tilted within the cladding. Plastic deformation of the pellets at the pellet interfaces is increased by providing pellets with at least one end face having a centrally disposed raised area of convex shape, so that the mean temperature and shear stress of the contact area are higher than those of prior art pellets.
SVM-dependent pairwise HMM: an application to protein pairwise alignments.

Science.gov (United States)

Orlando, Gabriele; Raimondi, Daniele; Khan, Taushif; Lenaerts, Tom; Vranken, Wim F

2017-12-15

Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years, many different algorithms have been developed, and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent work suggests that different features may have different importance in diverse protein classes, and it would be an advantage to have more customizable approaches capable of dealing with different alignment definitions. Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets the user decide the optimal features to align their protein class of interest. It outperforms current state-of-the-art methods on two well-known benchmark datasets when aligning highly divergent sequences. A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. Contact: wim.vranken@vub.be. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
Band Alignment Determination of Two-Dimensional Heterojunctions and Their Electronic Applications

KAUST Repository

Chiu, Ming-Hui

2018-05-09

Two-dimensional (2D) layered materials such as MoS2 have been recognized as high on-off ratio semiconductors, which are promising candidates for electronic and optoelectronic devices. In addition to the use of individual 2D materials, the rapidly advancing field of 2D heterostructures enables even greater functionality. Device designs differ, and they are strongly governed by the electronic band alignment. For example, photovoltaic cells require type II heterostructures for light harvesting, and light-emitting diodes benefit from multiple quantum wells with type I band alignment for high emission efficiency. The vertical tunneling field-effect transistor for next-generation electronics depends on a nearly broken-gap band alignment to boost its performance. To tailor these 2D layered materials toward possible future applications, understanding 2D heterostructure band alignment becomes critically important. In the first part of this thesis, we discuss the band alignment of 2D heterostructures. To do so, we first study the interlayer coupling between two dissimilar 2D materials. We conclude that a post-anneal process can enhance the interlayer coupling of as-transferred 2D heterostructures, and that heterostructural stacking imposes symmetry changes similar to those of homostructural stacking. Next, we precisely determine the quasiparticle bandgap and band alignment of the MoS2/WSe2 heterostructure using scanning tunneling microscopy/spectroscopy (STM/S) and micron-beam X-ray photoelectron spectroscopy (μ-XPS). Lastly, we prove that the band alignment of 2D heterojunctions can be accurately predicted by Anderson's model, which has previously failed to predict conventional bulk heterostructures. In the second part of this thesis, we develop a new chemical vapor deposition (CVD) method capable of precisely controlling the growth area of p- and n-type transition metal dichalcogenides (TMDCs) to further form lateral or vertical 2D heterostructures. This
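Anderson's model (the electron-affinity rule) aligns the vacuum levels of the two materials. Each material's conduction-band edge then sits at −χ and its valence-band edge at −(χ + Eg), where χ is the electron affinity and Eg the bandgap. The sketch below applies this rule to classify a junction as type I (straddling), type II (staggered) or type III (broken gap). The χ and Eg inputs are hypothetical illustration values, not measured MoS2/WSe2 parameters, and real 2D junctions may need corrections such as interlayer dipoles that this rule ignores.

```python
def band_edges(chi, eg):
    """Conduction- and valence-band edges (eV) relative to the vacuum level.

    chi: electron affinity (eV); eg: bandgap (eV). Anderson's rule aligns
    vacuum levels, so the edges sit at -chi and -(chi + eg).
    """
    return -chi, -(chi + eg)


def alignment_type(chi_a, eg_a, chi_b, eg_b):
    """Classify a heterojunction as 'I', 'II' or 'III' under Anderson's rule."""
    ec_a, ev_a = band_edges(chi_a, eg_a)
    ec_b, ev_b = band_edges(chi_b, eg_b)
    # Type III (broken gap): one valence-band edge lies above the other's
    # conduction-band edge, so the gaps do not overlap at all.
    if ev_a > ec_b or ev_b > ec_a:
        return "III"
    # Type I (straddling): one gap lies entirely inside the other.
    if (ec_a <= ec_b and ev_a >= ev_b) or (ec_b <= ec_a and ev_b >= ev_a):
        return "I"
    return "II"  # staggered: gaps overlap only partially
```

Under this rule the conduction-band offset is simply the difference in electron affinities, |χa − χb|, and the valence-band offset follows from the bandgap difference. That is why the model needs only two parameters per material.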
SWAMP+: multiple subsequence alignment using associative massive parallelism

Energy Technology Data Exchange (ETDEWEB)

Steinfadt, Shannon Irene [Los Alamos National Laboratory]; Baker, Johnnie W [Kent State Univ.]

2010-10-18

A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
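SWAMP+ parallelizes Smith-Waterman local alignment. For reference, the sketch below is a plain serial Smith-Waterman with a linear gap penalty and traceback. The scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions, and neither the ASC associative model nor the multiple non-overlapping subsequence search is reproduced. Cells on the same anti-diagonal of `H` do not depend on each other, which is the usual starting point for parallel versions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score and aligned substrings (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = H[i][j]
        if s == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif s == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTACGTAA", "GGACGTCC")` isolates the shared core `ACGT` with score 8 and ignores the dissimilar flanks. This is the behaviour that distinguishes local from global alignment.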
Data for amino acid alignment of Japanese stingray melanocortin receptors with other gnathostome melanocortin receptor sequences, and the ligand selectivity of Japanese stingray melanocortin receptors

Directory of Open Access Journals (Sweden)

Akiyoshi Takahashi

2016-06-01

This article contains structural and pharmacological characteristics of melanocortin receptors (MCRs) related to research published in “Characterization of melanocortin receptors from stingray Dasyatis akajei, a cartilaginous fish” (Takahashi et al., 2016) [1]. The amino acid sequences of the stingray, D. akajei, MC1R, MC2R, MC3R, MC4R, and MC5R were aligned with the corresponding melanocortin receptor sequences from the elephant shark, Callorhinchus milii, the dogfish, Squalus acanthias, the goldfish, Carassius auratus, and the mouse, Mus musculus. These alignments provide the basis for phylogenetic analysis of these gnathostome melanocortin receptor sequences. In addition, the Japanese stingray melanocortin receptors were separately expressed in Chinese Hamster Ovary cells, and stimulated with stingray ACTH, α-MSH, β-MSH, γ-MSH, δ-MSH, and β-endorphin. The dose response curves reveal the order of ligand selectivity for each stingray MCR.
Alignment and prediction of cis-regulatory modules based on a probabilistic model of evolution.

Directory of Open Access Journals (Sweden)

Xin He

2009-03-01

Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm, and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.
Confocal Microscope Alignment of Nanocrystals for Coherent Diffraction Imaging

International Nuclear Information System (INIS)

Beitra, Loren; Watari, Moyu; Matsuura, Takashi; Shimamoto, Naonobu; Harder, Ross; Robinson, Ian

2010-01-01

We have installed and tested an Olympus LEXT confocal microscope at the 34-ID-C beamline of the Advanced Photon Source (APS). The beamline is for Coherent X-ray Diffraction (CXD) experiments in which a nanometre-sized crystal is aligned inside a focussed X-ray beam. The microscope was required for three-dimensional (3D) sample alignment to get around sphere-of-confusion issues when locating Bragg peaks in reciprocal space. In this way, and by use of strategic sample preparations, we have succeeded in measuring six Bragg peaks from a single 200 nm gold crystal and obtained six projections of its internal displacement field. This enables the clear identification of stacking-fault bands within the crystal. The confocal alignment method will allow a full determination of the strain tensor provided three or more Bragg reflections from the same crystal are found.
BFAST: an alignment tool for large scale genome resequencing.

Directory of Open Access Journals (Sweden)

Nils Homer

2009-11-01

The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. We compare BFAST to a selection of large-scale alignment tools (BLAT, MAQ, SHRiMP, and SOAP) in terms of both speed and accuracy, using simulated and real-world datasets. We show that BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show that BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
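The index-then-align strategy can be illustrated with a single exact-match k-mer index. Every k-mer of the reference points to the positions where it starts. Each seed in a read then votes for an implied alignment start, and the candidate locations with enough votes would go on to a Smith-Waterman step. This is a toy sketch with hypothetical names and parameters (`k`, `min_hits`); BFAST's actual index layout, and its use of multiple independent indexes, is not reproduced here.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_locations(read: str, index: dict, k: int, min_hits: int = 1) -> list:
    """Reference start positions implied by exact k-mer seed hits in the read.

    A seed at read offset `off` that matches reference position `pos` votes for
    alignment start `pos - off`; locations with >= min_hits votes are kept.
    """
    votes = defaultdict(int)
    for off in range(len(read) - k + 1):
        for pos in index.get(read[off:off + k], ()):
            votes[pos - off] += 1
    return sorted(loc for loc, n in votes.items() if n >= min_hits)
```

Because a single sequencing error only destroys the seeds that overlap it, a read such as `CGTAAGC` (one mismatch against `CGTTAGC`) still receives votes from its error-free seeds at the correct start position. This illustrates why seed-based indexing tolerates errors and variants.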

BBMap: A Fast, Accurate, Splice-Aware Aligner

Energy Technology Data Exchange (ETDEWEB)

Bushnell, Brian

2014-03-17

Alignment of reads is one of the primary computational tasks in bioinformatics. Of paramount importance to resequencing, alignment is also crucial to other areas - quality control, scaffolding, string-graph assembly, homology detection, assembly evaluation, error-correction, expression quantification, and even as a tool to evaluate other tools. An optimal aligner would greatly improve virtually any sequencing process, but optimal alignment is prohibitively expensive for gigabases of data. Here, we will present BBMap [1], a fast splice-aware aligner for short and long reads. We will demonstrate that BBMap has superior speed, sensitivity, and specificity to alternative high-throughput aligners bowtie2 [2], bwa [3], smalt [4], GSNAP [5], and BLASR [6].
Experimental and Numerical Studies on Fiber Deformation and Formability in Thermoforming Process Using a Fast-Cure Carbon Prepreg: Effect of Stacking Sequence and Mold Geometry

Directory of Open Access Journals (Sweden)

Daeryeong Bae

2018-05-01

A fast-cure carbon fiber/epoxy prepreg was thermoformed against a replicated automotive roof panel mold (square-cup) to investigate the effect of the stacking sequence of prepreg layers with unidirectional and plain-woven fabrics and mold geometry with different drawing angles and depths on the fiber deformation and formability of the prepreg. The optimum forming condition was determined via analysis of the material properties of epoxy resin. The non-linear mechanical properties of prepreg at the deformation modes of inter- and intra-ply shear, tensile and bending were measured to be used as input data for the commercial virtual forming simulation software. The prepreg with a stacking sequence containing the plain-woven carbon prepreg on the outer layer of the laminate was successfully thermoformed against a mold with a depth of 20 mm and a tilting angle of 110°. Experimental results for the shear deformations at each corner of the thermoformed square-cup product were compared with the simulation and a similarity in the overall tendency of the shear angle in the path at each corner was observed. The results are expected to contribute to the optimization of parameters on materials, mold design and processing in the thermoforming mass-production process for manufacturing high quality automotive parts with a square-cup geometry.
Experimental and Numerical Studies on Fiber Deformation and Formability in Thermoforming Process Using a Fast-Cure Carbon Prepreg: Effect of Stacking Sequence and Mold Geometry

Science.gov (United States)

Bae, Daeryeong; Kim, Shino; Lee, Wonoh; Yi, Jin Woo; Um, Moon Kwang; Seong, Dong Gi

2018-01-01

A fast-cure carbon fiber/epoxy prepreg was thermoformed against a replicated automotive roof panel mold (square-cup) to investigate the effect of the stacking sequence of prepreg layers with unidirectional and plain-woven fabrics and mold geometry with different drawing angles and depths on the fiber deformation and formability of the prepreg. The optimum forming condition was determined via analysis of the material properties of epoxy resin. The non-linear mechanical properties of prepreg at the deformation modes of inter- and intra-ply shear, tensile and bending were measured to be used as input data for the commercial virtual forming simulation software. The prepreg with a stacking sequence containing the plain-woven carbon prepreg on the outer layer of the laminate was successfully thermoformed against a mold with a depth of 20 mm and a tilting angle of 110°. Experimental results for the shear deformations at each corner of the thermoformed square-cup product were compared with the simulation and a similarity in the overall tendency of the shear angle in the path at each corner was observed. The results are expected to contribute to the optimization of parameters on materials, mold design and processing in the thermoforming mass-production process for manufacturing high quality automotive parts with a square-cup geometry. PMID:29883413
Short-Range Stacking Disorder in Mixed-Layer Compounds: A HAADF STEM Study of Bastnäsite-Parisite Intergrowths

Directory of Open Access Journals (Sweden)

Cristiana L. Ciobanu

2017-11-01

Atomic-scale high angle annular dark field scanning transmission electron microscopy (HAADF STEM) imaging and electron diffraction are used to address the complexity of lattice-scale intergrowths of REE-fluorocarbonates from an occurrence adjacent to the Olympic Dam deposit, South Australia. The aims are to define the species present within the intergrowths and to assess the value of the HAADF STEM technique in resolving stacking sequences within mixed-layer compounds. Results provide insights into the definition of species and crystal-structural modularity. Lattice-scale intergrowths account for the compositional range between bastnäsite and parisite, as measured by electron probe microanalysis (at the µm-scale) throughout the entire area of the intergrowths. These comprise rhythmic intervals of parisite and bastnäsite, or stacking sequences with gradational changes in the slab stacking between B, BBS and BS types (B = bastnäsite, S = synchysite). An additional occurrence of an unnamed B2S phase [CaCe3(CO3)4F3], up to 11 unit cells in width, is identified among sequences of parisite and bastnäsite within the studied lamellar intergrowths. Both B2S and associated parisite show hexagonal lattices, interpreted as 2H polytypes with c = 28 and 38 Å, respectively. 2H parisite is a new, short hexagonal polytype that can be added to the 14 previously reported polytypes (both hexagonal and rhombohedral) for this mineral. The correlation between satellite reflections and the number of layers along the stacking direction (c*) can be written empirically as Nsat = [(m × 2) + (n × 4)] − 1 for all BmSn compounds with S ≠ 0. The present study shows intergrowths characterised by short-range stacking disorder and coherent changes in stacking along perpendicular directions. Knowing that the same compositional range can be expressed as long-period stacking compounds in the group, the present intergrowths are interpreted as being related to disequilibrium
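Evaluating the empirical relation for the two phases named above gives a quick check of the expected number of satellite reflections. This simply applies the stated formula, taking parisite as BS (m = n = 1) and the unnamed phase as B2S (m = 2, n = 1). It is a worked illustration, not a result reported by the authors.

```latex
\begin{aligned}
N_{\mathrm{sat}} &= (2m + 4n) - 1 \qquad (n \neq 0)\\
\mathrm{BS}\ (m = 1,\ n = 1):\quad N_{\mathrm{sat}} &= (2 + 4) - 1 = 5\\
\mathrm{B_2S}\ (m = 2,\ n = 1):\quad N_{\mathrm{sat}} &= (4 + 4) - 1 = 7
\end{aligned}
```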
Global alignment algorithms implementations | Fatumo ...

African Journals Online (AJOL)

In this paper, we implemented two approaches to sequence comparison, namely the dot plot and the Needleman-Wunsch algorithm for global sequence alignment. Our algorithms were implemented in the Python programming language and were tested on a 1.60 GHz Linux platform with 512 MB of RAM running SUSE 9.2 and 10.1.
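Both methods named above fit in a few lines of Python. The version below is a textbook illustration under assumed scores (match +1, mismatch -1, linear gap penalty -1), not the authors' code. `dot_plot` marks identical positions with `*`. `needleman_wunsch` fills the global dynamic-programming matrix from both edges and then traces back from the bottom-right cell.

```python
def dot_plot(a: str, b: str) -> list:
    """One text row per residue of a; '*' where a[i] == b[j], '.' otherwise."""
    return ["".join("*" if x == y else "." for y in b) for x in a]


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # match/mismatch
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right corner to (0, 0).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA"))
```

With these scores, the best global alignment of the two example strings has six matches and one gap, for a score of 5.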
Biological sequence analysis

DEFF Research Database (Denmark)

Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...
BlockLogo: Visualization of peptide and sequence motif conservation

DEFF Research Database (Denmark)

Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

2013-01-01

BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...
New approach for dynamic flow management within the PEMFC stack

International Nuclear Information System (INIS)

Varlam, Mihai; Culcer, Mihai; Carcadea, Elena; Stefanescu, Ioan; Iliescu, Mariana; Enache, Adrian

2009-01-01

Adequate gas and water flow management is key to reaching and maintaining a higher output power in a PEM fuel cell stack. One of the main factors that can limit the performance of a PEM fuel cell stack is poor management of non-uniform water distribution within the fuel cells. The produced water can prevent the stack from reaching its best working performance by blocking the catalytic surfaces and impeding mass transport. Usually, excess water builds up in one cell relative to the others in the stack, and because all the cells are supplied in parallel from a common air inlet pipe, this restricts the gas flow rate within that cell. Consequently, this constraint further reduces the water removal speed. This feedback process ultimately causes a drastic decrease in fuel cell stack performance. A new practical solution to this non-uniform water and gas distribution problem is a sequential purge procedure for several fuel cell groups inside the stack, which could ensure proper water management. An experimental setup was built based on a four-cell fuel cell stack. Every fuel cell was connected to a single removal pipe via a solenoid valve. A computer-controlled hardware and software system was designed and built to generate a given opening-closing sequence for the automatic valve system. (authors)
COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge.

Science.gov (United States)

Lu, Yang Young; Chen, Ting; Fuhrman, Jed A; Sun, Fengzhu

2017-03-15

The advent of next-generation sequencing technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, also known as contigs, instead of an entire genome, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and downstream functional analysis. OTU clustering is also referred to as binning. We present COCACOLA, a general framework to automatically bin contigs into OTUs based on sequence composition and coverage across multiple samples. The effectiveness of COCACOLA is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. One is using L1 distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering through sparsity regularization. In addition, the COCACOLA framework seamlessly incorporates customized knowledge to improve binning accuracy. In our study, we have investigated two types of additional knowledge, the co-alignment to reference genomes and the linkage of contigs provided by paired-end reads, as well as the ensemble of both. We find that both co-alignment and linkage information further improve binning in the majority of cases. COCACOLA is scalable and faster than CONCOCT, GroopM, MaxBin and MetaBAT. The software is available at https://github.com/younglululu/COCACOLA . fsun@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
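The sketch below illustrates only the distance choice mentioned above. It assigns contig feature vectors (for example, composition and coverage profiles) to their nearest centroid under either L1 or Euclidean distance. It is not COCACOLA's algorithm, which additionally uses sparsity-regularized hard/soft clustering. The vectors are made-up toy values.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def l1(u: Vector, v: Vector) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2(u: Vector, v: Vector) -> float:
    """Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign(contigs: List[Vector], centroids: List[Vector],
           dist: Callable[[Vector, Vector], float] = l1) -> List[int]:
    """Return, for each contig vector, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(c, centroids[k]))
            for c in contigs]


contig = [0.0, 0.0, 0.0, 0.0]
centroids = [[3.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
print(assign([contig], centroids, l1))  # [0]: L1 distances are 3 vs 4
print(assign([contig], centroids, l2))  # [1]: L2 distances are 3 vs 2
```

In this toy case the two metrics disagree. L1 favors the centroid that differs in a single coordinate, while Euclidean distance favors the one that differs slightly in every coordinate, because squaring penalizes one large deviation more heavily.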
Stacking with stochastic cooling

Energy Technology Data Exchange (ETDEWEB)

Caspers, Fritz E-mail: Fritz.Caspers@cern.ch; Moehl, Dieter

2004-10-11

Accumulation of large stacks of antiprotons or ions with the aid of stochastic cooling is more delicate than cooling a constant-intensity beam. Basically, the difficulty stems from the fact that the optimized gain and the cooling rate are inversely proportional to the number of particles 'seen' by the cooling system. Therefore, to maintain fast stacking, the newly injected batch has to be strongly 'protected' from the Schottky noise of the stack. Conversely, the stack has to be efficiently 'shielded' against the high-gain cooling system for the injected beam. In the antiproton accumulators with stacking ratios up to 10^5, the problem is solved by radial separation of the injection and stack orbits in a region of large dispersion. An array of several tapered cooling systems with a matched gain profile provides a continuous particle flux towards the high-density stack core. Shielding of the different systems from each other is obtained both through the spatial separation and via the revolution frequencies (filters). In the 'old AA', where antiproton collection and stacking were done in one single ring, the injected beam was further shielded during cooling by means of a movable shutter. The complexity of these systems is very high. For more modest stacking ratios, one might use azimuthal rather than radial separation of the stack and injected beam. Schematically, half of the circumference would be used to accept and cool new beam and the remainder to house the stack. Fast gating is then required between the high-gain cooling of the injected beam and the low-gain stack cooling. RF gymnastics are used to merge the pre-cooled batch with the stack, to re-create free space for the next injection, and to capture the new batch. This scheme is less demanding for the storage ring lattice, but at the expense of some reduction in stacking rate. The talk reviews the 'radial' separation schemes and also gives some
Enhanced optical fields in a multilayered microsphere with a quasiperiodic spherical stack

International Nuclear Information System (INIS)

Burlak, Gennadiy N

2007-01-01

Radiation of a nanosource placed in a microsphere with a quasiperiodic subwavelength spherical stack is studied. We investigate how the transmittance spectrum evolves as the thickness of the two-layer blocks, constructed following the Fibonacci sequence, is changed. When the number of layers (Fibonacci order) increases, the structure of the spectrum acquires a fractal form. Our calculations show a strong field peak arising when the ratio of layer widths in the two-layer blocks of the stack is close to the golden mean value.
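A common way to build such a quasiperiodic stack is the Fibonacci substitution S(j+1) = S(j) S(j-1), with S0 = B and S1 = A, where A and B stand for the two kinds of blocks. The abstract does not spell out its construction rule, so this Python sketch assumes that standard convention. It generates the block order for a given Fibonacci order, and the sequence lengths grow as the Fibonacci numbers.

```python
def fibonacci_stack(order: int) -> str:
    """Fibonacci block sequence: S0 = "B", S1 = "A", S(j+1) = S(j) + S(j-1)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    prev, cur = "B", "A"  # S0, S1
    if order == 0:
        return prev
    for _ in range(order - 1):
        prev, cur = cur, cur + prev  # concatenate the two previous generations
    return cur


for j in range(1, 6):
    seq = fibonacci_stack(j)
    print(j, seq, len(seq))  # lengths 1, 1, 2, 3, 5
```

As the order grows, the ratio of A blocks to B blocks tends to the golden mean (1 + √5)/2 ≈ 1.618. The same number appears in the layer-width condition for the field peak described above.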
Braided and Stacked Electrospun Nanofibrous Scaffolds for Tendon and Ligament Tissue Engineering.

Science.gov (United States)

Rothrauff, Benjamin B; Lauro, Brian B; Yang, Guang; Debski, Richard E; Musahl, Volker; Tuan, Rocky S

2017-05-01

Tendon and ligament injuries are a persistent orthopedic challenge given their poor innate healing capacity. Nonwoven electrospun nanofibrous scaffolds composed of polyesters have been used to mimic the mechanics and topographical cues of native tendons and ligaments. However, nonwoven nanofibers have several limitations that prevent broader clinical application, including poor cell infiltration, as well as tensile and suture-retention strengths that are inferior to those of native tissues. In this study, multilayered scaffolds of aligned electrospun nanofibers of two designs, stacked or braided, were fabricated. Mechanical properties, including structural and mechanical properties and suture-retention strength, were determined using acellular scaffolds. Human bone marrow-derived mesenchymal stem cells (MSCs) were seeded on scaffolds for up to 28 days, and assays for tenogenic differentiation, histology, and biochemical composition were performed. Braided scaffolds exhibited improved tensile and suture-retention strengths, but reduced moduli. Both scaffold designs supported expression of tenogenic markers, although the effect was greater on braided scaffolds. Conversely, cell infiltration was superior in stacked constructs, resulting in enhanced cell number, total collagen content, and total sulfated glycosaminoglycan content. However, when normalized against cell number, both designs modulated extracellular matrix protein deposition to a similar degree. Taken together, this study demonstrates that multilayered scaffolds of aligned electrospun nanofibers supported tenogenic differentiation of seeded MSCs, but the macroarchitecture is an important consideration for applications of tendon and ligament tissue engineering.
EmuStack: An OpenStack-Based DTN Network Emulation Platform (Extended Version)

Directory of Open Access Journals (Sweden)

Haifeng Li

2016-01-01

With the advancement of computing and network virtualization technology, the networking research community shows great interest in network emulation. Compared with network simulation, network emulation can provide more relevant and comprehensive details. In this paper, EmuStack, a large-scale real-time emulation platform for Delay Tolerant Networks (DTN), is proposed. EmuStack aims to make network emulation as simple as network simulation. Based on OpenStack, distributed synchronous emulation modules are developed to enable EmuStack to implement synchronous and dynamic, precise, real-time network emulation. Meanwhile, the lightweight approach of using Docker container technology and network namespaces allows EmuStack to support a large-scale topology (up to hundreds of nodes) with only several physical nodes. In addition, EmuStack integrates the Linux Traffic Control (TC) tools with OpenStack for managing and emulating virtual link characteristics, including variable bandwidth, delay, loss, jitter, reordering, and duplication. Finally, experience with our initial implementation suggests the ability to run and debug experimental network protocols in real time. The EmuStack environment could bring a qualitative change to network research.
Method and apparatus for biological sequence comparison

Science.gov (United States)

Marr, T.G.; Chang, W.I.

1997-12-23

A method and apparatus are disclosed for comparing biological sequences from a known source of sequences with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM) and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level and are long enough to be statistically significant. The device filters out fragments from the known sequences that are too short or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The resulting set of short-fragment best matches for the block provides an upper bound on alignment values. Regions of a certain length from the known sequences whose mean upper-bound alignment value exceeds a target unit score are concatenated to form a union. The current block is compared to the union, which indicates the best local alignment with the subject sequence. 5 figs.
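The abstract does not give block or overlap sizes. One way to meet the stated requirement that each block be large enough to contain a minimum-length alignment is to advance the block start by `block_len - min_len + 1`. Every window of `min_len` residues then falls entirely inside at least one block. The Python sketch below implements only this blocking step, not the patented filter, and the function and parameter names are hypothetical.

```python
from typing import List, Tuple


def overlapping_blocks(seq: str, block_len: int, min_len: int) -> List[Tuple[int, str]]:
    """Split seq into overlapping (start, block) pairs.

    Consecutive blocks overlap by min_len - 1 residues, so any window of
    min_len consecutive residues lies wholly inside at least one block.
    """
    if not 0 < min_len <= block_len:
        raise ValueError("need 0 < min_len <= block_len")
    step = block_len - min_len + 1
    blocks = []
    start = 0
    while True:
        blocks.append((start, seq[start:start + block_len]))
        if start + block_len >= len(seq):
            break  # this block already reaches the end of the sequence
        start += step
    return blocks


query = "MKTAYIAKQRQISFVKSHFSRQ"  # toy subject sequence
for start, block in overlapping_blocks(query, block_len=10, min_len=4):
    print(start, block)
```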
Sequence analysis of Leukemia DNA

Science.gov (United States)

Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

2018-03-01

Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by laboratory testing of DNA. This study focused on local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. It finds the most similar regions of a pair of sequences by assigning negative scores to unequal base pairs (mismatches) and positive scores to identical base pairs (matches). The cell with the maximum score marks the end of the alignment, and tracing back to a cell with score zero marks its start. This study uses leukemia sequences and 3 non-leukemia sequences.
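The scoring and traceback described above can be sketched in Python as follows. The match, mismatch and gap scores (+2, -1, -1) are illustrative assumptions, not the values used in the study. Cell scores are floored at zero, the alignment ends at the highest-scoring cell, and traceback stops at the first zero cell, which marks where the local alignment starts.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Local alignment with a linear gap penalty; returns (score, local_a, local_b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at 0 lets a new local alignment start anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the maximum cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```

Unlike the global method, mismatching flanks (the T and G runs here) are simply left out of the reported alignment.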
RNA Structural Alignments, Part I

DEFF Research Database (Denmark)

Havgaard, Jakob Hull; Gorodkin, Jan

2014-01-01

Simultaneous alignment and secondary structure prediction of RNA sequences is often referred to as "RNA structural alignment." A class of the methods for structural alignment is based on the principles proposed by Sankoff more than 25 years ago. The Sankoff algorithm simultaneously folds and aligns...... is so high that it took more than a decade before the first implementation of a Sankoff style algorithm was published. However, with the faster computers available today and the improved heuristics used in the implementations the Sankoff-based methods have become practical. This chapter describes...... the methods based on the Sankoff algorithm. All the practical implementations of the algorithm use heuristics to make them run in reasonable time and memory. These heuristics are also described in this chapter....
Accelerator and transport line survey and alignment

International Nuclear Information System (INIS)

Ruland, R.E.

1991-10-01

This paper summarizes the survey and alignment processes of accelerators and transport lines and discusses the propagation of errors associated with these processes. The major geodetic principles governing the survey and alignment measurement space are introduced and their relationship to a lattice coordinate system is shown. The paper continues with a broad overview of the activities involved in the step sequence from initial absolute alignment to final smoothing. Emphasis is given to the relative alignment of components, in particular to the importance of incorporating methods to remove residual systematic effects in surveying and alignment operations. Various approaches to smoothing used at major laboratories are discussed. 47 refs., 19 figs., 1 tab
Direct stacking of sequence-specific nuclease-induced mutations to produce high oleic and low linolenic soybean oil.

Science.gov (United States)

Demorest, Zachary L; Coffman, Andrew; Baltes, Nicholas J; Stoddard, Thomas J; Clasen, Benjamin M; Luo, Song; Retterath, Adam; Yabandith, Ann; Gamo, Maria Elena; Bissen, Jeff; Mathis, Luc; Voytas, Daniel F; Zhang, Feng

2016-10-13

The ability to modulate levels of individual fatty acids within soybean oil has potential to increase shelf-life and frying stability and to improve nutritional characteristics. Commodity soybean oil contains high levels of polyunsaturated linoleic and linolenic acid, which contribute to oxidative instability - a problem that has been addressed through partial hydrogenation. However, partial hydrogenation increases levels of trans-fatty acids, which have been associated with cardiovascular disease. Previously, we generated soybean lines with knockout mutations within fatty acid desaturase 2-1A (FAD2-1A) and FAD2-1B genes, resulting in oil with increased levels of monounsaturated oleic acid (18:1) and decreased levels of linoleic (18:2) and linolenic acid (18:3). Here, we stack mutations within FAD2-1A and FAD2-1B with mutations in fatty acid desaturase 3A (FAD3A) to further decrease levels of linolenic acid. Mutations were introduced into FAD3A by directly delivering TALENs into fad2-1a fad2-1b soybean plants. Oil from fad2-1a fad2-1b fad3a plants had significantly lower levels of linolenic acid (2.5 %), as compared to fad2-1a fad2-1b plants (4.7 %). Furthermore, oil had significantly lower levels of linoleic acid (2.7 % compared to 5.1 %) and significantly higher levels of oleic acid (82.2 % compared to 77.5 %). Transgene-free fad2-1a fad2-1b fad3a soybean lines were identified. The methods presented here provide an efficient means for using sequence-specific nucleases to stack quality traits in soybean. The resulting product comprised oleic acid levels above 80 % and linoleic and linolenic acid levels below 3 %.
Long sequence correlation coprocessor

Science.gov (United States)

Gage, Douglas W.

1994-09-01

A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation scores for a number of adjacent bit alignments (16, for example) between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general-purpose computer processor that serves as its controller. Each of the LSCC's sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence and increments the counter if the bits are the same. A shift register ensures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence, so that the different counters simultaneously calculate the correlation coefficients for different alignments of the two sequences.
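The counting scheme above can be mimicked in software. In this Python sketch, counter k compares bit i of the first sequence with bit i + k of the second. A single pass over the first sequence therefore updates all `num_offsets` counters, doing sequentially what the coprocessor does in parallel within one clock cycle. The default of 16 offsets follows the example in the text. The function name and the bit-list representation are assumptions.

```python
from typing import List, Sequence


def correlation_scores(first: Sequence[int], second: Sequence[int],
                       num_offsets: int = 16) -> List[int]:
    """scores[k] = number of positions i where first[i] == second[i + k]."""
    scores = [0] * num_offsets  # one counter per alignment offset
    for i, bit in enumerate(first):  # one "clock cycle" per bit of the first sequence
        for k in range(num_offsets):  # the hardware updates these counters in parallel
            j = i + k
            if j < len(second) and bit == second[j]:
                scores[k] += 1
    return scores


print(correlation_scores([1, 0, 1, 1], [0, 1, 0, 1, 1, 0], num_offsets=3))
# [1, 4, 1]: offset 1 lines the two sequences up perfectly
```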
The effect of edge interlaminar stresses on the strength of carbon/epoxy laminates of different stacking geometry

OpenAIRE

MOMCILO STEVANOVIC; MILAN GORDIC; DANIELA SEKULIC; ISIDOR DJORDJEVIC

2006-01-01

The effect of edge interlaminar stresses on strength of carbon/epoxy laminates of different stacking geometry: cross-ply, quasi-isotropic and angle-ply laminates with additional 0º and 90º ply was studied. Coupons with two widths of laminates with an inverse stacking sequence were tested in static tensile tests. The effect of edge interlaminar stresses on strength was studied, by comparing the values of the tensile strength of laminate coupons of the same width with an inverse stacking sequen...

A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

Directory of Open Access Journals (Sweden)

Robert Lindner

Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin, including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, many other alignment algorithms have been developed, including other FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines can choose among a large number of available alignment algorithms. To provide guidance in choosing alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three algorithmic classes: algorithms using hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Special emphasis was placed on both precision and recall and on performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, essentially making hash-based aligners obsolete.
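The core of FM-index based aligners is backward search over the Burrows-Wheeler transform (BWT). The Python sketch below builds the BWT naively from a sorted suffix array and counts exact matches with per-character rank tables. Real aligners such as Bowtie and Bowtie 2 use compressed, sampled structures and also handle mismatches and gaps, none of which is shown here. The class and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_bwt(text: str) -> Tuple[str, List[int]]:
    """Return (BWT, suffix array) of text + '$' via naive suffix sorting."""
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] == '$' when i == 0
    return bwt, sa


class FMIndex:
    def __init__(self, text: str) -> None:
        self.bwt, self.sa = build_bwt(text)
        # C[c]: number of characters in the BWT that sort before c.
        counts: Dict[str, int] = {}
        for ch in self.bwt:
            counts[ch] = counts.get(ch, 0) + 1
        self.C: Dict[str, int] = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[c][i]: occurrences of c in bwt[:i] (uncompressed rank table).
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, b in enumerate(self.bwt):
            for ch, table in self.occ.items():
                table[i + 1] = table[i] + int(b == ch)

    def search(self, pattern: str) -> List[int]:
        """Start positions of exact occurrences of pattern, via backward search."""
        lo, hi = 0, len(self.bwt)  # half-open suffix-array interval
        for ch in reversed(pattern):  # extend the match one character to the left
            if ch not in self.C:
                return []
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


fm = FMIndex("GATTACA")
print(fm.search("A"))   # [1, 4, 6]
print(fm.search("TA"))  # [3]
print(fm.search("GG"))  # []
```

Each pattern character costs two rank lookups regardless of genome size. Together with compressed storage of the BWT, this underlies the memory and runtime advantage reported above.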
Algebraic stacks

Indian Academy of Sciences (India)

Deligne, Mumford and Artin [DM, Ar2]) and consider algebraic stacks, then we can construct the 'moduli ... the moduli scheme and the moduli stack of vector bundles. First I will give ...
T-BAS: Tree-Based Alignment Selector toolkit for phylogenetic-based placement, alignment downloads and metadata visualization: an example with the Pezizomycotina tree of life.

Science.gov (United States)

Carbone, Ignazio; White, James B; Miadlikowska, Jolanta; Arnold, A Elizabeth; Miller, Mark A; Kauff, Frank; U'Ren, Jana M; May, Georgiana; Lutzoni, François

2017-04-15

High-quality phylogenetic placement of sequence data has the potential to greatly accelerate studies of the diversity, systematics, ecology and functional biology of diverse groups. We developed the Tree-Based Alignment Selector (T-BAS) toolkit to allow evolutionary placement and visualization of diverse DNA sequences representing unknown taxa within a robust phylogenetic context, and to permit the downloading of highly curated, single- and multi-locus alignments for specific clades. In its initial form, T-BAS v1.0 uses a core phylogeny of 979 taxa (including 23 outgroup taxa, as well as 61 orders, 175 families and 496 genera) representing all 13 classes of the largest subphylum of Fungi, Pezizomycotina (Ascomycota), based on sequence alignments for six loci (nr5.8S, nrLSU, nrSSU, mtSSU, RPB1, RPB2). T-BAS v1.0 has three main uses: (i) Users may download alignments and voucher tables for members of the Pezizomycotina directly from the reference tree, facilitating systematics studies of focal clades. (ii) Users may upload sequence files with reads representing unknown taxa and place these on the phylogeny using either BLAST or phylogeny-based approaches, and then use the displayed tree to select reference taxa to include when downloading alignments. The placement of unknowns can be performed for large numbers of Sanger sequences obtained from fungal cultures and for alignable, short reads of environmental amplicons. (iii) User-customizable metadata can be visualized on the tree. T-BAS Version 1.0 is available online at http://tbas.hpc.ncsu.edu . Registration is required to access the CIPRES Science Gateway and NSF XSEDE's large computational resources. icarbon@ncsu.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Self-assembly of bimetallic AuxPd1-x alloy nanoparticles via dewetting of bilayers through the systematic control of temperature, thickness, composition and stacking sequence

Science.gov (United States)

Kunwar, Sundar; Pandey, Puran; Sui, Mao; Bastola, Sushil; Lee, Jihoon

2018-03-01

Bimetallic alloy nanoparticles (NPs) are attractive materials for various applications because of their morphology- and composition-dependent optical, electronic, magnetic and catalytic properties. This work demonstrates the evolution of AuxPd1-x alloy nanostructures by the solid-state dewetting of sequentially deposited bilayers of Au and Pd on sapphire (0001). AuxPd1-x alloy NPs of various shapes, sizes and configurations are fabricated by systematically controlling the annealing temperature, deposition thickness, composition and stacking sequence. The evolution of the alloy nanostructures is attributed to surface diffusion, interface diffusion between the bilayers, surface and interface energy minimization, the Volmer-Weber growth model and the equilibrium configuration. Depending on the temperature, the surface morphology evolves through the formation of pits, grains and voids and gradually develops into isolated semi-spherical alloy NPs through the expansion of voids and the agglomeration of Au and Pd adatoms. On the other hand, with increasing bilayer thickness, alloy nanostructures ranging from small isolated particles to enlarged elongated and overgrown layer-like structures are fabricated owing to coalescence, partial diffusion and inter-diffusion. In addition, the composition and stacking sequence of the bilayers markedly affect the final geometry of the AuxPd1-x nanostructures because of variations in the dewetting process. Optical analysis based on UV-vis-NIR reflectance spectra reveals the surface-morphology-dependent plasmonic resonance, scattering, reflection and absorption properties of the AuxPd1-x alloy nanostructures.
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

OpenAIRE

Zhang, Han; Xu, Tao; Li, Hongsheng; Zhang, Shaoting; Wang, Xiaogang; Huang, Xiaolei; Metaxas, Dimitris

2017-01-01

Although Generative Adversarial Networks (GANs) have shown remarkable success in various tasks, they still face challenges in generating high-quality images. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images. First, we propose a two-stage generative adversarial network architecture, StackGAN-v1, for text-to-image synthesis. The Stage-I GAN sketches the primitive shape and colors of the object based on given...
Structure of a stacked anthraquinone–DNA complex

Science.gov (United States)

De Luchi, Daniela; Usón, Isabel; Wright, Glenford; Gouyette, Catherine; Subirana, Juan A.

2010-01-01

The crystal structure of the telomeric sequence d(UBrAGG) interacting with an anthraquinone derivative has been solved by MAD. In previously studied complexes of intercalating drugs, the drug is usually sandwiched between two DNA base pairs. Instead, the present structure looks like a crystal of stacked anthraquinone molecules in which isolated base pairs are intercalated. Unusual base pairs are present in the structure, such as G·G and A·UBr reverse Watson–Crick base pairs. PMID:20823516
Laser diode stack beam shaping for efficient and compact long-range laser illuminator design

Science.gov (United States)

Lutz, Y.; Poyet, J. M.

2014-04-01

Laser diode stacks are interesting laser sources for active imaging illuminators. They allow the accumulation of large amounts of energy in multi-pulse mode, which is best suited for long-range image recording. Even when the laser diode stacks are equipped with fast-axis collimation (FAC) and slow-axis collimation (SAC) micro-lenses, their beam parameter products (BPP) are not compatible with direct use in highly efficient and compact illuminators. This is particularly true when narrow divergences are required, such as for long-range applications. A solution to overcome these difficulties is to enhance the poor slow-axis BPP by virtually restacking the laser diode stack. We present a beam shaping and homogenization method that is low-cost and efficient and has low alignment sensitivity. After conducting simulations, we realized and characterized the illuminator. A compact long-range laser illuminator has been set up with a divergence of 3.5×2.6 mrad and a global efficiency of 81%. Here, a projection lens with a clear aperture of 62 mm and a focal length of 571 mm was used.
ASH structure alignment package: Sensitivity and selectivity in domain classification

Directory of Open Access Journals (Sweden)

Toh Hiroyuki

2007-04-01

Background: Structure alignment methods offer the possibility of measuring distant evolutionary relationships between proteins that are not visible by sequence-based analysis. However, the question of how structural differences and similarities ought to be quantified in this regard remains open. In this study we construct a training set of sequence-unique CATH and SCOP domains, from which we develop a scoring function that can reliably identify domains with the same CATH topology and SCOP fold classification. The score is implemented in the ASH structure alignment package, for which the source code and a web service are freely available from the PDBj website http://www.pdbj.org/ASH/. Results: The new ASH score shows increased selectivity and sensitivity compared with values reported for several popular programs using the same test set of 4,298,905 structure pairs, yielding an area of 0.96 under the receiver operating characteristic (ROC) curve. In addition, weak sequence homologies between similar domains are revealed that could not be detected by BLAST sequence alignment. Also, a subset of domain pairs is identified that exhibit high similarity, even though their CATH and SCOP classifications differ. Finally, we show that the ranking of alignment programs based solely on geometric measures depends on the choice of the quality measure. Conclusion: ASH shows high selectivity and sensitivity with regard to domain classification, an important step in defining distantly related protein sequence families. Moreover, the CPU cost per alignment is competitive with the fastest programs, making ASH a practical option for large-scale structure classification studies.
The Subwavelength Optical Field Confinement in a Multilayered Microsphere with Quasiperiodic Spherical Stack

Directory of Open Access Journals (Sweden)

Gennadiy N. Burlak

2008-01-01

We study the frequency spectrum of nanoemitters placed in a microsphere with a quasiperiodic subwavelength spherical stack. We investigate how the transmittance spectrum evolves as the thickness of the two-layer blocks, constructed following the Fibonacci sequence, is changed. When the number of layers (Fibonacci order) increases, the structure of the spectrum acquires a fractal form. Our calculations show radiation confinement and gigantic field enhancement when the ratio of layer widths in the two-layer blocks of the stack is close to the golden mean value.
STACKING ON COMMON REFLECTION SURFACE WITH MULTIPARAMETER TRAVELTIME

Directory of Open Access Journals (Sweden)

Montes V. Luis A.

2006-12-01

Seismic images are commonly displayed in the time domain because the depth model can be known only at well logs. To produce seismic sections, pre- and post-stack processing approaches use time or depth velocity models, whereas the common reflection surface (CRS) method does not; instead, it requires a set of parameters established for the first layer. A synthetic dataset from an anticline model, with sources and receivers placed on flat topography, was used to observe the performance of this method. As a result, better reflector recovery was observed compared with the conventional processing sequence.
The procedure was extended to real data, using a dataset acquired in an area characterized by mild topography and quiet-environment reflectors in the Eastern Colombian plains; the CRS stacked section showed enhanced reflectors with better continuity.
W-curve alignments for HIV-1 genomic comparisons.

Directory of Open Access Journals (Sweden)

Douglas J Cork

2010-06-01

The W-curve was originally developed as a graphical visualization technique for viewing DNA and RNA sequences. Its ability to render features of DNA also makes it suitable for computational studies. Its main advantage in this area is that it uses a single-pass algorithm for comparing sequences. Avoiding recursion during sequence alignment offers advantages in speed and in-process resources. The graphical technique also allows multiple models of comparison to be used, depending on the nucleotide patterns embedded in similar whole genomic sequences. The W-curve approach allows us to compare large numbers of samples quickly. We are currently tuning the algorithm to accommodate quirks specific to HIV-1 genomic sequences so that it can be used to aid diagnostic and vaccine efforts. Tracking the molecular evolution of the virus has been greatly hampered by gap-associated problems, predominantly embedded within the envelope gene of the virus. Gaps and hypermutation of the virus slow conventional string-based alignments of the whole genome. This paper describes the W-curve algorithm itself and how we have adapted it for comparison of similar HIV-1 genomes. A tree-building method is developed with the W-curve that uses a novel Cylindrical Coordinate distance method and a gap analysis method. HIV-1 C2-V5 env sequence regions from a Mother/Infant cohort study are used in the comparison. The output distance matrix and neighbor results produced by the W-curve are functionally equivalent to those from Clustal for C2-V5 sequences in the mother/infant pairs infected with CRF01_AE. Significant potential exists for using this method in place of conventional string-based alignment of HIV-1 genomes, such as Clustal X. With W-curve heuristic alignment, it may be possible to obtain clinically useful results in a short time, short enough to affect clinical choices for acute treatment. A description of the W-curve generation process, including a comparison
W-curve alignments for HIV-1 genomic comparisons.

Science.gov (United States)

Cork, Douglas J; Lembark, Steven; Tovanabutra, Sodsai; Robb, Merlin L; Kim, Jerome H

2010-06-01

The W-curve was originally developed as a graphical visualization technique for viewing DNA and RNA sequences. Its ability to render features of DNA also makes it suitable for computational studies. Its main advantage in this area is that it uses a single-pass algorithm for comparing sequences. Avoiding recursion during sequence alignment offers advantages in speed and in-process resources. The graphical technique also allows multiple models of comparison to be used, depending on the nucleotide patterns embedded in similar whole genomic sequences. The W-curve approach allows us to compare large numbers of samples quickly. We are currently tuning the algorithm to accommodate quirks specific to HIV-1 genomic sequences so that it can be used to aid diagnostic and vaccine efforts. Tracking the molecular evolution of the virus has been greatly hampered by gap-associated problems, predominantly embedded within the envelope gene of the virus. Gaps and hypermutation of the virus slow conventional string-based alignments of the whole genome. This paper describes the W-curve algorithm itself and how we have adapted it for comparison of similar HIV-1 genomes. A tree-building method is developed with the W-curve that uses a novel Cylindrical Coordinate distance method and a gap analysis method. HIV-1 C2-V5 env sequence regions from a Mother/Infant cohort study are used in the comparison. The output distance matrix and neighbor results produced by the W-curve are functionally equivalent to those from Clustal for C2-V5 sequences in the mother/infant pairs infected with CRF01_AE. Significant potential exists for using this method in place of conventional string-based alignment of HIV-1 genomes, such as Clustal X. With W-curve heuristic alignment, it may be possible to obtain clinically useful results in a short time, short enough to affect clinical choices for acute treatment. A description of the W-curve generation process, including a comparison technique of
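The abstract stresses a single-pass, non-recursive comparison. A minimal sketch of that idea follows, assuming a chaos-game-style construction (each base pulls the current point halfway toward its own corner of a square, while z counts sequence position); the corner assignment, the step fraction and the planar distance are hypothetical stand-ins, not the published W-curve or its Cylindrical Coordinate distance.

```python
import math
from typing import List, Tuple

# Hypothetical corner assignment; the published W-curve may map bases differently.
CORNERS = {"A": (-1.0, -1.0), "C": (-1.0, 1.0), "G": (1.0, 1.0), "T": (1.0, -1.0)}


def w_walk(seq: str, step: float = 0.5) -> List[Tuple[float, float, int]]:
    """Chaos-game-style 3-D walk: for each base, move `step` of the way
    toward that base's corner; z records the position in the sequence."""
    x, y = 0.0, 0.0
    points = []
    for z, base in enumerate(seq.upper(), start=1):
        cx, cy = CORNERS[base]  # raises KeyError for ambiguity codes such as N
        x += step * (cx - x)
        y += step * (cy - y)
        points.append((x, y, z))
    return points


def single_pass_distance(a: str, b: str) -> float:
    """Mean planar distance between two walks compared position by position
    in one linear pass (no recursion, no dynamic-programming matrix)."""
    pa, pb = w_walk(a), w_walk(b)
    n = min(len(pa), len(pb))
    if n == 0:
        return 0.0
    total = sum(math.hypot(xa - xb, ya - yb)
                for (xa, ya, _), (xb, yb, _) in zip(pa, pb))
    return total / n


print(single_pass_distance("ACGTACGT", "ACGAACGT"))
```

With step 0.5, the separation between two walks halves with each shared base after a substitution, so point differences leave a local signature; a naive position-by-position pairing like this one is thrown off by gaps, which is why gap handling matters for HIV-1 env sequences.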
OpenStack cloud security

CERN Document Server

Locati, Fabio Alessandro

2015-01-01

If you are an OpenStack administrator or developer, or wish to build solutions to protect your OpenStack environment, then this book is for you. Experience of Linux administration and familiarity with different OpenStack components is assumed.
Semiautomated improvement of RNA alignments

DEFF Research Database (Denmark)

Andersen, Ebbe Sloth; Lind-Thomsen, Allan; Knudsen, Bjarne

2007-01-01

connects to external tools to provide a flexible semiautomatic editing environment. A new method, Pcluster, is introduced for dividing the sequences of an RNA alignment into subgroups with secondary structure differences. Pcluster was used to evaluate 574 seed alignments obtained from the Rfam database ... and we identified 71 alignments with significant prediction of inconsistent base pairs and 102 alignments with significant prediction of novel base pairs. Four RNA families were used to illustrate how SARSE can be used to manually or automatically correct the inconsistent base pairs detected by Pcluster ...: the mir-399 RNA, vertebrate telomerase RNA (vert-TR), bacterial transfer-messenger RNA (tmRNA), and the signal recognition particle (SRP) RNA. The general use of the method is illustrated by its ability to accommodate pseudoknots and handle even large and divergent RNA families. The open architecture ...
Pairwise structure alignment specifically tuned for surface pockets and interaction interfaces

KAUST Repository

Cui, Xuefeng

2015-09-09

To detect and evaluate the similarities between the three-dimensional (3D) structures of two molecules, various methods have been proposed for the pairwise structure alignment problem [6, 9, 7, 11]. The problem plays an important role in studying the function and evolution of biological molecules. Recently, pairwise structure alignment methods have been extended and applied to surface pocket structures [10, 3, 5] and interaction interface structures [8, 4]. The results show that, even when no global similarities are found between the global sequences and the global structures, biological molecules or complexes can share similar functions because of well-conserved pockets and interfaces. Thus, pairwise pocket and interface structure alignments are promising for unveiling such shared functions, which cannot be discovered by the well-studied global sequence and global structure alignments. State-of-the-art methods for pairwise pocket and interface structure alignment [4, 5] are direct extensions of the classic pairwise protein structure alignment methods, and thus share a few of their limitations. First, the goal of the classic protein structure alignment methods is to align single-chain protein structures (i.e., a single fragment of residues connected by peptide bonds). However, we observed that pockets and interfaces tend to consist of tens of extremely short backbone fragments (i.e., three or fewer residues connected by peptide bonds). Thus, existing pocket and interface alignment methods based on protein structure alignment still rely on the existence of sufficiently long backbone fragments, and the fragmentation of pockets and interfaces raises the risk of missing the optimal alignments. Moreover, existing interface structure alignment methods focus on protein-protein interfaces and require "blackbox preprocessing" before aligning protein-DNA and protein-RNA interfaces. Therefore, we introduce the PROtein STructure Alignment
SPA: a probabilistic algorithm for spliced alignment.

Directory of Open Access Journals (Sweden)

2006-04-01

Recent large-scale cDNA sequencing efforts show that elaborate patterns of splice variation are responsible for much of the proteome diversity in higher eukaryotes. To obtain an accurate account of the repertoire of splice variants, and to gain insight into the mechanisms of alternative splicing, it is essential that cDNAs are very accurately mapped to their respective genomes. Currently available algorithms for cDNA-to-genome alignment do not reach the necessary level of accuracy because they use ad hoc scoring models that cannot correctly trade off the likelihoods of various sequencing errors against the probabilities of different gene structures. Here we develop a Bayesian probabilistic approach to cDNA-to-genome alignment. Gene structures are assigned prior probabilities based on the lengths of their introns and exons, and based on the sequences at their splice boundaries. A likelihood model for sequencing errors takes into account the rates at which misincorporations, as well as insertions and deletions of different lengths, occur during sequencing. The parameters of both the prior and likelihood model can be automatically estimated from a set of cDNAs, thus enabling our method to adapt itself to different organisms and experimental procedures. We implemented our method in a fast cDNA-to-genome alignment program, SPA, and applied it to the FANTOM3 dataset of over 100,000 full-length mouse cDNAs and a dataset of over 20,000 full-length human cDNAs. Comparison with the results of four other mapping programs shows that SPA produces alignments of significantly higher quality. In particular, the quality of the SPA alignments near splice boundaries and SPA's mapping of the 5' and 3' ends of the cDNAs are highly improved, allowing for more accurate identification of transcript starts and ends, and accurate identification of subtle splice variations. Finally, our splice boundary analysis on the human dataset suggests the existence of a novel non
Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

DEFF Research Database (Denmark)

de Souza, S J; Camargo, A A; Briones, M R

2000-01-01

Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central ...
Tools for integrated sequence-structure analysis with UCSF Chimera

Directory of Open Access Journals (Sweden)

Huang Conrad C

2006-07-01

Background: Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results: The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion: The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. UCSF Chimera is free for non-commercial use and is
HeurAA: accurate and fast detection of genetic variations with a novel heuristic amplicon aligner program for next generation sequencing.

Directory of Open Access Journals (Sweden)

Lőrinc S Pongor

Next generation sequencing (NGS) of PCR amplicons is a standard approach to detect genetic variations in personalized medicine such as cancer diagnostics. Computer programs used in the NGS community often miss insertions and deletions (indels), which constitute a large part of known human mutations. We have developed HeurAA, an open source, heuristic amplicon aligner program. We tested the program on simulated datasets as well as on experimental data from multiplex sequencing of 40 amplicons in 12 oncogenes, collected on a 454 Genome Sequencer from lung cancer cell lines. We found that HeurAA can accurately detect all indels and is more than an order of magnitude faster than previous programs. HeurAA can compare reads and reference sequences up to several thousand base pairs in length, and it can evaluate data from complex mixtures containing reads of different gene segments from different samples. HeurAA is written in C and Perl for Linux operating systems; the code and documentation are available for research applications at http://sourceforge.net/projects/heuraa/
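HeurAA's own heuristic is not described in the abstract. As a reference point, a minimal sketch of exact global (Needleman-Wunsch) alignment of a read against its amplicon reference, followed by a scan that reports indels, is shown below; the scoring values are arbitrary assumptions, and an exact O(nm) alignment like this is the kind of baseline a heuristic aligner aims to beat on speed.

```python
from typing import List, Tuple


def global_align(read: str, ref: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(read), len(ref)

    def sub(i: int, j: int) -> int:
        return match if read[i - 1] == ref[j - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,   # read base vs gap: insertion
                              score[i][j - 1] + gap)   # gap vs reference base: deletion
    # Trace back from the bottom-right cell, preferring diagonal moves.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(read[i - 1])
            bottom.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(read[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(ref[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def indels(aln_read: str, aln_ref: str) -> List[Tuple[str, int, str]]:
    """Report (type, 0-based reference position, bases) for each gap run."""
    events = []
    ref_pos = 0
    k = 0
    while k < len(aln_ref):
        if aln_ref[k] == "-":  # bases present only in the read
            start = k
            while k < len(aln_ref) and aln_ref[k] == "-":
                k += 1
            events.append(("ins", ref_pos, aln_read[start:k]))
        elif aln_read[k] == "-":  # reference bases missing from the read
            start, start_pos = k, ref_pos
            while k < len(aln_read) and aln_read[k] == "-":
                k += 1
                ref_pos += 1
            events.append(("del", start_pos, aln_ref[start:k]))
        else:
            k += 1
            ref_pos += 1
    return events


print(indels(*global_align("ACGTCGTAC", "ACGTACGTAC")))  # [('del', 4, 'A')]
```

Where the inserted or deleted base lies in a homopolymer run, several placements score equally; this sketch simply reports whichever one the traceback reaches first.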
A Global Network Alignment Method Using Discrete Particle Swarm Optimization.

Science.gov (United States)

Huang, Jiaxiang; Gong, Maoguo; Ma, Lijia

2016-10-19

Molecular interaction data are increasing exponentially with the advance of biotechnology. This makes it both possible and necessary to comparatively analyse the different data at the network level. Global network alignment is an important network comparison approach for identifying conserved subnetworks and gaining insight into evolutionary relationships across species. Network alignment, which is analogous to subgraph isomorphism, is known to be NP-hard. In this paper, we introduce a novel heuristic Particle-Swarm-Optimization-based Network Aligner (PSONA), which optimizes a weighted global alignment model that considers both protein sequence similarity and interaction conservation. The particle statuses and status-updating rules are redefined in a discrete form using permutations. A seed-and-extend strategy is employed to guide the search for a superior alignment. The proposed initialization method "seeds" matches with high sequence similarity into the alignment, which guarantees the functional coherence of the mapped nodes. A greedy local search method is designed as the "extension" procedure to iteratively optimize edge conservation. PSONA is compared with several state-of-the-art methods on ten network pairs drawn from five species. The experimental results demonstrate that the proposed aligner can map proteins with high functional coherence and can be used as a booster to effectively refine existing well-studied aligners.
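A minimal sketch of the kind of objective PSONA optimizes, assuming an equal number of nodes in both networks, a linear weight `alpha` between mean sequence similarity and edge conservation, and a pairwise-swap hill climb standing in for the greedy extension step. The discrete PSO update rules are not reproduced; all names and the weighting are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def alignment_score(mapping: List[int], edges1: Set[Edge], edges2: Set[Edge],
                    seq_sim: Dict[Tuple[int, int], float], alpha: float = 0.5) -> float:
    """alpha * mean sequence similarity + (1 - alpha) * edge conservation.

    mapping[u] = v aligns node u of network 1 to node v of network 2
    (a permutation, like a discrete particle position).
    """
    conserved = sum(1 for u, w in edges1
                    if (mapping[u], mapping[w]) in edges2 or (mapping[w], mapping[u]) in edges2)
    edge_term = conserved / len(edges1) if edges1 else 0.0
    sim_term = sum(seq_sim.get((u, v), 0.0) for u, v in enumerate(mapping)) / len(mapping)
    return alpha * sim_term + (1.0 - alpha) * edge_term


def greedy_extend(mapping: List[int], edges1: Set[Edge], edges2: Set[Edge],
                  seq_sim: Dict[Tuple[int, int], float], alpha: float = 0.5):
    """Accept any pairwise swap that raises the score until none helps."""
    mapping = list(mapping)
    best = alignment_score(mapping, edges1, edges2, seq_sim, alpha)
    improved = True
    while improved:
        improved = False
        for i in range(len(mapping)):
            for j in range(i + 1, len(mapping)):
                mapping[i], mapping[j] = mapping[j], mapping[i]
                score = alignment_score(mapping, edges1, edges2, seq_sim, alpha)
                if score > best + 1e-12:
                    best, improved = score, True
                else:
                    mapping[i], mapping[j] = mapping[j], mapping[i]  # undo the swap
    return mapping, best


path = {(0, 1), (1, 2)}
print(greedy_extend([1, 0, 2], path, path, {}, alpha=0.0))
```

Because each accepted swap strictly raises the score and there are finitely many permutations, the loop always terminates, though only at a local optimum; escaping such optima is what the swarm search is for.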

Can-Evo-Ens: Classifier stacking based evolutionary ensemble system for prediction of human breast cancer using amino acid sequences.

Science.gov (United States)

Ali, Safdar; Majid, Abdul

2015-04-01

The diagnosis of human breast cancer is an intricate process, and specific indicators may produce negative results. To avoid misleading results, an accurate and reliable diagnostic system for breast cancer is indispensable. Recently, several interesting machine-learning (ML) approaches have been proposed for the prediction of breast cancer. To this end, we developed a novel classifier-stacking-based evolutionary ensemble system, "Can-Evo-Ens", for predicting amino acid sequences associated with breast cancer. In this paper, we first selected four diverse types of ML algorithms, Naïve Bayes, K-Nearest Neighbor, Support Vector Machines, and Random Forest, as base-level classifiers. These classifiers are trained individually in different feature spaces using physicochemical properties of amino acids. To exploit the decision spaces, the preliminary predictions of the base-level classifiers are stacked. Genetic programming (GP) is then employed to develop a meta-classifier that optimally combines the predictions of the base classifiers. The most suitable threshold value of the best-evolved predictor is computed using the Particle Swarm Optimization technique. Our experiments demonstrate the robustness of the Can-Evo-Ens system on an independent validation dataset. The proposed system achieved the highest area under the ROC curve (AUC), 99.95%, for cancer prediction. The comparative results show that the proposed approach is better than individual ML approaches and conventional ensemble approaches such as AdaBoostM1, Bagging, GentleBoost, and Random Subspace. It is expected that the proposed system would have a major impact on biomedicine, genomics, proteomics, bioinformatics, and drug development. Copyright © 2015 Elsevier Inc. All rights reserved.
GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies.

Science.gov (United States)

Kim, Jeremie S; Senol Cali, Damla; Xin, Hongyi; Lee, Donghyuk; Ghose, Saugata; Alser, Mohammed; Hassan, Hasan; Ergin, Oguz; Alkan, Can; Mutlu, Onur

2018-05-09

Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments. We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x-6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x-3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm. GRIM-Filter exploits 3D-stacked memory, which enables the efficient use of processing-in-memory, to overcome the memory bandwidth bottleneck in seed location filtering. We show that GRIM-Filter significantly improves the performance of a state-of-the-art read mapper. GRIM-Filter is a universal seed location filter that can be
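A minimal sketch of the bin-based filtering idea, assuming a Python set stands in for each bin's k-mer presence bitvector and that the threshold comes from the q-gram lemma (each edit can destroy at most k of a read's k-mers). Bin size, k and the function names are illustrative; GRIM-Filter's actual memory layout, its treatment of reads that straddle bin boundaries, and its parallel in-memory operations are not modeled.

```python
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bins(reference: str, bin_size: int, k: int) -> List[Set[str]]:
    """One k-mer presence set per coarse-grained reference segment (bin).
    Each bin is extended by k - 1 bases so k-mers crossing its right edge are kept."""
    return [kmer_set(reference[start:start + bin_size + k - 1], k)
            for start in range(0, len(reference), bin_size)]


def min_shared_kmers(read_len: int, k: int, max_edits: int) -> int:
    """q-gram lemma: each edit can destroy at most k of the read's k-mers."""
    return max(0, (read_len - k + 1) - k * max_edits)


def passes_filter(read: str, bins: List[Set[str]], bin_index: int, k: int,
                  threshold: int) -> bool:
    """Keep a candidate seed location only if enough read k-mers occur in its bin."""
    present = sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in bins[bin_index])
    return present >= threshold


bins = build_bins("AAAAAAAACCCCGGGG", bin_size=8, k=3)
print(passes_filter("CCCGG", bins, 0, 3, 3), passes_filter("CCCGG", bins, 1, 3, 3))
```

Locations that fail the check are discarded before the expensive alignment step, which is where the filter saves work.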
High efficiency x-ray nanofocusing by the blazed stacking of binary zone plates

Science.gov (United States)

Mohacsi, I.; Karvinen, P.; Vartiainen, I.; Diaz, A.; Somogyi, A.; Kewish, C. M.; Mercere, P.; David, C.

2013-09-01

The focusing efficiency of binary Fresnel zone plate lenses is fundamentally limited, and higher efficiency requires a multistep lens profile. To overcome the manufacturing problems of high-resolution, high-efficiency multistep zone plates, we investigate the concept of stacking two different binary zone plates in each other's optical near field. We use a coarse zone plate with a π phase shift and a double-density fine zone plate with a π/2 phase shift to produce an effective 4-step profile. Using a compact experimental setup with piezo actuators for alignment, we demonstrated 47.1% focusing efficiency at 6.5 keV using a pair of zone plates with 500 μm diameter and 200 nm smallest zone width. Furthermore, we present a spatially resolved characterization method using multiple diffraction orders to identify manufacturing errors, alignment errors and pattern distortions, and their effect on diffraction efficiency.
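Why a 4-step profile helps can be seen from scalar diffraction theory: an ideal, lossless phase profile quantized into N equal steps sends a fraction [sin(π/N)/(π/N)]² of the light into the first order. The sketch below evaluates that textbook bound, assuming ideal, non-absorbing zones; it is not the authors' model, and real hard x-ray zone plates fall below it because of absorption, fabrication and alignment errors, which helps put the measured 47.1% in context.

```python
import math


def multilevel_efficiency(levels: int) -> float:
    """Ideal first-order efficiency of an N-level phase profile (scalar theory):
    eta_N = (sin(pi/N) / (pi/N))**2, ignoring absorption and fabrication errors."""
    if levels < 2:
        raise ValueError("need at least two phase levels")
    x = math.pi / levels
    return (math.sin(x) / x) ** 2


for n in (2, 4):
    print(n, round(multilevel_efficiency(n), 3))
```

The binary case gives 4/π² (about 0.405) and the 4-level case 8/π² (about 0.811), so the stacked π and π/2 plates roughly double the ideal ceiling.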
Predicting RNA hyper-editing with a novel tool when unambiguous alignment is impossible.

Science.gov (United States)

McKerrow, Wilson H; Savva, Yiannis A; Rezaei, Ali; Reenan, Robert A; Lawrence, Charles E

2017-07-10

Repetitive elements are now known to have relevant cellular functions, including self-complementary sequences that form double-stranded (ds) RNA. There are numerous pathways that determine the fate of endogenous dsRNA, and misregulation of endogenous dsRNA is a driver of autoimmune disease, particularly in the brain. Unfortunately, the alignment of high-throughput, short-read sequences to repeat elements poses a dilemma: such sequences may align equally well to multiple genomic locations. To differentiate repeat elements, current alignment methods depend on sequence variation in the reference genome, and reads are discarded when no such variation is present. However, RNA hyper-editing, a possible fate for dsRNA, introduces enough variation to distinguish between repeats that are otherwise identical. To take advantage of this variation, we developed a new algorithm, RepProfile, that simultaneously aligns reads and predicts novel variations. RepProfile accurately aligns hyper-edited reads that other methods discard. In particular, we predict hyper-editing of Drosophila melanogaster repeat elements in vivo at levels previously described only in vitro, and provide validation by Sanger sequencing of sixty-two individual cloned sequences. We find that hyper-editing is concentrated in genes involved in cell-cell communication at the synapse, including some that are associated with neurodegeneration. We also find that hyper-editing tends to occur in short runs. Previous studies of RNA hyper-editing discarded ambiguously aligned reads, ignoring hyper-editing in long, perfect dsRNA, the perfect substrate for hyper-editing. We provide a method that, as simulation and Sanger validation show, accurately predicts such RNA editing, yielding a superior picture of hyper-editing.
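Why hyper-edited reads get discarded can be shown with mismatch counting alone. The sketch below is not RepProfile's algorithm; it only separates reference-A/read-G differences (consistent with A-to-I editing, which sequencing reports as G) from other mismatches, assuming ungapped, sense-strand placements. The helper names and the `max_other` cutoff are hypothetical.

```python
from typing import Tuple


def edit_aware_mismatches(read: str, ref: str) -> Tuple[int, int]:
    """Split differences of an ungapped read placement into
    (reference A read as G, all other mismatches)."""
    if len(read) != len(ref):
        raise ValueError("ungapped comparison needs equal lengths")
    a_to_g = other = 0
    for read_base, ref_base in zip(read, ref):
        if read_base == ref_base:
            continue
        if ref_base == "A" and read_base == "G":
            a_to_g += 1  # consistent with A-to-I editing
        else:
            other += 1
    return a_to_g, other


def keep_read(read: str, ref: str, max_other: int = 2) -> bool:
    """A naive filter counts every mismatch; here only non-editing ones count."""
    return edit_aware_mismatches(read, ref)[1] <= max_other


# Four A-to-G differences: a naive 2-mismatch filter would drop this read.
print(edit_aware_mismatches("GGGTGG", "AAATAG"), keep_read("GGGTGG", "AAATAG"))
```

Distinguishing otherwise identical repeat copies additionally requires pooling editing patterns across reads per locus, which this per-read sketch does not attempt.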
Learning Online Alignments with Continuous Rewards Policy Gradient

OpenAIRE

Luo, Yuping; Chiu, Chung-Cheng; Jaitly, Navdeep; Sutskever, Ilya

2016-01-01

Sequence-to-sequence models with soft attention had significant success in machine translation, speech recognition, and question answering. Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition. To address this problem, we present a new method for solving sequence-to-sequence problems using hard online alignments instead of soft offlin...
Considerations in the identification of functional RNA structural elements in genomic alignments

Directory of Open Access Journals (Sweden)

Blencowe Benjamin J

2007-01-01

Background: Accurate identification of novel, functional noncoding (nc) RNA features in genome sequence has proven more difficult than for exons. Current algorithms identify and score potential RNA secondary structures on the basis of thermodynamic stability, conservation, and/or covariance in sequence alignments. Neither the algorithms nor the information gained from the individual inputs have been independently assessed. Furthermore, due to issues in modelling background signal, it has been difficult to gauge the precision of these algorithms on a genomic scale, in which even a seemingly small false-positive rate can result in a vast excess of false discoveries. Results: We developed a shuffling algorithm, shuffle-pair.pl, that simultaneously preserves dinucleotide frequency, gaps, and local conservation in pairwise sequence alignments. We used shuffle-pair.pl to assess precision and recall of six ncRNA search tools (MSARI, QRNA, ddbRNA, RNAz, Evofold, and several variants of simple thermodynamic stability) on a test set of 3046 alignments of known ncRNAs. Relative to mononucleotide shuffling, preserving dinucleotide content when shuffling the alignments resulted in a drastic increase in estimated false-positive detection rates for ncRNA elements, precluding evaluation of higher-order alignments, which cannot be adequately shuffled while maintaining both dinucleotides and alignment structure. On pairwise alignments, none of the covariance-based tools performed markedly better than thermodynamic scoring alone. Although the high false-positive rates call into question the veracity of any individual predicted secondary structural element in our analysis, we nevertheless identified intriguing global trends in human genome alignments. The distribution of ncRNA prediction scores in 75-base windows overlapping UTRs, introns, and intergenic regions analyzed using both thermodynamic stability and EvoFold (which has no thermodynamic component) was
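shuffle-pair.pl itself works on pairwise alignments and also preserves gaps and local conservation. For a single ungapped sequence, the classic way to preserve dinucleotide counts exactly is the Altschul-Erickson shuffle, which treats each adjacent pair as an edge of a directed multigraph and draws a random Eulerian path through it. A minimal sketch follows, using the random "last-exit" spanning-tree construction; it is not shuffle-pair.pl and does not handle alignment columns or gaps.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def _reaches(start: str, target: str, last_exit: Dict[str, str]) -> bool:
    """Follow last-exit edges from `start`; fail if they loop without reaching `target`."""
    seen = set()
    v = start
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = last_exit[v]
    return True


def dinucleotide_shuffle(seq: str, rng: Optional[random.Random] = None) -> str:
    """Random sequence with exactly the same dinucleotide counts, first and
    last character as `seq`."""
    rng = rng or random.Random()
    if len(seq) <= 2:
        return seq
    # Each adjacent pair seq[i] -> seq[i+1] is an edge of a directed multigraph.
    out_edges: Dict[str, List[str]] = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        out_edges[a].append(b)
    first, last = seq[0], seq[-1]
    # Pick a random last exit for every vertex except the final one, retrying
    # until those edges form a tree that leads to `last` from every vertex.
    while True:
        last_exit = {v: rng.choice(e) for v, e in out_edges.items() if v != last}
        if all(_reaches(v, last, last_exit) for v in last_exit):
            break
    # Shuffle each vertex's other exits and put its last exit at the end.
    order: Dict[str, List[str]] = {}
    for v, edges in out_edges.items():
        remaining = list(edges)
        if v in last_exit:
            remaining.remove(last_exit[v])
        rng.shuffle(remaining)
        if v in last_exit:
            remaining.append(last_exit[v])
        order[v] = remaining
    # Walk the resulting Eulerian path from the first character.
    pos = {v: 0 for v in order}
    result = [first]
    current = first
    for _ in range(len(seq) - 1):
        nxt = order[current][pos[current]]
        pos[current] += 1
        result.append(nxt)
        current = nxt
    return "".join(result)


print(dinucleotide_shuffle("ACGTTGCAACGGATCCATGA", random.Random(1)))
```

Placing each vertex's last-exit edge at the end of its list is what guarantees that the walk never strands unused edges, so every output is a valid rearrangement with identical dinucleotide counts.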
The impact of stack geometry and mean pressure on cold end temperature of stack in thermoacoustic refrigeration systems

Science.gov (United States)

Wantha, Channarong

2018-02-01

This paper reports experimental and simulation studies of the influence of stack geometry and mean pressure on the cold end temperature of the stack in a thermoacoustic refrigeration system. Three stack geometries were tested: a spiral stack, a circular pore stack and a pin array stack. The results show that the mean pressure of the gas in the system has a significant impact on the cold end temperature of the stack. The mean gas pressure is related to the thermal penetration depth, which in turn results in a better cold end temperature of the stack. The results also show that the cold end temperature of the pin array stack decreases more than those of the spiral stack and circular pore stack, by approximately 63% and 70%, respectively. In addition, the thermal and viscous areas of the stack are analyzed to explain these cold end temperatures.
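The link between mean pressure and thermal penetration depth follows from the standard expression δκ = sqrt(2k / (ρ c_p ω)): at fixed temperature the ideal-gas density scales with pressure, so δκ scales as 1/sqrt(p). A minimal sketch follows, assuming ideal-gas behaviour and pressure-independent conductivity and heat capacity; the air-like property values, temperature and 300 Hz drive frequency are illustrative assumptions, not values from the paper.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol K)


def thermal_penetration_depth(pressure_pa: float, temperature_k: float, frequency_hz: float,
                              conductivity: float, cp_mass: float, molar_mass: float) -> float:
    """delta_kappa = sqrt(2 k / (rho * c_p * omega)), with rho from the ideal gas law.
    k in W/(m K), c_p in J/(kg K), molar mass in kg/mol; result in metres."""
    rho = pressure_pa * molar_mass / (R_GAS * temperature_k)
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 * conductivity / (rho * cp_mass * omega))


# Illustrative air-like values at 300 K and 300 Hz (assumed, not from the paper).
d_1bar = thermal_penetration_depth(1e5, 300.0, 300.0, 0.026, 1005.0, 0.029)
d_10bar = thermal_penetration_depth(1e6, 300.0, 300.0, 0.026, 1005.0, 0.029)
print(f"{d_1bar * 1e3:.3f} mm at 1 bar, {d_10bar * 1e3:.3f} mm at 10 bar")
```

Since pore sizes and plate spacings are usually chosen relative to δκ, changing the mean pressure shifts how well a fixed stack geometry matches the gas, which is one way to read the pressure dependence reported above.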
OpenStack essentials

CERN Document Server

Radez, Dan

2015-01-01

If you need to get started with OpenStack or want to learn more, then this book is your perfect companion. If you're comfortable with the Linux command line, you'll gain confidence in using OpenStack.
Centroid based clustering of high throughput sequencing reads based on n-mer counts.

Science.gov (United States)

Solovyov, Alexander; Lipkin, W Ian

2013-09-08

Many problems in computational biology require alignment-free sequence comparisons. One of the common tasks involving sequence comparison is sequence clustering. Here we apply methods of alignment-free comparison (in particular, comparison using sequence composition) to the challenge of sequence clustering. We study several centroid-based algorithms for clustering sequences based on word counts. Study of their performance shows that the k-means algorithm, with or without data whitening, is efficient from the computational point of view. A higher clustering accuracy can be achieved using the soft expectation maximization method, whereby each sequence is attributed to each cluster with a specific probability. We implement an open source tool for alignment-free clustering. It is publicly available from GitHub: https://github.com/luscinius/afcluster. We show the utility of alignment-free sequence clustering for high throughput sequencing analysis despite its limitations. In particular, it allows one to perform assembly with reduced resources and a minimal loss of quality. The major factor affecting performance of alignment-free read clustering is the length of the read.
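A minimal sketch of centroid-based clustering on word counts, assuming normalized k-mer frequency vectors, plain (hard) k-means without whitening, and a deterministic farthest-first initialization; it is not the afcluster code and omits the soft EM variant described above.

```python
from itertools import product
from typing import List, Sequence


def kmer_profile(seq: str, k: int) -> List[float]:
    """Normalized counts of all 4**k DNA words; k-mers with other letters are skipped."""
    index = {"".join(w): i for i, w in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def _sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points: List[List[float]], n_clusters: int, iterations: int = 50) -> List[int]:
    # Farthest-first initialisation keeps the sketch deterministic.
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(n_clusters), key=lambda c: _sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each non-empty cluster's centroid moves to its members' mean.
        for c in range(n_clusters):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


reads = ["ACACACACACAC", "CACACACACACA", "GTGTGTGTGTGT", "TGTGTGTGTGTG"]
print(kmeans([kmer_profile(r, 2) for r in reads], 2))
```

Each read is reduced to a fixed-length vector before clustering, so no pairwise alignment is computed at any point; the catch, as the abstract notes, is that short reads give noisy word counts.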
Understanding Interfacial Alignment in Solution Coated Conjugated Polymer Thin Films

International Nuclear Information System (INIS)

Qu, Ge; Zhao, Xikang; Newbloom, Gregory M.; Zhang, Fengjiao; Mohammadi, Erfan

2017-01-01

Domain alignment in conjugated polymer thin films can significantly enhance charge carrier mobility. However, the alignment mechanism during meniscus-guided solution coating remains unclear. Furthermore, interfacial alignment has rarely been studied despite its direct relevance and critical importance to charge transport. In this study, we uncover a significantly higher degree of alignment at the top interface of solution-coated thin films, using a donor–acceptor conjugated polymer, poly(diketopyrrolopyrrole-co-thiophene-co-thieno[3,2-b]thiophene-co-thiophene) (DPP2T-TT), as the model system. At the molecular level, we observe in-plane π–π stacking anisotropy of up to 4.8 near the top interface, with the polymer backbone aligned parallel to the coating direction. The bulk of the film is only weakly aligned, with the backbone oriented transverse to the coating direction. At the mesoscale, we observe a well-defined fibril-like morphology at the top interface, with the fibril long axis pointing along the coating direction. Significantly smaller fibrils with poor orientational order are found at the bottom interface, weakly aligned orthogonal to the fibrils at the top interface. The high degree of alignment at the top interface leads to a charge transport anisotropy of up to 5.4, compared to an anisotropy close to 1 at the bottom interface. We attribute the formation of distinct interfacial morphology to skin-layer formation associated with a high Péclet number, which promotes crystallization at the top interface while suppressing it in the bulk. As a result, we further infer that the interfacial fibril alignment is driven by the extensional flow at the top interface, arising from the solvent evaporation rate increasing closer to the meniscus front.
eShadow: A tool for comparing closely related sequences

Energy Technology Data Exchange (ETDEWEB)

Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.

2004-01-15

Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualization of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/
Solid-state dewetting of Au-Ni bi-layer films mediated through individual layer thickness and stacking sequence

Science.gov (United States)

Herz, Andreas; Theska, Felix; Rossberg, Diana; Kups, Thomas; Wang, Dong; Schaaf, Peter

2018-06-01

In the present work, the solid-state dewetting of Au-Ni bi-layer thin films deposited on SiO2/Si is systematically studied with respect to individual layer thickness and stacking sequence. For this purpose, a rapid heat treatment at medium temperatures is applied in order to examine void formation at the early stages of dewetting. Compositional variations are realized by changing the thickness ratio of the bi-layer films, while the total thickness is kept at 20 nm throughout the study. For Au/Ni films annealed at 500 °C, no voids exposing the substrate are observed, regardless of chemical composition. For the reverse stacking order, the number of voids per unit area in two-phase Au-Ni thin films is found to be governed by the amount of Au-rich material. At higher temperatures, up to 650 °C, a decreased probability of nucleation eliminates a major portion of the cavities, resulting in the formation of bubbles in 15 nm Ni/5 nm Au bi-layers. Film buckling occurred predominantly at phase boundaries crossing the bubbles.
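Composition is set here through the thickness ratio at a fixed 20 nm total. The average atomic composition of each bi-layer therefore follows from the layer thicknesses. Below is a minimal sketch that assumes bulk densities and molar masses for Au and Ni (thin-film densities may differ). The function name `atomic_fraction` is hypothetical.

```python
# Bulk room-temperature densities (g/cm^3) and molar masses (g/mol).
# Assumption: thin-film densities equal the bulk values.
DENSITY = {"Au": 19.32, "Ni": 8.908}
MOLAR_MASS = {"Au": 196.97, "Ni": 58.693}


def atomic_fraction(thickness_nm: dict) -> dict:
    """Average atomic fraction of each element in a stacked film.

    Moles per unit area of a layer scale with thickness * density / molar mass;
    the common area factor cancels when the values are normalized.
    """
    moles = {el: t * DENSITY[el] / MOLAR_MASS[el] for el, t in thickness_nm.items()}
    total = sum(moles.values())
    return {el: n / total for el, n in moles.items()}


# Bi-layers with a fixed 20 nm total thickness, as in the study.
for au_nm in (5, 10, 15):
    frac = atomic_fraction({"Au": au_nm, "Ni": 20 - au_nm})
    print(f"{20 - au_nm} nm Ni / {au_nm} nm Au: {100 * frac['Au']:.1f} at.% Au")
```

For the 15 nm Ni/5 nm Au bi-layer this gives roughly 18 at.% Au, not 25%. Ni has about 1.5 times as many atoms per unit volume as Au.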
Size-tunable band alignment and optoelectronic properties of transition metal dichalcogenide van der Waals heterostructures

Science.gov (United States)

Zhao, Yipeng; Yu, Wangbing; Ouyang, Gang

2018-01-01

2D transition metal dichalcogenide (TMDC)-based heterostructures exhibit several fascinating properties that can address the emerging market of energy conversion and storage devices. Current results show that vertically stacked TMDC heterostructures can form a type II band alignment and possess promising optoelectronic properties. However, a detailed analytical understanding of how to quantify the band alignment and band offset, as well as the optimized power conversion efficiency (PCE), is still lacking. Herein, we propose an analytical model to estimate the PCEs of TMDC van der Waals (vdW) heterostructures and explore the intrinsic mechanism of photovoltaic conversion based on the detailed balance principle and the atomic-bond-relaxation correlation mechanism. We find that the PCE of monolayer MoS2/WSe2 can reach 1.70%, and that the PCE of MoS2/WSe2 vdW heterostructures increases with thickness, owing to increased optical absorption. Moreover, the results are validated by comparison with the available evidence, providing realistic efficiency targets and design principles.
Highlights:
• Both electronic and optoelectronic models are developed for vertically stacked MoS2/WSe2 heterostructures.
• The underlying mechanism of the size effect on the electronic and optoelectronic properties of vertically stacked MoS2/WSe2 heterostructures is clarified.
• The macroscopically measurable quantities and the microscopic bond identities are connected.
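The type I, II and III band alignments can be told apart from the absolute band edges of the two layers. Below is a minimal sketch. It assumes each layer's valence-band maximum (VBM) and conduction-band minimum (CBM) are referenced to a common vacuum level. The class and function names, and the band edges in the usage example, are hypothetical; they are not values from this work.

```python
from dataclasses import dataclass


@dataclass
class BandEdges:
    vbm: float  # valence-band maximum in eV, relative to a common vacuum level
    cbm: float  # conduction-band minimum in eV, same reference


def alignment_type(a: BandEdges, b: BandEdges) -> str:
    """Classify a two-layer heterojunction as type I, II or III."""
    # Type III (broken gap): one layer's VBM lies at or above the other's CBM.
    if a.vbm >= b.cbm or b.vbm >= a.cbm:
        return "type III (broken gap)"
    # Type I (straddling): one band gap lies entirely inside the other.
    if (a.vbm <= b.vbm and a.cbm >= b.cbm) or (b.vbm <= a.vbm and b.cbm >= a.cbm):
        return "type I (straddling)"
    # Otherwise the gaps overlap only partially: type II (staggered).
    return "type II (staggered)"


def band_offsets(a: BandEdges, b: BandEdges) -> tuple:
    """Conduction- and valence-band offsets of layer b relative to layer a (eV)."""
    return b.cbm - a.cbm, b.vbm - a.vbm


# Hypothetical band edges, chosen only to illustrate a staggered junction.
layer_a = BandEdges(vbm=-5.9, cbm=-4.3)
layer_b = BandEdges(vbm=-5.1, cbm=-3.9)
print(alignment_type(layer_a, layer_b), band_offsets(layer_a, layer_b))
```

In a type II junction, electrons collect in the layer with the lower CBM and holes in the layer with the higher VBM. This spatial separation of charges is why the alignment matters for photovoltaic conversion.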
Role of stacking disorder in ice nucleation.

Science.gov (United States)

Lupi, Laura; Hudait, Arpa; Peters, Baron; Grünwald, Michael; Gotchy Mullen, Ryan; Nguyen, Andrew H; Molinero, Valeria

2017-11-08

The freezing of water affects the processes that determine Earth's climate. Therefore, accurate weather and climate forecasts hinge on good predictions of ice nucleation rates. Such rate predictions are based on extrapolations using classical nucleation theory, which assumes that the structure of nanometre-sized ice crystallites corresponds to that of hexagonal ice, the thermodynamically stable form of bulk ice. However, simulations with various water models find that ice nucleated and grown at atmospheric temperatures is stacking-disordered at all sizes, consisting of random sequences of cubic and hexagonal ice layers. This implies that stacking-disordered ice crystallites either are more stable than hexagonal ice crystallites or form because of non-equilibrium dynamical effects. Both scenarios challenge central tenets of classical nucleation theory. Here we use rare-event sampling and free energy calculations with the mW water model to show that the entropy of mixing cubic and hexagonal layers makes stacking-disordered ice the stable phase for crystallites up to a size of at least 100,000 molecules. We find that stacking-disordered critical crystallites at 230 kelvin are about 14 kilojoules per mole of crystallite more stable than hexagonal crystallites, making their ice nucleation rates more than three orders of magnitude higher than predicted by classical nucleation theory. This effect on nucleation rates is temperature-dependent, being most pronounced under the warmest conditions, and should affect the modelling of cloud formation and ice particle numbers, which are very sensitive to the temperature dependence of ice nucleation rates. We conclude that classical nucleation theory needs to be corrected to include the dependence of the crystallization driving force on the size of the ice crystallite when interpreting and extrapolating ice nucleation rates from experimental laboratory conditions to the temperatures that occur in clouds.
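The reported stabilization can be checked against the rate increase using the classical nucleation theory form J = A·exp(−ΔG*/RT). This is a back-of-the-envelope sketch with two assumptions: the 14 kJ per mole of crystallite lowers the nucleation barrier ΔG* by the same amount, and the kinetic prefactor A stays unchanged. The function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def rate_enhancement(delta_g_kj_per_mol: float, temperature_k: float) -> float:
    """Ratio J_new / J_old when the barrier in J = A * exp(-dG*/(R*T)) drops
    by delta_g (per mole of critical crystallites) and A is unchanged."""
    return math.exp(delta_g_kj_per_mol * 1e3 / (R * temperature_k))


factor = rate_enhancement(14.0, 230.0)
print(f"rate enhancement ~ {factor:.3g} (log10 = {math.log10(factor):.2f})")
```

For 14 kJ/mol at 230 K the exponent is about 7.3. That gives a factor of roughly 1.5 × 10³, in line with the "more than three orders of magnitude" stated in the abstract.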
Improving model construction of profile HMMs for remote homology detection through structural alignment

Directory of Open Access Journals (Sweden)

Zaverucha Gerson

2007-11-01

Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two-tailed t-tests. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.
Improving model construction of profile HMMs for remote homology detection through structural alignment.

Science.gov (United States)

Bernardes, Juliana S; Dávila, Alberto M R; Costa, Vítor S; Zaverucha, Gerson

2007-11-09

Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two-tailed t-tests. We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.
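A key step in building a pHMM from an alignment, structural or sequence-based, is choosing which columns become match states and estimating their emission probabilities. Below is a minimal sketch with two assumptions: a simple gap-fraction rule decides the match columns, and add-one (Laplace) pseudocounts smooth the emissions. Production tools such as HMMER also use sequence weighting and Dirichlet mixture priors, and transition probabilities are omitted here. The function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_columns(msa: list, max_gap_fraction: float = 0.5) -> list:
    """Indices of alignment columns treated as match states.

    A column becomes a match state when its gap fraction is below the threshold;
    the remaining columns would be modelled as insertions.
    """
    n_seqs = len(msa)
    return [j for j in range(len(msa[0]))
            if sum(seq[j] == "-" for seq in msa) / n_seqs < max_gap_fraction]


def match_emissions(msa: list, pseudocount: float = 1.0) -> list:
    """Per-match-state emission probabilities with additive pseudocounts."""
    profile = []
    for j in match_columns(msa):
        # Count only standard residues; gaps and ambiguity codes are ignored.
        counts = Counter(seq[j] for seq in msa if seq[j] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


toy_msa = ["ACD-", "ACE-", "A-DW"]  # hypothetical aligned sequences
print(match_columns(toy_msa))
print(round(match_emissions(toy_msa)[0]["A"], 3))
```

A structural aligner (3DCOFFEE, MAMMOTH-mult) and a sequence aligner differ only in which residues share a column. Those column assignments feed directly into the counts above, which is where the abstract locates the performance difference.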
Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

DEFF Research Database (Denmark)

Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan

2009-01-01

MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than...
Aligners: the Invisible Corrector-A Boon or Bane.

Science.gov (United States)

Mahendra, Lodd

2018-03-01

Clinical orthodontics has shown a palpable shift from conventional braces to innovative technologies such as invisible aligners. Aligners are sequences of clear trays worn by patients to straighten their teeth. They were envisaged primarily for esthetic purposes and were aimed mainly at self-conscious teenagers who would otherwise shy away from essential correction of malocclusion.
Sequence periodicity in nucleosomal DNA and intrinsic curvature.

Science.gov (United States)

Nair, T Murlidharan

2010-05-17

Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy, the more rigid the DNA helix; thus, it is natural to expect that sequences involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understanding their sequence-dependent structures. Higher-order DNA structures and the distribution of molecular bend loci associated with 146-base nucleosome core DNA sequences from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher-order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility that context-dependent curvature of varying degrees is associated with nucleosomal DNA.
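One way to look for the periodic recurrence of dinucleotides described above is to histogram the distances between occurrences of a chosen dinucleotide class. In nucleosomal DNA, peaks near multiples of roughly 10 bp would reflect the helical repeat. Below is a minimal sketch that assumes AA/TT as the dinucleotides of interest, a common but not unique choice. It is not the theoretical curvature model used in this study, and the function names are hypothetical.

```python
from collections import Counter


def dinucleotide_positions(seq: str, dinucs=("AA", "TT")) -> list:
    """Start positions of the chosen dinucleotides, in increasing order."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucs]


def spacing_histogram(seqs, dinucs=("AA", "TT"), max_distance=50) -> Counter:
    """Count pairwise distances between dinucleotide occurrences within each sequence."""
    hist = Counter()
    for seq in seqs:
        pos = dinucleotide_positions(seq, dinucs)
        for idx, p in enumerate(pos):
            for q in pos[idx + 1:]:
                distance = q - p
                if distance > max_distance:
                    break  # positions are sorted, so later ones are even farther
                hist[distance] += 1
    return hist


# Synthetic 50-bp sequence with an AA step every 10 bp.
hist = spacing_histogram(["AAGCTCGCGC" * 5])
print(sorted(hist.items()))
```

For real data, the input would be a set of core sequences such as the 146-base fragments analyzed here. Pairwise counts fall off with distance simply because fewer pairs fit, so peaks should be judged relative to neighbouring distances.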
MatrixPlot: visualizing sequence constraints

DEFF Research Database (Denmark)

Gorodkin, Jan; Stærfeldt, Hans Henrik; Lund, Ole

1999-01-01

Summary: MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information...
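Mutual information plots of the kind MatrixPlot draws rest on a score for each pair of alignment columns: M_ij = Σ f(x,y) log2[f(x,y) / (f(x) f(y))]. Below is a minimal sketch of computing that matrix from an alignment. It assumes rows with a gap in either column are simply skipped; other gap treatments exist. This is not MatrixPlot's own code, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(msa: list, i: int, j: int, gap: str = "-") -> float:
    """Mutual information (bits) between alignment columns i and j."""
    pairs = [(s[i], s[j]) for s in msa if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    f_xy = Counter(pairs)
    f_x = Counter(x for x, _ in pairs)
    f_y = Counter(y for _, y in pairs)
    # f(x,y) / (f(x) f(y)) written with counts instead of frequencies: c * n / (c_x * c_y).
    return sum((c / n) * math.log2(c * n / (f_x[x] * f_y[y]))
               for (x, y), c in f_xy.items())


def mi_matrix(msa: list) -> list:
    """Symmetric matrix of column-pair mutual information, ready for plotting."""
    length = len(msa[0])
    return [[mutual_information(msa, i, j) for j in range(length)] for i in range(length)]


# Hypothetical RNA alignment: columns 0 and 2 covary as base pairs, column 1 is constant.
rna = ["GAC", "CAG", "AAU", "UAA"]
print(mutual_information(rna, 0, 2), mutual_information(rna, 0, 1))
```

Covarying columns, such as potential base-pair partners in an RNA, show up as high-scoring cells. These are the sequence constraints the plots are meant to reveal.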

Evolution of primary and secondary structures in 5S and 5.8S rRNA

International Nuclear Information System (INIS)

Curtiss, W.C.

1986-01-01

The secondary structure of Bombyx mori 5S rRNA was studied using the single-strand-specific S1 nuclease and the base-pair-specific cobra venom ribonuclease. The RNA was end-labeled with [32P] at either the 5' or 3' end and sequenced using enzymatic digestion techniques. These enzymatic data, coupled with thermodynamic structure prediction, were used to generate a secondary structure for 5S rRNA. A computer algorithm was implemented to aid in the comparison of a large set of homologous RNAs. Eukaryotic 5S rRNA sequences from thirty-four diverse species were compared by (1) aligning the sequences, (2) locating the positions of substitutions with respect to the aligned sequence and secondary structure, and (3) using the R-Y model of base stacking to study stacking pattern relationships in the structure. Eukaryotic 5S rRNA was found to have significant sequence variation throughout much of the molecule while maintaining a relatively constant secondary structure. A detailed analysis of the sequence and structure variability in each region of the molecule is presented.
Does interchain stacking morphology contribute to the singlet-triplet interconversion dynamics in polymer heterojunctions?

Energy Technology Data Exchange (ETDEWEB)

Bittner, Eric R. [Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, TX 77204 (United States)], E-mail: bittner@uh.edu; Burghardt, Irene [Departement de Chimie, Ecole Normale Superieure, 24 rue Lhomond, F-75231 Paris cedex 05 (France); Friend, Richard H. [Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE (United Kingdom)

2009-02-23

Time-dependent density functional theory (TD-DFT) is used to examine the effect of stacking in a model semiconducting polymer heterojunction system consisting of two co-facially stacked oligomers. We find that the excited electronic states are highly sensitive to the alignment of the monomer units of the two chains. In the system we examined, the exchange energy is nearly identical to both the band offset at the heterojunction and the exciton binding energy. Our results indicate that the triplet excitonic states are nearly degenerate with the singlet exciplex states, opening the possibility for the interconversion of singlet and triplet electronic states at the heterojunction interface via spin-orbit coupling localized on the heteroatoms. Using Russell-Saunders theory, we estimate the interconversion time to be approximately 700-800 ps, roughly a 5-10-fold increase in rate compared to isolated organic polymer chains.
Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin.

Science.gov (United States)

Guzzi, Pietro Hiram; Milenković, Tijana

2017-01-05

Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology. © The Author 2017. Published by Oxford University Press.
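For global network alignment, a widely used quality measure is edge correctness (EC): the fraction of edges in the first network that the node mapping sends onto edges of the second. Below is a minimal sketch. It assumes undirected, unweighted networks given as edge lists and a one-to-one node mapping. It illustrates the general notion of conserved edges and is not a method taken from this survey. The function name is hypothetical.

```python
def edge_correctness(edges1, edges2, mapping) -> float:
    """Fraction of edges of network 1 mapped onto edges of network 2.

    edges1, edges2: iterables of (u, v) pairs of an undirected network.
    mapping: dict from nodes of network 1 to nodes of network 2.
    """
    edges1 = list(edges1)
    if not edges1:
        return 0.0
    target = {frozenset(e) for e in edges2}  # frozenset ignores edge direction
    conserved = sum(
        1 for u, v in edges1
        if u in mapping and v in mapping
        and frozenset((mapping[u], mapping[v])) in target
    )
    return conserved / len(edges1)


# Hypothetical toy networks: a triangle aligned to a path.
g1 = [("a", "b"), ("b", "c"), ("c", "a")]
g2 = [("x", "y"), ("y", "z")]
print(edge_correctness(g1, g2, {"a": "x", "b": "y", "c": "z"}))
```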
Porous Structures in Stacked, Crumpled and Pillared Graphene-Based 3D Materials.

Science.gov (United States)

Guo, Fei; Creighton, Megan; Chen, Yantao; Hurt, Robert; Külaots, Indrek

2014-01-01

Graphene, an atomically thin material with a theoretical surface area of 2600 m² g⁻¹, has great potential in the fields of catalysis, separation, and gas storage if properly assembled into functional 3D materials at large scale. In ideal non-interacting ensembles of non-porous multilayer graphene plates, the surface area can be adequately estimated using the simple geometric law ~2600/N m² g⁻¹, where N is the number of graphene sheets per plate. Some processing operations, however, lead to secondary plate-plate stacking, folding, crumpling or pillaring, which give rise to more complex structures. Here we show that bulk samples of multilayer graphene plates stack in an irregular fashion that preserves the 2600/N surface area and creates regular slot-like pores with sizes that are multiples of the unit plate thickness. In contrast, graphene oxide deposits into films with massive area loss (from 2600 to 40 m² g⁻¹) due to nearly perfect alignment and stacking during the drying process. Pillaring graphene oxide sheets by co-deposition of colloidal-phase particle-based spacers has the potential to partially restore the large monolayer surface. Surface areas as high as 1000 m² g⁻¹ are demonstrated here through colloidal-phase deposition of graphene oxide with water-dispersible aryl-sulfonated ultrafine carbon black as a pillaring agent.
Radio-frequency properties of stacked long Josephson junctions with nonuniform bias current distribution

DEFF Research Database (Denmark)

Filatrella, G; Pedersen, Niels Falsig

1999-01-01

We have numerically investigated the behavior of stacks of long Josephson junctions considering a nonuniform bias profile. In the presence of a microwave field the nonuniform bias, which favors the formation of fluxons, can give rise to a change of the sequence of radio-frequency induced steps...
Modeling fuel cell stack systems

Energy Technology Data Exchange (ETDEWEB)

Lee, J H [Los Alamos National Lab., Los Alamos, NM (United States); Lalk, T R [Dept. of Mech. Eng., Texas A and M Univ., College Station, TX (United States)

1998-06-15

A technique for modeling fuel cell stacks is presented along with the results from an investigation designed to test the validity of the technique. The technique was specifically designed so that models developed using it can be used to determine the fundamental thermal-physical behavior of a fuel cell stack for any operating and design configuration. Such models would be useful tools for investigating fuel cell power system parameters. The modeling technique can be applied to any type of fuel cell stack for which performance data are available for a laboratory-scale single cell. Use of the technique is demonstrated by generating sample results for a model of a Proton Exchange Membrane Fuel Cell (PEMFC) stack consisting of 125 cells, each with an active area of 150 cm². A PEMFC stack was also used in the verification investigation. This stack consisted of four cells, each with an active area of 50 cm². Results from the verification investigation indicate that models developed using the technique are capable of accurately predicting fuel cell stack performance. (orig.)
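The idea of scaling laboratory single-cell data up to a stack can be shown in its simplest, purely electrical form. At a given current density every cell is assumed to behave like the measured single cell. Stack voltage is then N times the cell voltage, and stack current is the current density times the active area. This sketch is much simplified: it ignores the thermal-physical effects and cell-to-cell variation that the authors' technique is designed to capture. The polarization data and function names are hypothetical; the 125-cell, 150 cm² configuration comes from the abstract.

```python
from bisect import bisect_left

# Hypothetical single-cell polarization data: current density (A/cm^2) vs. cell voltage (V).
J_DATA = [0.0, 0.2, 0.4, 0.6, 0.8]
V_DATA = [0.95, 0.78, 0.72, 0.67, 0.61]


def interpolate(xs, ys, x):
    """Linear interpolation in a table whose xs increase monotonically."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("current density outside the measured range")
    k = bisect_left(xs, x)
    if xs[k] == x:
        return ys[k]
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def stack_operating_point(j, n_cells, area_cm2, j_data=J_DATA, v_data=V_DATA):
    """Stack voltage (V), current (A) and power (W), assuming identical cells in series."""
    voltage = n_cells * interpolate(j_data, v_data, j)
    current = j * area_cm2  # the same current flows through every cell in series
    return voltage, current, voltage * current


# 125 cells with 150 cm^2 active area each, the configuration described above.
print(stack_operating_point(0.4, n_cells=125, area_cm2=150.0))
```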
Development and characterization of a three-dimensional radiochromic film stack dosimeter for megavoltage photon beam dosimetry.

Science.gov (United States)

McCaw, Travis J; Micka, John A; DeWerd, Larry A

2014-05-01

Three-dimensional (3D) dosimeters are particularly useful for verifying the commissioning of treatment planning and delivery systems, especially with the ever-increasing implementation of complex and conformal radiotherapy techniques such as volumetric modulated arc therapy. However, currently available 3D dosimeters require extensive experience to prepare and analyze, and are subject to large measurement uncertainties. This work aims to provide a more readily implementable 3D dosimeter with the development and characterization of a radiochromic film stack dosimeter for megavoltage photon beam dosimetry. A film stack dosimeter was developed using Gafchromic® EBT2 films. The dosimeter consists of 22 films separated by 1-mm-thick spacers. A Virtual Water™ phantom was created that maintains the radial film alignment within a maximum uncertainty of 0.3 mm. The film stack dosimeter was characterized using simulations and measurements of 6 MV fields. The absorbed-dose energy dependence and orientation dependence of the film stack dosimeter were investigated using Monte Carlo simulations. The water equivalence of the dosimeter was determined by comparing percentage-depth-dose (PDD) profiles measured with the film stack dosimeter and simulated using Monte Carlo methods. Film stack dosimeter measurements were verified with thermoluminescent dosimeter (TLD) microcube measurements. The film stack dosimeter was also used to verify the delivery of an intensity-modulated radiation therapy (IMRT) procedure. The absorbed-dose energy response of EBT2 film differs by less than 1.5% between the calibration and film stack dosimeter geometries for a 6 MV spectrum. Over a series of beam angles ranging from normal incidence to parallel incidence, the overall variation in the response of the film stack dosimeter is within a range of 2.5%. Relative to the response to a normally incident beam, the film stack dosimeter exhibits a 1% under-response when the beam axis is parallel to the film
Adaptive GDDA-BLAST: fast and efficient algorithm for protein sequence embedding.

Directory of Open Access Journals (Sweden)

Yoojin Hong

2010-10-01

A major computational challenge in the genomic era is annotating structure/function to the vast quantities of sequence information that are now available. This problem is illustrated by the fact that most proteins lack comprehensive annotations, even when experimental evidence exists. We previously theorized that embedded-alignment profiles (simply "alignment profiles" hereafter) provide a quantitative method that is capable of relating the structural and functional properties of proteins, as well as their evolutionary relationships. A key feature of alignment profiles lies in the interoperability of data format (e.g., alignment information, physicochemical information, genomic information, etc.). Indeed, we have demonstrated that the Position Specific Scoring Matrices (PSSMs) are an informative M-dimension that is scored by quantitatively measuring the embedded or unmodified sequence alignments. Moreover, the information obtained from these alignments is informative, and remains so even in the "twilight zone" of sequence similarity (<25% identity). Although our previous embedding strategy was powerful, it suffered from contaminating alignments (embedded AND unmodified) and high computational costs. Herein, we describe the logic and algorithmic process for a heuristic embedding strategy named "Adaptive GDDA-BLAST." Adaptive GDDA-BLAST is, on average, up to 19 times faster than our previous method, with similar sensitivity. Further, data are provided to demonstrate the benefits of embedded-alignment measurements in terms of detecting structural homology in highly divergent protein sequences and isolating secondary structural elements of transmembrane and ankyrin-repeat domains. Together, these advances allow further exploration of the embedded alignment data space within sufficiently large data sets to eventually induce relevant statistical inferences. We show that sequence embedding could serve as one of the vehicles for measurement of low
JavaScript DNA translator: DNA-aligned protein translations.

Science.gov (United States)

Perry, William L

2002-12-01

There are many instances in molecular biology when it is necessary to identify ORFs in a DNA sequence. While programs exist for displaying protein translations in multiple ORFs in alignment with a DNA sequence, they are often expensive, exist as add-ons to software that must be purchased, or are only compatible with a particular operating system. JavaScript DNA Translator is a shareware application written in JavaScript, a scripting language interpreted by the Netscape Communicator and Internet Explorer Web browsers, which makes it compatible with several different operating systems. While the program uses a familiar Web page interface, it requires no connection to the Internet since calculations are performed on the user's own computer. The program analyzes one or multiple DNA sequences and generates translations in up to six reading frames aligned to a DNA sequence, in addition to displaying translations as separate sequences in FASTA format. ORFs within a reading frame can also be displayed as separate sequences. Flexible formatting options are provided, including the ability to hide ORFs below a minimum size specified by the user. The program is available free of charge at the BioTechniques Software Library (www.Biotechniques.com).
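The six-frame translation that the program displays can be sketched with the standard genetic code. Three frames are read from the given strand and three from its reverse complement. This minimal sketch assumes the standard code (NCBI translation table 1), renders ambiguous codons as X, and does no ORF filtering. It is not the program's JavaScript source, and the function names are hypothetical.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Translations in frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    dna = dna.upper().replace("U", "T")
    rev_comp = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev_comp[offset:])
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

To hide ORFs below a minimum size, as the program allows, split each frame's translation at '*' and discard the shorter pieces.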
PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

Science.gov (United States)

Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

2016-07-08

The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

Directory of Open Access Journals (Sweden)

Li Wei

2005-05-01

Background: Comparative whole-genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results: We have generated ~3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP" and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment, and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig than in human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to that of human. Conclusion: The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse, with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shotgun sequences to the human-mouse alignment offers the investigator a rapid way of defining specific regions for analysis and resequencing.
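The Lander-Waterman model estimates what 0.66X coverage means for the fraction of the genome actually sampled. Read depth at a base is Poisson-distributed with mean equal to the coverage c, so the expected covered fraction is 1 − e^(−c). This minimal sketch assumes uniformly random read placement and ignores repeats. The function names are hypothetical.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Lander-Waterman/Poisson)."""
    return 1.0 - math.exp(-coverage)


def fraction_at_depth(coverage: float, depth: int) -> float:
    """Poisson probability that a given base is covered by exactly `depth` reads."""
    return math.exp(-coverage) * coverage ** depth / math.factorial(depth)


c = 0.66
print(f"covered at least once: {expected_fraction_covered(c):.1%}")
print(f"covered exactly once:  {fraction_at_depth(c, 1):.1%}")
```

Under these assumptions, slightly less than half of the pig genome (about 48%) is expected to be represented by at least one read.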
Comparison of three methods reducing the beam parameter product of a laser diode stack for long range laser illumination applications

Science.gov (United States)

Lutz, Yves; Poyet, Jean-Michel; Metzger, Nicolas

2013-10-01

Laser diode stacks are interesting laser sources for active imaging illuminators. They allow the accumulation of large amounts of energy in multi-pulse mode, which is well suited for long-range image recording. Even when laser diode stacks are equipped with fast-axis collimation (FAC) and slow-axis collimation (SAC) microlenses, their beam parameter product (BPP) is not compatible with direct use in highly efficient and compact illuminators. This is particularly true when narrow divergences are required, as in long-range applications. To overcome these difficulties, we investigated three different approaches. A first near-infrared illuminator, based on conductively cooled mini-bars, was designed, built and successfully tested during outdoor experiments. In a second step, this custom-specified stack was replaced by an off-the-shelf FAC + SAC microlensed stack whose brightness was increased by polarization overlapping. The third method, also based on a commercial laser diode stack, uses a non-imaging optical shaping principle that produces a virtually restacked laser source with enhanced beam parameters. This low-cost, efficient beam-shaping method, which has low alignment sensitivity, yields a compact, high-performance laser diode illuminator for long-range active imaging applications. The three methods are presented and compared in this paper.
Alignment-free genome tree inference by learning group-specific distance metrics.

Science.gov (United States)

Patil, Kaustubh R; McHardy, Alice C

2013-01-01

Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods, which are often used to infer such relationships, are not without drawbacks. Because of the ever-increasing number of available genomes, one can now attempt to use genome-scale information. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient approach that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal, as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes, using a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that better distance metrics could indeed be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large-scale comparison between 10 methods: 9 alignment-free and 1 alignment-based.
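This method combines two ingredients: a genome signature and a Mahalanobis metric over it. Below is a minimal sketch with two assumptions. The signature is a normalized k-mer frequency vector (tetranucleotides, k = 4, are a common choice), and a positive semi-definite matrix M has already been learned. The learning step, the authors' exact signature definition and the function names are not taken from this work.

```python
import math
from itertools import product


def kmer_signature(seq: str, k: int = 2) -> list:
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic ACGT order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip k-mers containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def mahalanobis(x: list, y: list, m: list) -> float:
    """sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M given as rows."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * m[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative values from rounding


# With M = identity the metric reduces to the Euclidean distance.
identity = [[1.0 if i == j else 0.0 for j in range(16)] for i in range(16)]
print(mahalanobis(kmer_signature("ACGTACGTAA"), kmer_signature("GGGCCCGGGA"), identity))
```

Learning a group-specific M amounts to reweighting and correlating the signature dimensions. The goal is for the resulting distances to agree better with the reference taxonomy.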
Field emission from vertically aligned few-layer graphene

International Nuclear Information System (INIS)

Malesevic, Alexander; Kemps, Raymond; Vanhulsel, Annick; Chowdhury, Manish Pal; Volodin, Alexander; Van Haesendonck, Chris

2008-01-01

The electric field emission behavior of vertically aligned few-layer graphene was studied in a parallel plate-type setup. Few-layer graphene was synthesized in the absence of any metallic catalyst by microwave plasma enhanced chemical vapor deposition with gas mixtures of methane and hydrogen. The deposit consists of nanostructures that are several micrometers wide, highly crystalline stacks of four to six atomic layers of graphene, aligned vertically to the substrate surface in a high-density network. The few-layer graphene is found to be a good field emitter, characterized by turn-on fields as low as 1 V/μm and field amplification factors of up to several thousand. We observe a clear dependence of the few-layer graphene field emission behavior on the synthesis parameters: hydrogen is identified as an efficient etchant to improve field emission, and samples grown on titanium show lower turn-on fields and higher amplification factors than samples grown on silicon.
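Field amplification factors like those quoted above are commonly extracted from a Fowler-Nordheim plot. In that plot, ln(J/E²) is linear in 1/E with slope −Bφ^(3/2)/β. The sketch below rests on these assumptions:

- the standard Fowler-Nordheim form, with the prefactor set to 1;
- a work function of 5 eV, an assumed value for graphene that is not given in the abstract;
- synthetic data.

It is not the authors' analysis procedure.

```python
import math

# Fowler-Nordheim constant B = 6.83 eV^-3/2 V nm^-1, expressed per micrometre.
B_FN = 6.83e3  # V um^-1 eV^-3/2


def linear_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def field_amplification_factor(fields_v_um, current_densities, work_function_ev):
    """Estimate beta from the slope of a Fowler-Nordheim plot ln(J/E^2) vs 1/E."""
    xs = [1.0 / e for e in fields_v_um]
    ys = [math.log(j / e ** 2) for j, e in zip(current_densities, fields_v_um)]
    slope = linear_slope(xs, ys)  # slope = -B * phi^(3/2) / beta
    return -B_FN * work_function_ev ** 1.5 / slope


# Synthetic check: ideal FN data for beta = 1000 and phi = 5 eV (prefactor A = 1).
phi, beta_true = 5.0, 1000.0
fields = [1.0 + 0.25 * i for i in range(17)]  # applied field, V/um
currents = [(beta_true * e) ** 2 / phi * math.exp(-B_FN * phi ** 1.5 / (beta_true * e))
            for e in fields]
print(field_amplification_factor(fields, currents, phi))
```

With measured data the points scatter around the line, and the fitted β depends directly on the assumed work function.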
JVM: Java Visual Mapping tool for next generation sequencing read.

Science.gov (United States)

Yang, Ye; Liu, Juan

2015-01-01

We developed a program, JVM (Java Visual Mapping), for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to handle millions of short reads generated by the Illumina sequencing technology. It employs a seed-index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq data from single-end resequencing. JVM is a desktop application that supports read files from 1 MB to 10 GB.
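A seed-index strategy of this kind can be sketched as a k-mer hash table over the reference plus a mismatch-counting verification step. This is a hypothetical illustration in Python, not JVM's Java implementation. It omits the octal encoding and handles substitutions only.

With three non-overlapping 4-mer seeds covering a 12-base read, any placement with at most two mismatches must contain at least one exact seed. This is the pigeonhole argument.

```python
from collections import defaultdict


def build_seed_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash every k-mer of the reference to the list of positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Seed with non-overlapping k-mers of the read, then verify each candidate
    start by counting mismatches (substitutions only; no indels)."""
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "GATTACAGATTACCAGGTTTACGGA"
index = build_seed_index(reference, k=4)
print(map_read("GATCACCAGGTT", reference, index, k=4))  # [(7, 1)]
```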
BLAST and FASTA similarity searching for multiple sequence alignment.

Science.gov (United States)

Pearson, William R

2014-01-01

BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5- to 10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or for searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
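Expectation values of the kind recommended above follow the Karlin-Altschul relation E = K·m·n·e^(−λS). Equivalently, E = m·n·2^(−S′), with bit score S′ = (λS − ln K)/ln 2. The λ and K below are the values commonly quoted for gapped BLOSUM62 searches with gap costs 11/1, used here as assumptions. The query and database lengths are hypothetical. Real programs also apply edge-effect (effective-length) corrections that this sketch omits.

```python
import math


def bit_score(raw_score: float, lam: float, K: float) -> float:
    """Normalized (bit) score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def expect_value(raw_score: float, m: int, n: int, lam: float, K: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * raw_score)


# Assumed parameters: lambda and K values often quoted for gapped BLOSUM62 (11/1).
LAM, K = 0.267, 0.041
m, n = 300, 10**8  # hypothetical query length and database size (residues)
for s in (40, 60, 100):
    e = expect_value(s, m, n, LAM, K)
    print(s, round(bit_score(s, LAM, K), 1), f"{e:.2e}")
```

Because E scales with the database size n, the same raw score is less significant in a larger database. This is one reason searching a smaller comprehensive database can improve sensitivity.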
Analysis of stacking overlap in nucleic acid structures: algorithm and application.

Science.gov (United States)

Pingali, Pavan Kumar; Halder, Sukanya; Mukherjee, Debasish; Basu, Sankar; Banerjee, Rahul; Choudhury, Devapriya; Bhattacharyya, Dhananjay

2014-08-01

RNA contains different secondary structural motifs like pseudo-helices, hairpin loops, internal loops, etc. in addition to anti-parallel double helices and random coils. The secondary structures are mainly stabilized by base-pairing and stacking interactions between the planar aromatic bases. The hydrogen bonding strength and geometries of base pairs are characterized by six intra-base pair parameters. Similarly, stacking can be represented by six local doublet parameters. These dinucleotide step parameters can describe the quality of stacking between Watson-Crick base pairs very effectively. However, it is quite difficult to understand the stacking pattern for dinucleotides consisting of non-canonical base pairs from these parameters. Stacking interaction is a manifestation of the interaction between two aromatic bases or base pairs and thus can best be estimated by the overlap area between the planar aromatic moieties. We have calculated the base pair overlap between two consecutive base pairs as the buried van der Waals surface between them. In general, overlap values show a normal distribution for the Watson-Crick base pairs in most double helices, within a range from 45 to 50 Å² irrespective of base sequence. The dinucleotide steps with non-canonical base pairs are also seen to have high overlap values, although their twist and a few other parameters are rather unusual. We have analyzed hairpin loops of different lengths, bulges within double helical structures and pseudo-continuous helices using our algorithm. The overlap area analyses indicate good stacking between a few looped-out bases, especially in the GNRA tetraloop, which was difficult to characterise quantitatively from analysis of the base pair or dinucleotide step parameters. This parameter is also capable of distinguishing pseudo-continuous helices from kinked helix junctions.
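The buried van der Waals surface between two groups of atoms can be estimated by sampling points on each atomic sphere, in the spirit of Shrake-Rupley surface calculations. This is a generic sketch under stated assumptions, not the authors' algorithm. Atoms are (x, y, z, radius) tuples with user-supplied radii. A surface point of group A counts as buried when it is not inside another atom of A but is inside an atom of B, and vice versa. For two unit spheres one unit apart, each cap has area π, so the total is 2π.

```python
import math

Atom = tuple[float, float, float, float]  # (x, y, z, van der Waals radius), Angstrom


def unit_sphere_points(n: int) -> list[tuple[float, float, float]]:
    """Quasi-uniform points on the unit sphere (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def inside_any(p: tuple[float, float, float], atoms: list[Atom]) -> bool:
    """True if point p lies strictly inside any of the given spheres."""
    return any((p[0] - x) ** 2 + (p[1] - y) ** 2 + (p[2] - z) ** 2 < r * r
               for x, y, z, r in atoms)


def buried_vdw_area(group_a: list[Atom], group_b: list[Atom], n_points: int = 2000) -> float:
    """vdW surface of A (not already inside other A atoms) lying inside B, plus the reverse."""
    unit = unit_sphere_points(n_points)

    def one_side(own: list[Atom], other: list[Atom]) -> float:
        area = 0.0
        for i, (x, y, z, r) in enumerate(own):
            rest = own[:i] + own[i + 1:]
            patch = 4.0 * math.pi * r * r / n_points  # area represented by each point
            for ux, uy, uz in unit:
                p = (x + r * ux, y + r * uy, z + r * uz)
                if not inside_any(p, rest) and inside_any(p, other):
                    area += patch
        return area

    return one_side(group_a, group_b) + one_side(group_b, group_a)


print(buried_vdw_area([(0.0, 0.0, 0.0, 1.0)], [(0.0, 0.0, 1.0, 1.0)]))
```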
SANSparallel: interactive homology search against Uniprot.

Science.gov (United States)

Somervuo, Panu; Holm, Liisa

2015-07-01

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships, but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server system, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
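The core idea of a suffix array neighborhood search fits in a few lines:

1. Sort all suffixes of the database.
2. Locate each query suffix by binary search.
3. Let the sequences that own nearby suffixes vote.

This is a simplified, hypothetical sketch of that idea, not the SANSparallel implementation. The window size, the voting rule and the toy database are assumptions.

```python
def build_suffix_array(db: list[str]):
    """Concatenate sequences with '$' separators; return text, suffix array, owner map."""
    text = "".join(seq + "$" for seq in db)
    owner = []
    for seq_id, seq in enumerate(db):
        owner.extend([seq_id] * (len(seq) + 1))
    # Naive construction, fine for a toy database; real tools use faster algorithms.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    return text, suffix_array, owner


def lower_bound(text: str, suffix_array: list[int], pattern: str) -> int:
    """First suffix-array slot whose suffix is >= pattern (binary search)."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo


def neighborhood_search(query: str, text: str, suffix_array: list[int],
                        owner: list[int], window: int = 2) -> list[tuple[int, int]]:
    """Vote for database sequences whose suffixes sort next to the query's suffixes."""
    votes: dict[int, int] = {}
    for q in range(len(query)):
        slot = lower_bound(text, suffix_array, query[q:])
        for j in range(max(0, slot - window), min(len(suffix_array), slot + window)):
            seq_id = owner[suffix_array[j]]
            votes[seq_id] = votes.get(seq_id, 0) + 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


db = ["MKTAYIAKQR", "GGGSSSPPPW", "MKTAYIAKQW"]
text, sa, owner = build_suffix_array(db)
print(neighborhood_search("MKTAYIAKQ", text, sa, owner))
```

Candidates ranked this way would still need a proper alignment step to produce the pairwise or stacked alignments described above.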
Environmental assessment of phosphogypsum stacks

International Nuclear Information System (INIS)

Odat, M.; Al-Attar, L.; Raja, G.; Abdul Ghany, B.

2008-03-01

Phosphogypsum is one of the most important by-products of the phosphate fertilizer industry. It is kept in large stacks to the west of Homs city. Storing phosphogypsum in open stacks exposed to environmental effects such as wind and rain may pollute the surrounding ecosystem (soil, plants, water and air). This study was carried out to assess the environmental impact of phosphogypsum stacks on the surrounding ecosystem. The results show that the phosphogypsum stacks did not increase the concentrations of radionuclides (Radon-222 and Radium-226), the external gamma-ray dose, or the concentrations of heavy metals in the components of the ecosystem (soil, plants, water and air), as these did not exceed the permissible limits. However, the concentration of fluorine in the upper soil layer east of the phosphogypsum stacks increased considerably, especially in the dry period of the year. The concentration of fluoride in plants growing near the phosphogypsum stacks was also too high, exceeding the permissible levels; this was reflected in the poisoning of plants and of the animals feeding on them. Consequently, the increase in fluoride concentration in soil and plants is the main impact of phosphogypsum stacks on the surrounding ecosystem. This effect could be minimised by establishing a 50-meter-wide protection zone around the phosphogypsum stacks, planted with non-palatable trees such as pine and cypress that form wind barriers. The concentrations of heavy metals and fluoride in water infiltrating around the stacks were high; hence precautions must be taken to prevent its use in any application and its disposal into adjacent rivers and lakes. (author)
Environmental assessment of phosphogypsum stacks

International Nuclear Information System (INIS)

Odat, M.; Al-Attar, L.; Raja, G.; Abdul Ghany, B.

2009-01-01

Phosphogypsum is one of the most important by-products of the phosphate fertilizer industry. It is kept in large stacks to the west of Homs city. Storing phosphogypsum in open stacks exposed to environmental effects such as wind and rain may pollute the surrounding ecosystem (soil, plants, water and air). This study was carried out to assess the environmental impact of phosphogypsum stacks on the surrounding ecosystem. The results show that the phosphogypsum stacks did not increase the concentrations of radionuclides (Radon-222 and Radium-226), the external gamma-ray dose, or the concentrations of heavy metals in the components of the ecosystem (soil, plants, water and air), as these did not exceed the permissible limits. However, the concentration of fluorine in the upper soil layer east of the phosphogypsum stacks increased considerably, especially in the dry period of the year. The concentration of fluoride in plants growing near the phosphogypsum stacks was also too high, exceeding the permissible levels; this was reflected in the poisoning of plants and of the animals feeding on them. Consequently, the increase in fluoride concentration in soil and plants is the main impact of phosphogypsum stacks on the surrounding ecosystem. This effect could be minimised by establishing a 50-meter-wide protection zone around the phosphogypsum stacks, planted with non-palatable trees such as pine and cypress that form wind barriers. The concentrations of heavy metals and fluoride in water infiltrating around the stacks were high; hence precautions must be taken to prevent its use in any application and its disposal into adjacent rivers and lakes. (author)

The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

Science.gov (United States)

Treangen, Todd J; Ondov, Brian D; Koren, Sergey; Phillippy, Adam M

2014-01-01

Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to compare multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.
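Once a core-genome multiple alignment exists, a basic form of variant calling is to report the columns where the aligned bases disagree. The sketch below is a generic illustration, not Parsnp's algorithm. It assumes equal-length aligned strings and skips columns containing gaps or ambiguous bases, which is a simplifying choice.

```python
def variant_columns(alignment: dict[str, str]) -> list[tuple[int, dict[str, str]]]:
    """Return (column, bases-by-strain) for alignment columns where strains disagree."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    variants = []
    for col in range(length):
        bases = {name: alignment[name][col].upper() for name in names}
        if any(base not in "ACGT" for base in bases.values()):
            continue  # simplifying choice: skip gapped or ambiguous columns
        if len(set(bases.values())) > 1:
            variants.append((col, bases))
    return variants


core = {"ref": "ACGTAC", "s1": "ACGTTC", "s2": "AC-TAC"}
for col, bases in variant_columns(core):
    print(col, bases)  # 4 {'ref': 'A', 's1': 'T', 's2': 'A'}
```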
Dynamic programming algorithms for biological sequence comparison.

Science.gov (United States)

Pearson, W R; Miller, W

1992-01-01

Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N²)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N²) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Although rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q + rk) are readily available. Versions of these programs that run on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
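The time/space trade-off described above shows up in a local (Smith-Waterman) score computation that keeps only two rows of the dynamic programming matrix. This gives O(N²) time with memory linear in the sequence length. The sketch assumes a simple match/mismatch score and the linear gap penalty g = rk.

It returns the score only. Recovering the alignment itself in linear space needs a divide-and-conquer extension (Hirschberg-style), which is not shown here.

```python
def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                          r: int = 1) -> int:
    """Smith-Waterman score with gap penalty g = r*k, storing only two DP rows."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0 for local alignment
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,
                          prev[j - 1] + s,  # align a[i-1] with b[j-1]
                          prev[j] - r,      # gap in b
                          curr[j - 1] - r)  # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("ACGTACGT", "ACGACGT"))  # 13: seven matches, one gap
```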
Flavivirus and Filovirus EvoPrinters: New alignment tools for the comparative analysis of viral evolution.

Directory of Open Access Journals (Sweden)

Thomas Brody

2017-06-01

Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest. We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences, including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then either view individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. Using EvoPrinter search and alignment programs, we show the following: (1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; (2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; (3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses; and (4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a
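Superimposing many one-on-one comparisons amounts to counting, at each reference position, how many strains differ from the reference. This is a hypothetical sketch, not EvoPrinter's algorithm. It assumes strains are already aligned to the reference without gaps. The three labels are illustrative categories rather than the tool's terminology.

```python
def superimpose(reference: str, strains: list[str]) -> list[int]:
    """Per reference position, count how many pre-aligned strains differ from it."""
    for s in strains:
        if len(s) != len(reference):
            raise ValueError("strains must be aligned to the reference length")
    return [sum(1 for s in strains if s[pos] != base) for pos, base in enumerate(reference)]


def classify(profile: list[int], n_strains: int) -> list[str]:
    """Label each position from its difference count (illustrative categories)."""
    labels = []
    for diffs in profile:
        if diffs == 0:
            labels.append("conserved")         # identical in every comparison
        elif diffs == n_strains:
            labels.append("reference-unique")  # every strain differs from the reference
        else:
            labels.append("shared")            # some strains share the reference base
    return labels


strains = ["ACGTT", "ACCTT", "ACGTT"]
profile = superimpose("ACGTA", strains)
print(profile, classify(profile, len(strains)))
```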
Mastering OpenStack

CERN Document Server

Khedher, Omar

2015-01-01

This book is intended for system administrators, cloud engineers, and system architects who want to deploy a cloud based on OpenStack in a mid- to large-sized IT infrastructure. If you have a fundamental understanding of cloud computing and OpenStack and want to expand your knowledge, then this book is an excellent next step.
Solid Oxide Fuel Cell Stack Diagnostics

DEFF Research Database (Denmark)

Mosbæk, Rasmus Rode; Barfod, Rasmus Gottrup

As SOFC technology moves closer to a commercial breakthrough, methods to measure the “state-of-health” of operating stacks are of increasing interest. This requires the application of advanced methods for detailed electrical and electrochemical characterization during operation. … An operating stack is subject to compositional gradients in the gaseous reactant streams and to temperature gradients across each cell and across the stack, which complicates detailed analysis. Several experimental stacks from Topsoe Fuel Cell A/S were characterized using Electrochemical Impedance Spectroscopy … in the hydrogen fuel gas supplied to the stack. EIS was used to examine the long-term behavior and monitor the evolution of the impedance of each of the repeating units and of the whole stack. The observed impedance was analyzed in detail for one of the repeating units and the whole stack and the losses reported …
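The sketch below evaluates a simple equivalent circuit to show what an impedance spectrum of a repeating unit looks like: a series (ohmic) resistance plus one parallel RC element. The circuit and parameter values are assumptions for illustration. Real SOFC spectra usually need several arcs and often constant-phase elements.

Reading the spectrum:

- the high-frequency intercept gives the ohmic resistance;
- the low-frequency intercept gives the total resistance;
- their difference is the polarization resistance.

```python
import math


def rc_impedance(freq_hz: float, r_ohmic: float, r_pol: float, c_dl: float) -> complex:
    """Z = R_ohmic + R_pol / (1 + j*omega*R_pol*C): a series resistance plus one RC arc."""
    omega = 2.0 * math.pi * freq_hz
    return r_ohmic + r_pol / (1.0 + 1j * omega * r_pol * c_dl)


# Hypothetical repeating-unit parameters (ohm and farad), chosen only for illustration.
R0, R1, C1 = 0.20, 0.50, 0.10
freqs = [10 ** (k / 4) for k in range(-16, 25)]  # 1e-4 Hz ... 1e6 Hz, log-spaced
spectrum = [rc_impedance(f, R0, R1, C1) for f in freqs]
high, low = spectrum[-1], spectrum[0]
print(f"ohmic ~ {high.real:.3f}, total ~ {low.real:.3f}, "
      f"polarization ~ {low.real - high.real:.3f}")
# The arc apex (most negative imaginary part) sits near f = 1 / (2*pi*R1*C1).
apex = min(zip(freqs, spectrum), key=lambda fz: fz[1].imag)[0]
print(f"apex near {apex:.2f} Hz")
```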
Energy hyperspace for stacking interaction in AU/AU dinucleotide step: Dispersion-corrected density functional theory study.

Science.gov (United States)

Mukherjee, Sanchita; Kailasam, Senthilkumar; Bansal, Manju; Bhattacharyya, Dhananjay

2014-01-01

Double helical structures of DNA and RNA are mostly determined by base pair stacking interactions, which give them base sequence-directed features, such as small roll values for the purine-pyrimidine steps. Earlier attempts to characterize stacking interactions were mostly restricted to calculations on fiber diffraction geometries or on structures optimized by ab initio calculations; lacking variation in geometry, they could not account for the rather unusual, large roll values observed for the AU/AU base pair step in crystal structures of RNA double helices. We have generated a stacking energy hyperspace by modeling geometries with variations along the important degrees of freedom, roll and slide, which were chosen via statistical analysis as maximally sequence dependent. Corresponding energy contours were constructed by several quantum chemical methods, including dispersion corrections. This analysis established the most suitable methods for stacked base pair systems, despite the limitation that the number of atoms in a base pair step places on employing a very high level of theory. All the methods predict a negative roll value and near-zero slide to be most favorable for the purine-pyrimidine steps, in agreement with Calladine's steric clash based rule. Successive base pairs in RNA are always linked by a sugar-phosphate backbone with C3'-endo sugars, which demands a C1'-C1' distance of about 5.4 Å along the chains. Adding an energy penalty term for deviation of the C1'-C1' distance from its mean value to recent DFT-D functionals, specifically ωB97X-D, appears to predict a reliable energy contour for the AU/AU step. Such a distance-based penalty also improves the energy contours for the other purine-pyrimidine sequences. © 2013 Wiley Periodicals, Inc. Biopolymers 101: 107-120, 2014. Copyright © 2013 Wiley Periodicals, Inc.
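The penalized energy can be written as a sketch. It assumes a harmonic form for the penalty, which the abstract does not specify. Here ρ is roll, σ is slide, k is an assumed force constant, and d is the C1'-C1' distance of the modelled step:

```latex
E_{\text{pen}}(\rho, \sigma) = E_{\text{DFT-D}}(\rho, \sigma)
  + k \left[ d_{\mathrm{C1'C1'}}(\rho, \sigma) - \bar{d} \right]^{2},
\qquad \bar{d} \approx 5.4~\text{\AA}
```

Geometries whose C1'-C1' distance departs from the backbone-compatible value are raised in energy. This reshapes the (ρ, σ) contour without altering the underlying functional.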
Development of HT-PEMFC components and stack for CHP unit

Energy Technology Data Exchange (ETDEWEB)

Jensen, Jens Oluf; Li, Q. (Technical Univ. of Denmark, Dept. of Chemistry, Kgs. Lyngby (Denmark)); Terkelsen, C.; Rudbech, H.C.; Steenberg, T. (Danish Power System Aps, Charlottenlund (Denmark)); Thibault de Rycke (IRD Fuel Cell A/S, Svendborg (Denmark))

2009-10-15

enforcement was initiated in the project. The reason for this was to create a better surface to seal against and to prevent membrane rupture during service. A special MEA pressing tool was developed for easy alignment and control of the compression. Fuel cell stacks with the developed MEAs were constructed and tested. The aim was to construct a liquid-cooled stack based on IRD's experience. A 40-cell liquid-cooled stack was made by IRD at the end of the project. The cell area was 7 × 17 cm. A perfluoropolyether was chosen as coolant due to its low viscosity at all relevant temperatures combined with low volatility. In addition, an air-cooled stack was built at DTU using a different materials approach, which is confidential. Part of the detailed results regarding stacking was reported in a confidential annex to the main report. (LN)
ooi: OpenStack OCCI interface

Directory of Open Access Journals (Sweden)

Álvaro López García

2016-01-01

In this document we present an implementation of the Open Grid Forum’s Open Cloud Computing Interface (OCCI) for OpenStack, namely ooi (Openstack occi interface, 2015) [1]. OCCI is an open standard for management tasks over cloud resources, focused on interoperability, portability and integration. ooi aims to implement this open interface for the OpenStack cloud middleware, promoting interoperability with other OCCI-enabled cloud management frameworks and infrastructures. ooi focuses on being non-invasive with a vanilla OpenStack installation, not tied to a particular OpenStack release version.
ooi: OpenStack OCCI interface

Science.gov (United States)

López García, Álvaro; Fernández del Castillo, Enol; Orviz Fernández, Pablo

In this document we present an implementation of the Open Grid Forum's Open Cloud Computing Interface (OCCI) for OpenStack, namely ooi (Openstack occi interface, 2015) [1]. OCCI is an open standard for management tasks over cloud resources, focused on interoperability, portability and integration. ooi aims to implement this open interface for the OpenStack cloud middleware, promoting interoperability with other OCCI-enabled cloud management frameworks and infrastructures. ooi focuses on being non-invasive with a vanilla OpenStack installation, not tied to a particular OpenStack release version.
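For readers unfamiliar with OCCI: a request in the text/occi HTTP rendering carries the resource kind and its attributes in headers. The sketch below only builds such headers for a compute resource, assuming the OCCI 1.1 infrastructure scheme and attribute names. It sends nothing, because the endpoint URL of an ooi deployment is not given in the source. Real deployments typically also expect template mixins (for example for the image and flavor), which are omitted here.

```python
def occi_compute_create_headers(title: str, cores: int,
                                memory_gb: float) -> list[tuple[str, str]]:
    """Headers for a text/occi request creating a compute resource (assumed OCCI 1.1 names).

    A list of (name, value) pairs is used because X-OCCI-Attribute may repeat.
    """
    kind = 'compute; scheme="http://schemas.ogf.org/occi/infrastructure#"; class="kind"'
    return [
        ("Content-Type", "text/occi"),
        ("Category", kind),
        ("X-OCCI-Attribute", f'occi.core.title="{title}"'),   # string values are quoted
        ("X-OCCI-Attribute", f"occi.compute.cores={cores}"),  # numeric values are not
        ("X-OCCI-Attribute", f"occi.compute.memory={memory_gb}"),
    ]


for name, value in occi_compute_create_headers("demo-vm", 2, 4.0):
    print(f"{name}: {value}")
```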
New overlay measurement technique with an i-line stepper using embedded standard field image alignment marks for wafer bonding applications

Science.gov (United States)

Kulse, P.; Sasai, K.; Schulz, K.; Wietstruck, M.

2017-06-01

In recent decades, semiconductor technology has been driven by Moore's law, leading to high-performance CMOS technologies with feature sizes of less than 10 nm [1]. It has been pointed out that not only scaling but also the integration of novel components and technology modules into CMOS/BiCMOS technologies is becoming more attractive for realizing smart and miniaturized systems [2]. Driven by new applications in communication, health and automation, new components and technology modules such as BiCMOS-embedded RF-MEMS, high-Q passives, Si-based microfluidics and InP-SiGe BiCMOS heterointegration have been demonstrated [3-6]. In contrast to standard VLSI processes fabricated on the front side of the silicon wafer, these new technology modules require additional backside processing of the wafer; thus an accurate alignment between the front and back sides of the wafer is mandatory. In previous work, an advanced back-to-front-side alignment technique and its implementation into IHP's 0.25/0.13 μm high-performance SiGe:C BiCMOS backside process module were presented [7]. The developed technique enables high-resolution, accurate lithography on the backside of a BiCMOS wafer for additional backside processing. In addition to the aforementioned backside process technologies, new applications such as Through-Silicon Vias (TSV) for interposers and advanced substrate technologies for 3D heterogeneous integration demand not only single-wafer fabrication but also the processing of wafer stacks enabled by temporary and permanent wafer bonding [8]. In this case, the available overlay measurement techniques are not suitable if the overlay and alignment marks are located at the bonding interface of a wafer stack consisting of a silicon device wafer and a silicon carrier wafer. The previously used EVG 40NT automated overlay measurement system, which uses two oppositely positioned microscopes to inspect the wafer back and front sides simultaneously, is not capable of measuring embedded overlay
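Overlay measurements from alignment marks are usually reduced to a few model terms. As a sketch, assume a six-parameter linear model (the source does not specify one):

- dx = a0 + a1·x + a2·y
- dy = b0 + b1·x + b2·y

Here a0 and b0 are translations, a1 and b2 are scale (magnification) terms, and a2 and b1 are rotation/orthogonality terms. The model is fitted by least squares. The mark positions and error values are synthetic.

```python
def solve_3x3(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b for a 3x3 system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def fit_linear_overlay(marks):
    """Least-squares fit of dx = a0 + a1*x + a2*y and dy = b0 + b1*x + b2*y.

    marks: list of (x, y, dx, dy) tuples, i.e. mark position and measured overlay error.
    Returns ([a0, a1, a2], [b0, b1, b2]) via the normal equations.
    """
    rows = [(1.0, x, y) for x, y, _, _ in marks]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    at_dx = [sum(r[i] * m[2] for r, m in zip(rows, marks)) for i in range(3)]
    at_dy = [sum(r[i] * m[3] for r, m in zip(rows, marks)) for i in range(3)]
    return solve_3x3(ata, at_dx), solve_3x3(ata, at_dy)


# Synthetic marks (positions in mm, errors in nm) generated from known hypothetical terms.
true_dx, true_dy = (5.0, 0.8, -1.2), (-3.0, 1.2, 0.5)
positions = [(-50.0, -50.0), (50.0, -50.0), (-50.0, 50.0), (50.0, 50.0),
             (0.0, 0.0), (25.0, -10.0)]
marks = [(x, y,
          true_dx[0] + true_dx[1] * x + true_dx[2] * y,
          true_dy[0] + true_dy[1] * x + true_dy[2] * y) for x, y in positions]
coef_dx, coef_dy = fit_linear_overlay(marks)
print(coef_dx, coef_dy)  # translation, then x- and y-dependent terms, for each axis
```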
Simultaneous identification of long similar substrings in large sets of sequences

Directory of Open Access Journals (Sweden)

Wittig Burghardt

2007-05-01

Abstract Background Sequence comparison faces new challenges today, with many complete genomes and large libraries of transcripts known. Gene annotation pipelines match these sequences in order to identify genes and their alternative splice forms. However, the software currently available cannot simultaneously compare sets of sequences as large as necessary, especially if errors must be considered. Results We therefore present a new algorithm for identifying almost perfectly matching substrings in very large sets of sequences. Its implementation, called ClustDB, is considerably faster and can handle 16 times more data than VMATCH, the most memory-efficient exact program known today. ClustDB simultaneously generates large sets of exactly matching substrings of a given minimum length as seeds for a novel method of match extension with errors. It generates alignments of maximum length with a specified maximum number of errors within each overlapping window of a given size. Such alignments are not optimal in the usual sense, but they are faster to calculate and often more appropriate than traditional alignments for genomic sequence comparison, EST and full-length cDNA matching, and genomic sequence assembly. The method is used to check the overlaps and to reveal possible assembly errors for 1377 Medicago truncatula BAC-size sequences published at http://www.medicago.org/genome/assembly_table.php?chr=1. Conclusion The program ClustDB proves that window alignment is an efficient way to find long sequence sections of homogeneous alignment quality, as expected in the case of random errors, and to detect systematic errors resulting from sequence contaminations. Such inserts are systematically overlooked in long alignments controlled only by tuning penalties for mismatches and gaps. ClustDB is freely available for academic use.
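The window criterion above (at most a given number of errors in every window of fixed size) can be sketched as a seed-extension loop. This is a hypothetical, mismatch-only illustration of the criterion, not ClustDB's method. It extends to the right only, ignores insertions and deletions, and trims trailing mismatches so the alignment ends on a match.

```python
def extend_with_window(a: str, b: str, i: int, j: int,
                       window: int = 10, max_errors: int = 2) -> int:
    """Extend an ungapped match rightward from a[i] / b[j].

    Extension stops before the first column that would put more than
    `max_errors` mismatches into some window of `window` consecutive columns.
    Trailing mismatches are trimmed. Returns the extension length.
    """
    flags: list[int] = []  # per extended column: 1 = mismatch, 0 = match
    while i + len(flags) < len(a) and j + len(flags) < len(b):
        mismatch = int(a[i + len(flags)] != b[j + len(flags)])
        # The window ending at the new column holds the previous window-1 flags plus it.
        recent = sum(flags[-(window - 1):]) if window > 1 else 0
        if recent + mismatch > max_errors:
            break
        flags.append(mismatch)
    while flags and flags[-1] == 1:
        flags.pop()
    return len(flags)


print(extend_with_window("ACGTACGTAC", "ACGTTCGTAC", 0, 0, window=5, max_errors=1))  # 10
```

Isolated errors are tolerated, but a cluster of errors (as produced by a contaminating insert) stops the extension. This matches the behaviour the abstract attributes to window alignment.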
Fine-scale structure of the mid-mantle characterised by global stacks of PP precursors

Science.gov (United States)

Bentham, H. L. M.; Rost, S.; Thorne, M. S.

2017-08-01

Subduction zones are likely a major source of compositional heterogeneities in the mantle, which may preserve a record of the subduction history and mantle convection processes. The fine-scale structure associated with mantle heterogeneities can be studied using the scattered seismic wavefield that arrives as coda to, or as energy preceding, many body wave arrivals. In this study we analyse precursors to PP by creating stacks of records from globally distributed stations. We create stacks aligned on the PP arrival in 5° distance bins (range 70-120°) from 600 earthquakes recorded at 193 stations, stacking a total of 7320 seismic records. As the energy trailing the direct P arrival, the P coda, interferes with the PP precursors, we suppress the P coda by subtracting an exponential curve fitted to this energy. The resulting stacks show that PP precursors related to scattering from heterogeneities in the mantle are present at all distances. Lateral variations are explored by producing two regional stacks across the Atlantic and Pacific hemispheres, but we find only negligible differences in the precursory signature between these two regions. The similarity of these two regions suggests that well-mixed subducted material can survive at upper and mid-mantle depths. To describe the scattered wavefield in the mantle, we compare the global stacks to synthetic seismograms generated using a Monte Carlo phonon scattering technique. We propose a best-fitting layered heterogeneity model, BRT2017, characterised by a three-layer mantle with a background heterogeneity strength (ɛ = 0.8%) and a depth interval of increased heterogeneity strength (ɛ = 1%) between 1000 km and 1800 km. The scale length of heterogeneity is found to be 8 km throughout the mantle. Since mantle heterogeneity of 8 km scale may be linked to subducted oceanic crust, the detection of increased heterogeneity at mid-mantle depths could be associated with stalled slabs due to increases in viscosity
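The two processing steps above are stacking records aligned on an arrival and removing an exponential trend. The sketch below assumes:

- traces are equally sampled lists with known arrival samples;
- the stack is a plain linear mean;
- the exponential is fitted by least squares on the logarithm of a positive envelope, which may differ from the authors' fitting procedure.

```python
import math


def align_and_stack(traces, arrival_samples, pre=50, post=50):
    """Linear stack of trace windows aligned on each trace's arrival sample.

    Windows span [arrival - pre, arrival + post); traces too short are skipped.
    """
    length = pre + post
    stack = [0.0] * length
    used = 0
    for trace, t0 in zip(traces, arrival_samples):
        if t0 - pre < 0 or t0 + post > len(trace):
            continue
        for k, value in enumerate(trace[t0 - pre:t0 + post]):
            stack[k] += value
        used += 1
    return [s / used for s in stack] if used else stack


def subtract_exponential(times, envelope):
    """Fit envelope ~ A*exp(b*t) by least squares on ln(envelope) and subtract the fit."""
    pts = [(t, math.log(y)) for t, y in zip(times, envelope) if y > 0]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_ln = sum(ln_y for _, ln_y in pts) / len(pts)
    b = (sum((t - mean_t) * (ln_y - mean_ln) for t, ln_y in pts)
         / sum((t - mean_t) ** 2 for t, _ in pts))
    ln_a = mean_ln - b * mean_t
    fit = [math.exp(ln_a + b * t) for t in times]
    return [y - f for y, f in zip(envelope, fit)], fit


# Two hypothetical traces with a unit arrival at different samples.
tr1 = [0.0] * 200
tr1[80] = 1.0
tr2 = [0.0] * 200
tr2[120] = 1.0
stacked = align_and_stack([tr1, tr2], [80, 120])

# A synthetic decaying "coda" envelope is removed almost exactly by the fitted exponential.
times = [0.1 * i for i in range(50)]
coda = [2.0 * math.exp(-0.7 * t) for t in times]
residual, fit = subtract_exponential(times, coda)
print(stacked[50], max(abs(r) for r in residual))
```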
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

Science.gov (United States)

Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

2004-06-12

The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes that the reference database samples sequences extensively. This is often not the case, which limits the reliability of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and, consequently, sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are available at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
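Neighbor joining, one of the tree methods mentioned above, repeatedly joins the pair (a, b) minimizing Q(a, b) = (n − 2)d(a, b) − r(a) − r(b). Here r(x) is the sum of distances from x to all other nodes. The sketch below is an illustrative Python version, not the PHYLIP code that galaxie calls. The five-taxon distance matrix is a small toy example.

```python
def neighbor_joining(names, d):
    """Neighbor-joining on a distance dict d[a][b].

    Returns (joins, final_edge): joins is a list of (a, b, branch_a, branch_b, new_node);
    final_edge is (u, v, length) connecting the last two nodes.
    """
    d = {a: dict(row) for a, row in d.items()}  # work on a copy
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        branch_a = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[a][b] - branch_a
        u = f"node{len(joins) + 1}"
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new internal node to every remaining node.
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        joins.append((a, b, branch_a, branch_b, u))
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    u, v = nodes
    return joins, (u, v, d[u][v])


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
dist = {x: {y: matrix[i][j] for j, y in enumerate(names)} for i, x in enumerate(names)}
joins, last_edge = neighbor_joining(names, dist)
print(joins[0])  # ('a', 'b', 2.0, 3.0, 'node1')
```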
Band Alignment of 2D Transition Metal Dichalcogenide Heterojunctions

KAUST Repository

Chiu, Ming-Hui

2016-09-20

It is critically important to characterize the band alignment in semiconductor heterojunctions (HJs) because it controls their electronic and optical properties. However, the well-known Anderson's model usually fails to predict the band alignment in bulk HJ systems due to the presence of charge transfer at the interfacial bonds. Atomically thin 2D transition metal dichalcogenide materials have recently attracted much attention, since ultrathin HJs and devices can be easily built from them and they are promising for future electronics. Vertical HJs based on 2D materials can be constructed via van der Waals stacking regardless of the lattice mismatch between the two materials. Despite the defect-free character of the junction interface, experimental evidence is still lacking on whether the simple Anderson rule can predict the band alignment of such HJs. Here, the validity of Anderson's model is verified for 2D heterojunction systems, and its success is attributed to the absence of dangling bonds (i.e., interface dipoles) at the van der Waals interface. The results of this work set a foundation for using the powerful Anderson rule to determine the band alignments of 2D HJs, which is beneficial for future electronic, photonic, and optoelectronic devices. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
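Anderson's rule places both band edges relative to the vacuum level, using the electron affinity χ and band gap E_g of each material. The offsets are then ΔE_c = χ1 − χ2 and ΔE_v = (χ1 + E_g1) − (χ2 + E_g2). The sketch below computes those offsets and the resulting alignment type. The input values are hypothetical, not measured TMD parameters.

```python
def anderson_alignment(chi1: float, eg1: float, chi2: float, eg2: float):
    """Band offsets and alignment type from Anderson's rule (all energies in eV).

    Energies are referenced to the vacuum level: CBM = -chi, VBM = -(chi + Eg).
    Returns (delta_Ec, delta_Ev, kind) with delta = material 2 minus material 1.
    """
    cbm1, vbm1 = -chi1, -(chi1 + eg1)
    cbm2, vbm2 = -chi2, -(chi2 + eg2)
    delta_ec = cbm2 - cbm1  # equals chi1 - chi2
    delta_ev = vbm2 - vbm1  # equals (chi1 + eg1) - (chi2 + eg2)
    if vbm1 > cbm2 or vbm2 > cbm1:
        kind = "type III (broken gap)"
    elif (cbm1 <= cbm2 and vbm1 >= vbm2) or (cbm2 <= cbm1 and vbm2 >= vbm1):
        kind = "type I (straddling)"
    else:
        kind = "type II (staggered)"
    return delta_ec, delta_ev, kind


# Hypothetical materials (values invented for illustration only).
print(anderson_alignment(4.0, 1.8, 4.3, 1.6))
```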
Stack gas treatment

Science.gov (United States)

Reeves, Adam A.

1977-04-12

Hot stack gases transfer their heat to a gravity flow of catalyst-treated pebbles. The cooled stack gases and a sulfuric acid mist are withdrawn from the unit, and the heat picked up by the pebbles is transferred to air for combustion or another process. The sulfuric acid (or sulfur, depending on the catalyst) is withdrawn in a recovery unit.
A Time-predictable Stack Cache

DEFF Research Database (Denmark)

Abbaspour, Sahar; Brandner, Florian; Schoeberl, Martin

2013-01-01

Real-time systems need time-predictable architectures to support static worst-case execution time (WCET) analysis. One architectural feature, the data cache, is hard to analyze when different data areas (e.g., heap allocated and stack allocated data) share the same cache. This sharing leads to less precise results of the cache analysis part of the WCET analysis. Splitting the data cache for different data areas enables composable data cache analysis. The WCET analysis tool can analyze the accesses to these different data areas independently. In this paper we present the design and implementation of a cache for stack allocated data. Our port of the LLVM C++ compiler supports the management of the stack cache. The combination of stack cache instructions and the hardware implementation of the stack cache is a further step towards time-predictable architectures.
Sedimentation stacking diagram of binary colloidal mixtures and bulk phases in the plane of chemical potentials

International Nuclear Information System (INIS)

Heras, Daniel de las; Schmidt, Matthias

2015-01-01

We give a full account of a recently proposed theory that explicitly relates the bulk phase diagram of a binary colloidal mixture to its phase stacking phenomenology under gravity (de las Heras and Schmidt 2013 Soft Matter 9 8636). As we demonstrate, the full set of possible phase stacking sequences in sedimentation-diffusion equilibrium originates from straight lines (sedimentation paths) in the chemical potential representation of the bulk phase diagram. From the analysis of various standard topologies of bulk phase diagrams, we conclude that the corresponding sedimentation stacking diagrams can be very rich, even more so when finite sample height is taken into account. We apply the theory to obtain the stacking diagram of a mixture of nonadsorbing polymers and colloids. We also present a catalog of generic phase diagrams in the plane of chemical potentials in order to facilitate the practical application of our concept, which also generalizes to multi-component mixtures.
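Why sedimentation paths are straight lines can be seen from the local density approximation usually assumed for sedimentation-diffusion equilibrium: each species' local chemical potential falls linearly with height. In the sketch below the notation is ours and may differ from the paper's: $m_i$ are buoyant masses, $g$ is the gravitational acceleration, $\mu_i^0$ are the chemical potentials at the reference height $z = 0$, and $h$ is the sample height.

```latex
\mu_i(z) = \mu_i^{0} - m_i g z, \qquad i = 1, 2, \qquad 0 \le z \le h
% eliminating z between the two species gives a straight line of slope m_2 / m_1:
\mu_2(z) - \mu_2^{0} = \frac{m_2}{m_1}\,\bigl(\mu_1(z) - \mu_1^{0}\bigr)
```

A finite sample therefore maps onto a line segment of length $g h \sqrt{m_1^2 + m_2^2}$ in the $(\mu_1, \mu_2)$ plane, and the sequence of bulk phases the segment crosses, read from bottom to top, is the stacking sequence.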
Survey and alignment of high energy physics accelerators and transport lines

International Nuclear Information System (INIS)

Ruland, R.E.

1992-11-01

This talk summarizes the survey and alignment processes of accelerators and transport lines and discusses the propagation of errors associated with these processes. The major geodetic principles governing the survey and alignment measurement space are revisited and their relationship to a lattice coordinate system is shown. The paper continues with a broad overview of the activities involved in the step-by-step sequence from initial absolute alignment to final smoothing. Emphasis is given to the relative alignment of components, in particular to the importance of incorporating methods to remove residual systematic effects in surveying and alignment operations.
SAMMate: a GUI tool for processing short read alignments in SAM/BAM format

Directory of Open Access Journals (Sweden)

Flemington Erik

2011-01-01

Background: Next Generation Sequencing (NGS) technology generates tens of millions of short reads for each DNA/RNA sample. A key step in NGS data analysis is the short read alignment of the generated sequences to a reference genome. Although storing alignment information in the Sequence Alignment/Map (SAM) or Binary SAM (BAM) format is now standard, biomedical researchers still have difficulty accessing this information. Results: We have developed a Graphical User Interface (GUI) software tool named SAMMate. SAMMate allows biomedical researchers to quickly process SAM/BAM files and is compatible with both single-end and paired-end sequencing technologies. SAMMate also automates some standard procedures in DNA-seq and RNA-seq data analysis. Using either standard or customized annotation files, SAMMate allows users to accurately calculate the short read coverage of genomic intervals. In particular, for RNA-seq data SAMMate can accurately calculate gene expression abundance scores for customized genomic intervals using short reads originating from both exons and exon-exon junctions. Furthermore, SAMMate can quickly calculate a whole-genome signal map at base-wise resolution, allowing researchers to solve an array of bioinformatics problems. Finally, SAMMate can export both a wiggle file for alignment visualization in the UCSC genome browser and an alignment statistics report. The biological impact of these features is demonstrated via several case studies that predict miRNA targets using short read alignment information files. Conclusions: With just a few mouse clicks, SAMMate gives biomedical researchers easy access to important alignment information stored in SAM/BAM files. Our software is constantly updated and will greatly facilitate the downstream analysis of NGS data. Both the source code and the GUI executable are freely available under the GNU General Public License at http://sammate.sourceforge.net.
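Base-wise coverage and interval coverage of the kind described above can be computed with a difference array over aligned read spans. The sketch below is not SAMMate's implementation: it assumes reads have already been reduced to 0-based, half-open `(start, end)` spans on one sequence (a spliced read would contribute one span per aligned block), and the function names are hypothetical. The wiggle writer follows the UCSC `fixedStep` layout, whose coordinates are 1-based.

```python
def base_coverage(intervals, length):
    """Per-base read depth over a sequence of `length` bases.

    intervals: iterable of (start, end) aligned spans, 0-based and half-open.
    """
    diff = [0] * (length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(length, end)  # clip to the sequence
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, depth = [], 0
    for k in range(length):
        depth += diff[k]  # running sum turns span boundaries into depths
        coverage.append(depth)
    return coverage


def interval_mean_coverage(coverage, start, end):
    """Mean depth over the half-open genomic interval [start, end)."""
    span = coverage[start:end]
    return sum(span) / len(span) if span else 0.0


def to_fixedstep_wiggle(chrom, coverage):
    """Render a base-wise signal map as a fixedStep wiggle block."""
    lines = [f"fixedStep chrom={chrom} start=1 step=1"]
    lines += [str(value) for value in coverage]
    return "\n".join(lines)
```

The difference array keeps the cost at one pass over the reads plus one pass over the sequence, regardless of read length.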
A MapReduce Framework for DNA Sequencing Data Processing

Directory of Open Access Journals (Sweden)

Samy Ghoneimy

2016-12-01

Genomics and Next Generation Sequencers (NGS) like Illumina HiSeq produce data on the order of 200 billion base pairs in a single one-week run for 60x human genome coverage, which requires modern high-throughput experimental technologies that can only be tackled with high performance computing (HPC) and specialized software algorithms called “short read aligners”. This paper focuses on implementing the DNA sequencing data analysis as a set of MapReduce programs that accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format) file, which contains the variants for the given DNA data set. In this paper, MapReduce/Hadoop along with the Burrows-Wheeler Aligner (BWA) and Sequence Alignment/Map (SAM) tools are fully utilized to provide various utilities for manipulating alignments, including sorting, merging, indexing, and generating alignments. The Map-Sort-Reduce process is designed for a Hadoop framework in which each cluster is a traditional N-node Hadoop cluster, so that all of the Hadoop features, such as HDFS, program management and fault tolerance, can be utilized. The Map step runs multiple instances of the short read alignment algorithm (Bowtie) in parallel in Hadoop. The ordered list of sequence reads is used as input tuples, and the output tuples are the alignments of the short reads. In the Reduce step, many parallel instances of the Short Oligonucleotide Analysis Package for SNP (SOAPsnp) algorithm run in the cluster. Input tuples are sorted alignments for a partition, and the output tuples are SNP calls. Results are stored via HDFS and then archived in SOAPsnp format. The proposed framework enables extremely fast discovery of somatic mutations, inference of population genetic parameters, and association tests performed directly on sequencing data without explicit genotyping or linkage-based imputation. It also demonstrates that this method achieves comparable...
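The Map-Sort-Reduce data flow can be illustrated without Hadoop. The toy below is only a sketch of the key/value structure and does not reproduce Bowtie or SOAPsnp. It assumes reads are already aligned gaplessly as hypothetical `(chrom, leftmost 0-based position, sequence)` tuples. The map step emits one `((chrom, position), base)` pair per aligned base, the sort plays the role of the shuffle, and the reduce step makes a naive consensus call per position with illustrative thresholds.

```python
from collections import defaultdict
from itertools import groupby


def map_read(read):
    """Map step: emit ((chrom, position), base) for each aligned base of a read."""
    chrom, pos, seq = read
    for offset, base in enumerate(seq):
        yield (chrom, pos + offset), base


def reduce_position(key, bases, reference, min_depth=2, min_fraction=0.8):
    """Reduce step: naive variant call from the pileup at one position."""
    chrom, pos = key
    counts = defaultdict(int)
    for base in bases:
        counts[base] += 1
    depth = sum(counts.values())
    allele, n = max(counts.items(), key=lambda kv: kv[1])
    ref = reference[chrom][pos]
    if depth >= min_depth and allele != ref and n / depth >= min_fraction:
        return chrom, pos, ref, allele, depth
    return None


def run_pipeline(reads, reference):
    pairs = [kv for read in reads for kv in map_read(read)]  # map
    pairs.sort(key=lambda kv: kv[0])                          # shuffle / sort by key
    calls = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):  # reduce per key
        call = reduce_position(key, (base for _, base in group), reference)
        if call:
            calls.append(call)
    return calls
```

In a real cluster the keys would be range-partitioned across reducers so that each reducer receives sorted alignments for one genomic partition, matching the abstract's description of the Reduce input.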

Protein Alignment on the Intel Xeon Phi Coprocessor

OpenAIRE

Ramstad, Jorun

2015-01-01

There is an increasing need for sensitive, high-performance sequence alignment tools. With the growing databases of scientifically analyzed protein sequences, more compute power is necessary. Specialized architectures are emerging, and a transition from serial to specialized implementations is required. This thesis studies whether Intel's 60-core Xeon Phi coprocessor is a suitable architecture for implementing a sequence alignment tool. The performance relative to existing tools is eval...
Implementation of a Parallel Protein Structure Alignment Service on Cloud

Directory of Open Access Journals (Sweden)

Che-Lun Hung

2013-01-01

Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
FACIES PARTITIONING AND SEQUENCE STRATIGRAPHY OF A MIXED SILICICLASTIC-CARBONATE RAMP STACK IN THE GELASIAN OF SICILY (S ITALY): A POTENTIAL MODEL FOR ICEHOUSE, DISTALLY-STEEPENED HETEROZOAN RAMPS

Directory of Open Access Journals (Sweden)

FRANCESCO MASSARI

2012-11-01

The Gelasian succession of the Capodarso area (Enna-Caltanissetta basin, Sicily, Italy) consists of an offlapping stack of cycles composed of siliciclastic units passing to carbonate heterozoan, clino-stratified wedges, developed from a growing positive tectonic structure. Identification of a number of facies tracts, based on sedimentary facies, biofacies and taphofacies, provided important information about the differentiation and characterisation of systems tracts and key stratal surfaces of sequence stratigraphy. The bulk of the carbonate wedges is interpreted as representing the rapid falling-stage progradation of distally steepened ramps. The inferred highest rate of carbonate production during forced regressions was concomitant with active downramp resedimentation by storm-driven downwelling flows, leading to the storage of most carbonate sediment on the ramp slope as clino-beds of the prograding bodies. Comparison of the Capodarso ramps with other icehouse carbonate ramps, with particular regard to the Mediterranean Plio-Pleistocene, provides clues for defining some common features. These are inferred to include: (1) brief, rapid episodes of progradation concomitant with orbitally-forced sea-level changes, resulting in limited ramp width; (2) preferential fostering of growth and downramp resedimentation of heterozoan carbonates during glacial hemicycles marked by enhanced atmospheric and marine circulation; (3) building out from positive features of entirely submerged distally-steepened ramps with storm-wave-graded profile and distinctive clinoforms; (4) ramp stacks generally consisting of mixed clastic-carbonate sequences showing an ordered spectrum of distinct frequencies; (5) rapid, continuous changes in environmental parameters, leading to the short-lived persistence of faunal communities, with climax communities generally having insufficient time to form.
Lightweight Stacks of Direct Methanol Fuel Cells

Science.gov (United States)

Narayanan, Sekharipuram; Valdez, Thomas

2004-01-01

An improved design concept for direct methanol fuel cells makes it possible to construct fuel-cell stacks that can weigh as little as one-third as much as do conventional bipolar fuel-cell stacks of equal power. The structural-support components of the improved cells and stacks can be made of relatively inexpensive plastics. Moreover, in comparison with conventional bipolar fuel-cell stacks, the improved fuel-cell stacks can be assembled, disassembled, and diagnosed for malfunctions more easily. These improvements are expected to bring portable direct methanol fuel cells and stacks closer to commercialization. In a conventional bipolar fuel-cell stack, the cells are interspersed with bipolar plates (also called biplates), which are structural components that serve to interconnect the cells and distribute the reactants (methanol and air). The cells and biplates are sandwiched between metal end plates. Usually, the stack is held together under pressure by tie rods that clamp the end plates. The bipolar stack configuration offers the advantage of very low internal electrical resistance. However, when the power output of a stack is only a few watts, the very low internal resistance of a bipolar stack is not absolutely necessary for keeping the internal power loss acceptably low.
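The last point follows from a simple ohmic estimate. Assume (our notation, not the source's) that the stack delivers output power $P_{\mathrm{out}}$ at terminal voltage $V$ and that its internal resistance $R_{\mathrm{int}}$ is the only loss considered:

```latex
I = \frac{P_{\mathrm{out}}}{V}, \qquad
P_{\mathrm{loss}} = I^{2} R_{\mathrm{int}} = \frac{P_{\mathrm{out}}^{2} R_{\mathrm{int}}}{V^{2}}, \qquad
\frac{P_{\mathrm{loss}}}{P_{\mathrm{out}}} = \frac{P_{\mathrm{out}}\, R_{\mathrm{int}}}{V^{2}}
```

At fixed $V$ and $R_{\mathrm{int}}$, the fractional ohmic loss grows linearly with output power. A stack delivering only a few watts can therefore tolerate a noticeably higher internal resistance than a high-power stack for the same fractional loss.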
Alignment of muscle precursor cells on the vertical edges of thick carbon nanotube films

Energy Technology Data Exchange (ETDEWEB)

Holt, Ian, E-mail: ian.holt@rjah.nhs.uk [Wolfson Centre for Inherited Neuromuscular Disease, RJAH Orthopaedic Hospital, Oswestry, Shropshire SY10 7AG (United Kingdom); Institute for Science and Technology in Medicine, Keele University, Keele, Staffordshire ST5 5BG (United Kingdom); Gestmann, Ingo, E-mail: Ingo.Gestmann@fei.com [FEI Europe B.V., Achtseweg Noord 5, 5651 Eindhoven (Netherlands); Wright, Andrew C., E-mail: a.wright@glyndwr.ac.uk [Advanced Materials Research Laboratory, Glyndwr University, Plas Coch, Mold Rd, Wrexham LL11 2AW (United Kingdom)

2013-10-15

The development of scaffolds and templates is an essential aspect of tissue engineering. We show that thick (> 0.5 mm) vertically aligned carbon nanotube films, made by chemical vapour deposition, can be used as biocompatible substrates for the directional alignment of mouse muscle cells where the cells grow on the exposed sides of the films. Ultra high resolution scanning electron microscopy reveals that the films themselves consist mostly of small diameter (10 nm) multi-wall carbon nanotubes of wavy morphology with some single wall carbon nanotubes. Our findings show that for this alignment to occur the nanotubes must be in pristine condition. Mechanical wiping of the films to create directional alignment is detrimental to directional bioactivity. Larger areas for study have been formed from a composite of multiply stacked narrow strips of nanotubes wipe-transferred onto elastomer supports. These composite substrates appear to show a useful degree of alignment of the cells. Highlights: • Highly oriented muscle precursor cells grown on edges of carbon nanotube pads • Mechanical treatment of nanotube pads highly deleterious to cell growth on edges • Larger areas created from wipe-transfer of narrow strips of nanotubes onto elastomer supports • Very high resolution SEM reveals clues to aligned cell growth.
Alignment of muscle precursor cells on the vertical edges of thick carbon nanotube films

International Nuclear Information System (INIS)

Holt, Ian; Gestmann, Ingo; Wright, Andrew C.

2013-01-01

The development of scaffolds and templates is an essential aspect of tissue engineering. We show that thick (> 0.5 mm) vertically aligned carbon nanotube films, made by chemical vapour deposition, can be used as biocompatible substrates for the directional alignment of mouse muscle cells where the cells grow on the exposed sides of the films. Ultra high resolution scanning electron microscopy reveals that the films themselves consist mostly of small diameter (10 nm) multi-wall carbon nanotubes of wavy morphology with some single wall carbon nanotubes. Our findings show that for this alignment to occur the nanotubes must be in pristine condition. Mechanical wiping of the films to create directional alignment is detrimental to directional bioactivity. Larger areas for study have been formed from a composite of multiply stacked narrow strips of nanotubes wipe-transferred onto elastomer supports. These composite substrates appear to show a useful degree of alignment of the cells. Highlights: • Highly oriented muscle precursor cells grown on edges of carbon nanotube pads • Mechanical treatment of nanotube pads highly deleterious to cell growth on edges • Larger areas created from wipe-transfer of narrow strips of nanotubes onto elastomer supports • Very high resolution SEM reveals clues to aligned cell growth
Quantification of Cardiomyocyte Alignment from Three-Dimensional (3D) Confocal Microscopy of Engineered Tissue.

Science.gov (United States)

Kowalski, William J; Yuan, Fangping; Nakane, Takeichiro; Masumoto, Hidetoshi; Dwenger, Marc; Ye, Fei; Tinney, Joseph P; Keller, Bradley B

2017-08-01

Biological tissues have complex, three-dimensional (3D) organizations of cells and matrix factors that provide the architecture necessary to meet morphogenic and functional demands. Disordered cell alignment is associated with congenital heart disease, cardiomyopathy, and neurodegenerative diseases and repairing or replacing these tissues using engineered constructs may improve regenerative capacity. However, optimizing cell alignment within engineered tissues requires quantitative 3D data on cell orientations and both efficient and validated processing algorithms. We developed an automated method to measure local 3D orientations based on structure tensor analysis and incorporated an adaptive subregion size to account for multiple scales. Our method calculates the statistical concentration parameter, κ, to quantify alignment, as well as the traditional orientational order parameter. We validated our method using synthetic images and accurately measured principal axis and concentration. We then applied our method to confocal stacks of cleared, whole-mount engineered cardiac tissues generated from human-induced pluripotent stem cells or embryonic chick cardiac cells and quantified cardiomyocyte alignment. We found significant differences in alignment based on cellular composition and tissue geometry. These results from our synthetic images and confocal data demonstrate the efficiency and accuracy of our method to measure alignment in 3D tissues.
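The "traditional orientational order parameter" mentioned above can be computed once local orientation vectors have been extracted. The sketch below assumes that step (the structure-tensor analysis) is already done and implements neither the authors' adaptive subregions nor the concentration parameter κ; all function names are hypothetical. The director is taken as the dominant eigenvector of the mean orientation tensor ⟨v vᵀ⟩, found by power iteration, and S = ⟨(3 cos²θ − 1)/2⟩ relative to that director. Squaring the cosine treats v and −v as the same axial orientation.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _matvec(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def orientation_tensor(vectors):
    """Mean outer product <v v^T> of unit orientation vectors (3x3)."""
    unit = [_normalize(v) for v in vectors]
    t = [[0.0] * 3 for _ in range(3)]
    for v in unit:
        for r in range(3):
            for c in range(3):
                t[r][c] += v[r] * v[c] / len(unit)
    return t


def principal_axis(t, iterations=200):
    """Dominant eigenvector of a symmetric PSD 3x3 tensor by power iteration.

    Runs from each coordinate axis so that at least one start is not
    orthogonal to the dominant eigenvector; keeps the best Rayleigh quotient.
    """
    best, best_value = None, -1.0
    for start in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
        v = start
        for _ in range(iterations):
            w = _matvec(t, v)
            if not any(w):  # start vector is orthogonal to all data
                break
            v = _normalize(w)
        rayleigh = sum(a * b for a, b in zip(v, _matvec(t, v)))
        if rayleigh > best_value:
            best, best_value = v, rayleigh
    return best


def order_parameter(vectors):
    """S = <(3 cos^2(theta) - 1) / 2> relative to the principal axis."""
    director = principal_axis(orientation_tensor(vectors))
    unit = [_normalize(v) for v in vectors]
    return sum((3 * sum(a * b for a, b in zip(v, director)) ** 2 - 1) / 2
               for v in unit) / len(unit)
```

S equals 1 for perfectly aligned cells and falls toward 0 for isotropic orientations, so it complements a concentration parameter such as κ rather than replacing it.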
Design of an Image-Servo Mask Alignment System Using Dual CCDs with an XXY Stage

Directory of Open Access Journals (Sweden)

Chih-Jer Lin

2016-02-01

Mask alignment in photolithography is used in many applications, such as semiconductor processes for micro-electro-mechanical systems, printed circuit boards, and flat panel displays. As product dimensions keep shrinking, automatic mask alignment in photolithography is becoming more and more important. The traditional stacked XY-Θz stage is heavy, and it has cumulative flatness errors due to its stacked assembly mechanism. The XXY stage has smaller cumulative error due to its coplanar design, and it can move faster than the traditional XY-Θz stage. However, the relationship between the XXY stage's movement and the commands of the three motors is difficult to compute, because the movements of the three motors on the same plane are coupled. Therefore, an artificial neural network is studied to establish a nonlinear mapping from the desired position and orientation of the stage to the three motors' commands. Further, this paper proposes an image-servo automatic mask alignment system, which consists of a coplanar XXY stage, dual GIGA-E CCDs with lenses, and a programmable automatic controller (PAC). Before performing the compensation, a self-developed visual servo provides the positioning information, which is obtained from image processing and pattern recognition of the specified fiducial marks. To obtain better precision, two methods, the center of gravity method and the generalized Hough transform, are studied to correct the shift positioning error.
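As a rough illustration of the center of gravity method named above, the sketch below locates a fiducial mark as the intensity-weighted centroid of the pixels above a threshold. It is a generic version of the technique, not the authors' vision pipeline: the threshold value, the image layout (`image[row][col]` grey levels) and the function name are assumptions.

```python
def fiducial_centroid(image, threshold):
    """Intensity-weighted centre of gravity of pixels brighter than `threshold`.

    image: 2-D list of grey levels indexed image[row][col].
    Returns (x, y) = (column, row) in sub-pixel units.
    """
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > threshold:  # ignore background pixels
                total += value
                sum_x += value * x
                sum_y += value * y
    if total == 0:
        raise ValueError("no pixels above threshold")
    return sum_x / total, sum_y / total
```

The shift to be compensated would then be the difference between the measured centroid and the mark's target position. Because the centroid averages over many pixels, it yields sub-pixel estimates at low cost, whereas a generalized Hough transform is more robust when the mark is partially occluded.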
Evaluation of microRNA alignment techniques

Science.gov (United States)

Kaspi, Antony; El-Osta, Assam

2016-01-01

Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), few are specifically designed for mapping smRNA reads, including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing. PMID:27284164
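Simulation-based benchmarking works because each simulated read's true origin is known, so mapped positions can be scored against it. The sketch below shows that bookkeeping with a deliberately naive, forward-strand-only mapper. It assumes sensitivity = correctly placed reads / all reads and accuracy = correctly placed reads / mapped reads, definitions that may differ from the paper's. The function names are hypothetical and the code is not one of the 16 evaluated mappers.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def naive_map(read, genome, max_mismatches=1):
    """Unique best-hit position of `read` on `genome`, or None.

    Scans every offset on the forward strand; reads whose best score is
    shared by several positions (multi-mappers) are reported as unmapped.
    """
    best, hits = None, []
    for pos in range(len(genome) - len(read) + 1):
        d = hamming(read, genome[pos:pos + len(read)])
        if d > max_mismatches:
            continue
        if best is None or d < best:
            best, hits = d, [pos]
        elif d == best:
            hits.append(pos)
    return hits[0] if len(hits) == 1 else None


def evaluate(truth, mapped):
    """truth, mapped: dict read_id -> position (mapped values may be None)."""
    n_mapped = sum(1 for r in truth if mapped.get(r) is not None)
    n_correct = sum(1 for r, p in truth.items() if mapped.get(r) == p)
    sensitivity = n_correct / len(truth)
    accuracy = n_correct / n_mapped if n_mapped else 0.0
    return sensitivity, accuracy
```

Short reads make the multi-mapping case common. A ~21 nt read with one or two tolerated mismatches can match many genomic loci equally well, which is one reason smRNA mapping is harder than mapping longer mRNA-seq reads.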
Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

Science.gov (United States)

Nishizawa, M; Nishizawa, K

2000-10-01

The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D. melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, arguing against the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.
Statistical alignment: computational properties, homology testing and goodness-of-fit

DEFF Research Database (Denmark)

Hein, J; Wiuf, Carsten; Møller, Martin

2000-01-01

The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model. Firstly, we show how to accelerate the statistical ... alignment algorithms by several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity-based alignment, to get good initial guesses of the evolutionary parameters, and to apply an efficient numerical optimisation algorithm for finding the maximum ... analysis. Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins. Finally, we describe a goodness-of-fit test...
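The banding idea can be illustrated with a simpler score-based recursion. The sketch below is a banded global alignment (Needleman-Wunsch style) that only fills cells with |i − j| ≤ band. It does not compute TKF91 likelihoods, and for simplicity it centres the band on the main diagonal rather than on a previously computed similarity-based alignment. The function name and scoring values are hypothetical.

```python
def banded_global_score(a, b, band, match=1, mismatch=-1, gap=-1):
    """Best global alignment score using only DP cells with |i - j| <= band."""
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    neg = float("-inf")  # marks cells outside the band
    prev = [neg] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j * gap  # first row: leading gaps in `a`
    for i in range(1, n + 1):
        cur = [neg] * (m + 1)
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i * gap  # first column: leading gaps in `b`
                continue
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diagonal, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]
```

Work drops from O(nm) cells to O(n · band). The price is that alignments straying outside the band are never considered, so the banded score is a lower bound on the full score and matches it once the band is wide enough.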
TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.

Science.gov (United States)

Menges, Fabian; Narzisi, Giuseppe; Mishra, Bud

2011-09-01

Currently, re-sequencing approaches use multiple modules serially to interpret raw sequencing data from next-generation sequencing platforms, while remaining oblivious to the genomic information until the final alignment step. Such approaches fail to exploit the full information from both raw sequencing data and the reference genome, which could yield better quality sequence reads, SNP calls and variant detection, as well as an alignment at the best possible location in the reference genome. Thus, there is a need for novel reference-guided bioinformatics algorithms for interpreting analog signals representing sequences of the bases ({A, C, G, T}), while simultaneously aligning possible sequence reads to a source reference genome whenever available. Here, we propose a new base-calling algorithm, TotalReCaller, to achieve improved performance. A linear error model for the raw intensity data and Burrows-Wheeler transform (BWT) based alignment are combined utilizing a Bayesian score function, which is then globally optimized over all possible genomic locations using an efficient branch-and-bound approach. The algorithm has been implemented in software and hardware [field-programmable gate array (FPGA)] to achieve real-time performance. Empirical results on real high-throughput Illumina data were used to evaluate TotalReCaller's performance relative to its peers (Bustard, BayesCall, Ibis and Rolexa) based on several criteria, particularly those important in clinical and scientific applications. Namely, it was evaluated for (i) its base-calling speed and throughput, (ii) its read accuracy and (iii) its specificity and sensitivity in variant calling. A software implementation of TotalReCaller, as well as additional information, is available at: http://bioinformatics.nyu.edu/wordpress/projects/totalrecaller/ Contact: fabian.menges@nyu.edu.
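The exact-match primitive underlying BWT-based alignment is backward search over an FM-index. The sketch below builds the index naively by sorting suffixes (fine for toy inputs, far too slow for genomes) and has no mismatch handling, intensity model or branch-and-bound. It only shows the core lookup, not TotalReCaller itself, and the function names are hypothetical.

```python
def bwt_index(text):
    """Suffix array, C table and occurrence table for `text` + '$'."""
    text += "$"  # sentinel; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # c_table[ch]: number of characters in text that sort before ch
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][k]: occurrences of ch in bwt[:k]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, b in enumerate(bwt):
        for ch in occ:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Sorted 0-based positions where `pattern` occurs exactly."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Each pattern character narrows the suffix-array interval in constant time, so the lookup cost depends on the read length rather than the genome size. This property is what makes it attractive to interleave alignment with base calling.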
Helping Students Design HyperCard Stacks.

Science.gov (United States)

Dunham, Ken

1995-01-01

Discusses how to teach students to design HyperCard stacks. Highlights include introducing HyperCard, developing storyboards, introducing design concepts and scripts, presenting stacks, evaluating storyboards, and continuing projects. A sidebar presents a HyperCard stack evaluation form. (AEF)
Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer

Directory of Open Access Journals (Sweden)

Kujin Tang

2018-04-01

Horizontal gene transfer (HGT) plays an important role in the evolution of microbial organisms, including bacteria. Alignment-free methods based on single-genome compositional information have been used to detect HGT. Currently, Manhattan and Euclidean distances based on tetranucleotide frequencies are the most commonly used alignment-free dissimilarity measures for detecting HGT. By testing on simulated bacterial sequences and real data sets with known horizontally transferred genomic regions, we found that more advanced alignment-free dissimilarity measures, such as CVTree and d2*, which take into account the background Markov sequences, can solve HGT detection problems with significantly improved performance. We also studied the influence of different factors, such as the evolutionary distance between host and donor sequences, the size of the sliding window, and host genome composition, on the performance of alignment-free methods for detecting HGT. Our study showed that alignment-free methods can predict HGT accurately when the host and donor genomes belong to different taxonomic orders. Among all methods, CVTree with word length 3, d2* with word length 3 and Markov order 1, and d2* with word length 4 and Markov order 1 outperform the others in terms of their highest F1-score and their robustness under the influence of different factors.
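The baseline approach the abstract compares against, tetranucleotide frequencies scored with Manhattan or Euclidean distance, can be sketched as follows. Each sliding window's k-mer profile is compared with the whole-genome profile, and windows with unusually high distance are HGT candidates. The background-adjusted measures (CVTree, d2*) are not implemented here. The window and step sizes and the function names are illustrative assumptions.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k DNA words in seq (overlapping windows)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def window_scores(genome, window, step, k=4, metric=manhattan):
    """(start, distance) of each window's k-mer profile from the genome profile."""
    background = kmer_frequencies(genome, k)
    return [(start, metric(kmer_frequencies(genome[start:start + window], k), background))
            for start in range(0, len(genome) - window + 1, step)]
```

Because both profiles here are raw word frequencies, shared compositional biases (for example overall GC content) inflate or mask differences. Removing that background signal is the motivation for Markov-adjusted measures such as d2*.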
Modular fuel-cell stack assembly

Science.gov (United States)

Patel, Pinakin

2010-07-13

A fuel cell assembly having a plurality of fuel cells arranged in a stack. An end plate assembly abuts the fuel cell at an end of said stack. The end plate assembly has an inlet area adapted to receive an exhaust gas from the stack, an outlet area and a passage connecting the inlet area and outlet area and adapted to carry the exhaust gas received at the inlet area from the inlet area to the outlet area. A further end plate assembly abuts the fuel cell at a further opposing end of the stack. The further end plate assembly has a further inlet area adapted to receive a further exhaust gas from the stack, a further outlet area and a further passage connecting the further inlet area and further outlet area and adapted to carry the further exhaust gas received at the further inlet area from the further inlet area to the further outlet area.
Evaluation of field emission properties from multiple-stacked Si quantum dots

International Nuclear Information System (INIS)

Takeuchi, Daichi; Makihara, Katsunori; Ohta, Akio; Ikeda, Mitsuhisa; Miyazaki, Seiichi

2016-01-01

Multiple-stacked Si quantum dots (QDs) with ultrathin SiO2 interlayers were formed on ultrathin SiO2 layers by repeating a process sequence consisting of the formation of Si-QDs by low pressure chemical vapor deposition using SiH4 gas, and the surface oxidation and subsequent surface modification by remote hydrogen and oxygen plasmas, respectively. To clarify the electron emission mechanism from multiple-stacked Si-QDs covered with an ultrathin Au top electrode, the energy distribution of the emitted electrons and its electric field dependence were measured using a hemispherical electron energy analyzer in an X-ray photoelectron spectroscopy system under DC bias applied to the multiple-stacked Si-QD structure. At −6 V and beyond, the energy distributions peaked at ~2.5 eV with a tail toward the higher energy side. While the electron emission intensity increased exponentially with increasing applied DC bias, there was no significant increase in the emission peak energy. The observed emission characteristics can be interpreted in terms of field emission from the second and/or third topmost Si-QDs resulting from the electric field concentration there. - Highlights: • Electron field emission from a 6-fold stack of Si-QDs has been evaluated. • AFM measurements show the local electron emission from individual Si-QDs. • The impact of applied bias on the electron emission energy distribution was investigated.
USPIO-enhanced 3D-cine self-gated cardiac MRI based on a stack-of-stars golden angle short echo time sequence: Application on mice with acute myocardial infarction.

Science.gov (United States)

Trotier, Aurélien J; Castets, Charles R; Lefrançois, William; Ribot, Emeline J; Franconi, Jean-Michel; Thiaudière, Eric; Miraux, Sylvain

2016-08-01

To develop and assess a 3D-cine self-gated method for cardiac imaging of murine models. A 3D stack-of-stars (SOS) short echo time (STE) sequence with a navigator echo was performed at 7T on healthy mice (n = 4) and mice with acute myocardial infarction (MI) (n = 4) injected with ultrasmall superparamagnetic iron oxide (USPIO) nanoparticles. In all, 402 spokes were acquired per stack with the incremental or the golden angle method using an angle increment of (360/402)° or 222.48°, respectively. A cylindrical k-space was filled and repeated with a maximum number of repetitions (NR) of 10. 3D cine cardiac images at 156 μm resolution were reconstructed retrospectively and compared for the two methods in terms of contrast-to-noise ratio (CNR). The golden angle images were also reconstructed with NR = 10, 6, and 3, to assess cardiac functional parameters (ejection fraction, EF) on both animal models. The combination of 3D SOS-STE and USPIO injection allowed us to optimize the identification of cardiac peaks on navigator signal and generate high CNR between blood and myocardium (15.3 ± 1.0). The golden angle method resulted in a more homogeneous distribution of the spokes inside a stack (P cine images could be obtained without electrocardiogram or respiratory gating in mice. It allows precise measurement of cardiac functional parameters even on MI mice. J. Magn. Reson. Imaging 2016;44:355-365. © 2016 Wiley Periodicals, Inc.
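The advantage of golden-angle ordering for retrospective (self-gated) binning is that any run of consecutive spokes covers the full circle fairly evenly, whereas incremental ordering covers only a narrow arc until the stack is complete. The sketch below compares the largest empty angular gap for both schemes, using the 402 spokes and the two increments quoted above. The function names and the 40-spoke subset size are arbitrary choices for illustration.

```python
def spoke_angles(n, increment):
    """Azimuthal angle (degrees, mod 360) of each successive radial spoke."""
    return [(k * increment) % 360.0 for k in range(n)]


def max_angular_gap(angles):
    """Largest empty arc between neighbouring spokes, including wrap-around."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(360.0 - ordered[-1] + ordered[0])
    return max(gaps)


N_SPOKES = 402
incremental = spoke_angles(N_SPOKES, 360.0 / N_SPOKES)
golden = spoke_angles(N_SPOKES, 222.48)
```

With all 402 spokes the incremental scheme is perfectly uniform, but the first 40 spokes leave most of the circle empty. The same 40 golden-angle spokes are spread around the whole circle, which suits retrospective sorting of spokes into cardiac phases.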
Guanine base stacking in G-quadruplex nucleic acids

Science.gov (United States)

Lech, Christopher Jacques; Heddi, Brahim; Phan, Anh Tuân

2013-01-01

G-quadruplexes constitute a class of nucleic acid structures defined by stacked guanine tetrads (or G-tetrads), with guanine bases from neighboring tetrads stacking with one another within the G-tetrad core. Individual G-quadruplexes can also stack with one another at their G-tetrad interface, leading to higher-order structures as observed in telomeric repeat-containing DNA and RNA. In this study, we investigate how guanine base stacking influences the stability of G-quadruplexes and their stacked higher-order structures. A structural survey of the Protein Data Bank is conducted to characterize experimentally observed guanine base stacking geometries within the core of G-quadruplexes and at the interface between stacked G-quadruplex structures. We couple this survey with a systematic computational examination of stacked G-tetrad energy landscapes using quantum mechanical computations. Energy calculations of stacked G-tetrads reveal large energy differences of up to 12 kcal/mol between experimentally observed geometries at the interface of stacked G-quadruplexes. Energy landscapes are also computed using an AMBER molecular mechanics description of stacking energy and are shown to agree quite well with the quantum-mechanically calculated landscapes. Molecular dynamics simulations provide a structural explanation for the experimentally observed preference of parallel G-quadruplexes to stack in a 5′–5′ manner, based on the different tetrad stacking modes accessible at the stacking interfaces of 5′–5′ and 3′–3′ stacked G-quadruplexes. PMID:23268444
Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

Directory of Open Access Journals (Sweden)

Thierry-Mieg Danielle

2009-06-01

Background: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well-annotated genes with e-values ≤ 10⁻²⁰. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy, and compared the results with DNA microarrays and quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs, with an average 90% sequence similarity, identifying 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy comparable to DNA microarrays and QRTPCR. In addition, careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome, including the discovery of thousands of new splice variants.
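The counting step described above can be sketched as follows. The input format (read ID, gene ID, e-value triples), the best-hit rule for reads matching several genes, and the reads-per-million scaling are all assumptions for illustration, not the ExpressSeq pipeline itself; only the 10⁻²⁰ e-value cutoff comes from the abstract.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-20  # threshold quoted in the abstract


def count_reads_per_gene(hits, cutoff=E_VALUE_CUTOFF):
    """hits: iterable of (read_id, gene_id, e_value) tuples (hypothetical format).

    Each read is assigned to its lowest-e-value gene, if that hit passes the cutoff.
    """
    best = {}
    for read_id, gene_id, e_value in hits:
        if e_value > cutoff:
            continue  # hit too weak to count
        if read_id not in best or e_value < best[read_id][1]:
            best[read_id] = (gene_id, e_value)
    return Counter(gene_id for gene_id, _ in best.values())


def reads_per_million(counts, total_reads):
    """Scale raw counts by sequencing depth (an assumed normalization)."""
    return {gene: 1e6 * n / total_reads for gene, n in counts.items()}


hits = [("r1", "GENE_A", 1e-30), ("r1", "GENE_B", 1e-25),
        ("r2", "GENE_A", 1e-50), ("r3", "GENE_B", 1e-5)]
counts = count_reads_per_gene(hits)
print(counts, reads_per_million(counts, total_reads=4))
```

Read r3 is discarded because its only hit fails the cutoff, and r1 is credited to GENE_A because that hit has the lower e-value.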
Stack filter classifiers

Energy Technology Data Exchange (ETDEWEB)

Porter, Reid B [Los Alamos National Laboratory]; Hush, Don [Los Alamos National Laboratory]

2009-01-01

Just as linear models generalize the sample mean and weighted average, weighted order statistic models generalize the sample median and weighted median. This analogy can be continued informally to generalized additive models in the case of the mean, and Stack Filters in the case of the median. Both of these model classes have been extensively studied for signal and image processing, but it is surprising to find that, for pattern classification, their treatment has been significantly one-sided. Generalized additive models are now a major tool in pattern classification, and many different learning algorithms have been developed to fit model parameters to finite data. However, Stack Filters remain largely confined to signal and image processing, and learning algorithms for classification have yet to appear. This paper is a step towards Stack Filter Classifiers, and it shows that the approach is interesting from both a theoretical and a practical perspective.
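The mean/median analogy can be made concrete. A weighted median is the weighted-order-statistic counterpart of a weighted average, and it is also a stack filter: thresholding integer-valued inputs at every level, passing each binary slice through a weighted-majority (positive Boolean) function, and summing the binary outputs reproduces the weighted median. The sketch assumes nonnegative integer inputs and positive integer weights with an odd total, so the median is unique; it illustrates the filter only, not the classifier-learning algorithms the paper proposes.

```python
def weighted_median(values, weights):
    """Weighted median: smallest value whose cumulative weight exceeds half the total.

    Assumes positive integer weights with an odd total, so the result is unique.
    """
    total = sum(weights)
    cumulative = 0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if 2 * cumulative > total:
            return value
    raise ValueError("weights must be positive")


def weighted_median_as_stack_filter(values, weights):
    """Same operator via threshold decomposition (values: nonnegative integers).

    Each binary slice [x >= t] goes through a weighted-majority Boolean
    function; summing (stacking) the binary outputs gives the filter output.
    """
    total = sum(weights)
    output = 0
    for t in range(1, max(values) + 1):
        ones = sum(w for x, w in zip(values, weights) if x >= t)
        output += 1 if 2 * ones > total else 0
    return output


window, w = [1, 5, 2], [1, 3, 1]
print(weighted_median(window, w), weighted_median_as_stack_filter(window, w))
```

With equal weights the operator is the ordinary median; raising the centre weight to 3 lets the centre sample win, which is the weighted-median analogue of a weighted average.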

Study of interfaces and band offsets in TiN/amorphous LaLuO3 gate stacks

KAUST Repository

Mitrovic, Ivona Z.

2011-07-01

TiN/LaLuO3 (LLO) gate stacks formed by molecular beam deposition have been investigated by X-ray photoelectron spectroscopy (XPS), medium energy ion scattering, spectroscopic ellipsometry, scanning transmission electron microscopy, electron energy loss spectroscopy and atomic force microscopy. The results indicate an amorphous structure for the deposited LLO films. The band offset between the Fermi level of TiN and the valence band of LLO is estimated to be 2.65 ± 0.05 eV. A weaker La-O-Lu bond and a prominent Ti2p sub-peak, related to Ti bonded to interstitial oxygen, have been identified for an ultra-thin 1.7 nm TiN/3 nm LLO gate stack. Angle-dependent XPS analysis of the Si2s spectra, together with shifts of the La4d, La3d and Lu4d core levels, suggests a silicate-type LLO/Si interface with Si-rich SiOx. Symmetrical valence and conduction band offsets of 2.2 eV for LLO relative to Si, and an LLO bandgap of 5.5 ± 0.1 eV, have been derived from the measurements. The band alignment of the ultra-thin TiN/LLO gate stack is affected by structural changes. Copyright © 2011 Published by Elsevier B.V. All rights reserved.
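As a quick consistency check on the reported numbers (not part of the original abstract), for a straddling alignment the conduction-band offset follows from the two bandgaps and the valence-band offset. Assuming the standard room-temperature Si bandgap of about 1.12 eV:

```latex
\Delta E_C = E_g^{\mathrm{LLO}} - E_g^{\mathrm{Si}} - \Delta E_V
           = 5.5\,\mathrm{eV} - 1.12\,\mathrm{eV} - 2.2\,\mathrm{eV}
           = 2.18\,\mathrm{eV} \approx 2.2\,\mathrm{eV}
```

This rounds to the reported 2.2 eV, so the symmetric offsets and the 5.5 eV gap agree with each other.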
Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space

Directory of Open Access Journals (Sweden)

Richard Wilton

2015-03-01

When computing alignments of DNA sequences to a large genome, a key element in achieving high processing throughput is to prioritize locations in the genome where high-scoring mappings might be expected. We formulated this task as a series of list-processing operations that can be efficiently performed on graphics processing unit (GPU) hardware. We followed this approach in implementing a read aligner called Arioc that uses GPU-based parallel sort and reduction techniques to identify high-priority locations where potential alignments may be found. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments found by several leading read aligners. With simulated reads, Arioc has comparable or better accuracy than the other read aligners we tested. With human sequencing reads, Arioc demonstrates significantly greater throughput than the other aligners we evaluated across a wide range of sensitivity settings. The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.
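The sort-and-reduce idea can be illustrated on a CPU with plain Python. In this hypothetical sketch (not Arioc's code or data layout), every k-mer seed hit votes for the reference position where the read would start (its diagonal); sorting the votes and run-length reducing them yields per-location seed support, and the best-supported locations are the ones a seed-and-extend aligner would try to extend first.

```python
from collections import defaultdict
from itertools import groupby


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def prioritized_locations(read, index, k, top=3):
    """Return up to `top` (diagonal, seed_count) pairs, most-supported first."""
    # 1. Map: each seed hit implies a candidate alignment start (diagonal).
    candidates = []
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            candidates.append(pos - offset)
    # 2. Sort the candidate list (a GPU aligner would use a parallel sort).
    candidates.sort()
    # 3. Reduce: run-length count identical diagonals.
    support = [(diag, sum(1 for _ in group)) for diag, group in groupby(candidates)]
    # 4. Prioritize: most seed support first, ties broken by position.
    support.sort(key=lambda item: (-item[1], item[0]))
    return support[:top]


idx = build_kmer_index("TTTTGATTACAGGGG", k=3)
print(prioritized_locations("GATTACA", idx, k=3))
print(prioritized_locations("TTTG", idx, k=3))
```

The reference, read, and k are toy values; real aligners use much longer seeds and then run a full dynamic-programming extension at each prioritized location.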
From the components to the stack. Developing and designing 5kW HT-PEFC stacks; Von der Komponente zum Stack. Entwicklung und Auslegung von HT-PEFC-Stacks der 5 kW-Klasse

Energy Technology Data Exchange (ETDEWEB)

Bendzulla, Anne

2010-12-22

The aim of the present project is to develop a stack design for a 5-kW HT-PEFC system. First, the state of the art of potential materials and process designs will be discussed for each component. Then, using this as a basis, three potential stack designs with typical attributes will be developed and assessed in terms of practicality with the aid of a specially derived evaluation method. Two stack designs classified as promising will be discussed in detail, constructed, and then characterized using short stack tests. Comparing the stack designs reveals that both designs are fundamentally suitable for application in an HT-PEFC system with on-board supply. However, some of the performance data differ significantly between the two stack designs. The preferred stack design for application in an HT-PEFC system is characterized by robust operating behaviour and reproducible high-level performance data. Moreover, in compact constructions (120 W/l at 60 W/kg), the stack design allows flexible cooling with thermal oil or air, which can be adapted to suit specific applications. Furthermore, a defined temperature gradient can be set during operation, allowing the CO tolerance to be increased by up to 10 mV. The short stack design developed within the scope of the present work therefore represents an ideal basis for developing a 5-kW HT-PEFC system. Topics for further research include improving the performance by reducing weight and/or volume, as well as optimizing the heat management. The results achieved within the framework of this work clearly show that HT-PEFC stacks have the potential to play a decisive role in increasing efficiency in the future, particularly when combined with an on-board supply system. (orig.) [Translated from German] The aim of the present work is to develop a stack concept for a 5 kW HT-PEFC system. To this end, the state of the art of possible materials and process concepts is first discussed for each component. Building on this, three
Node fingerprinting: an efficient heuristic for aligning biological networks.

Science.gov (United States)

Radu, Alex; Charleston, Michael

2014-10-01

With the continuing increase in the availability of biological data and improvements to biological models, biological network analysis has become a promising area of research. An emerging technique for analyzing biological networks is network alignment. Network alignment has been used to calculate genetic distance, similarities between regulatory structures, and the effect of external forces on gene expression, and to depict the conditional activity of expression modules in cancer. Network alignment is algorithmically complex, so we must rely on heuristics that are ideally as efficient and accurate as possible. Most current network alignment techniques rely on precomputed information, such as protein sequence alignments, or on tunable network alignment parameters, which may introduce additional computational overhead. Our algorithm, which we call Node Fingerprinting (NF), performs global pairwise network alignment without precomputation or tuning, can be fully parallelized, and quickly computes an accurate alignment between two biological networks. It has performed as well as or better than existing algorithms on biological and simulated data, while using fewer computational resources. The algorithmic validation demonstrates the low computational resource requirements of NF.
Text-Filled Stacked Area Graphs

DEFF Research Database (Denmark)

Kraus, Martin

2011-01-01

… text-filled stacked area graphs, i.e., graphs that feature stacked areas filled with small-typed text. Since the text layout of these graphs can be computed automatically, large amounts of textual detail can be included with very little effort. We discuss the most important challenges and some … solutions for the design of text-filled stacked area graphs with the help of an exemplary visualization of the genres, publication years, and titles of a database of several thousand PC games.
Formation of long-period stacking ordered structures in Mg88M5Y7 (M = Ti, Ni and Pb) casting alloys

International Nuclear Information System (INIS)

Jin, Qian-Qian; Fang, Can-Feng; Mi, Shao-Bo

2013-01-01

Highlights: • Apart from 18R-LPSO, 14H-LPSO structure was determined in the Mg-Ni-Y alloys. • The appearance of twin-related structure in 18R-LPSO structure results from the stacking faults in the stacking sequence of the closely packed planes. • A new (Pb,Mg)2Y phase with a body-centered orthorhombic structure was determined in the Mg-Pb-Y alloy. • No LPSO structures were found in the Mg-Pb-Y and Mg-Ti-Y casting alloys. -- Abstract: Formation of long-period stacking ordered (LPSO) structures is investigated in Mg88M5Y7 (M = Ti, Ni and Pb) casting alloys by means of electron microscopy and X-ray diffraction. In the Mg88Ni5Y7 casting alloy, 14H-LPSO structure is observed in a small amount, which coexists with 18R-LPSO structure. The appearance of stacking faults in 18R-LPSO structure results in twin-related structure in the stacking sequence of the closely packed planes. A new (Pb,Mg)2Y phase with a body-centered orthorhombic structure is determined in the Mg88Pb5Y7 alloy. No LPSO structures are found in the Mg88Pb5Y7 and Mg88Ti5Y7 casting alloys. In terms of the atomic radius and heat of mixing, the formation ability of LPSO structure in the present alloys is discussed.
Alignment enhancement of a symmetric top molecule by two short laser pulses

DEFF Research Database (Denmark)

Bisgaard, Christer Z; Viftrup, Simon; Stapelfeldt, Henrik

2006-01-01

… equation. It is shown that the strongest degree of one-dimensional (single axis) field-free alignment obtainable with a single pulse can be enhanced using the two-pulse sequence in a parallel polarization geometry. The conditions for alignment enhancement are: (1) The second pulse must be sent near...
Van der Waals stacks of few-layer h-AlN with graphene: an ab initio study of structural, interaction and electronic properties

International Nuclear Information System (INIS)

Dos Santos, Renato B; Mota, F de Brito; Rivelino, R; Kakanakova-Georgieva, A; Gueorguiev, G K

2016-01-01

Graphite-like hexagonal AlN (h-AlN) multilayers have been experimentally realized and theoretically modeled. Developing any functional electronic applications of h-AlN would almost certainly require its integration with other layered materials, particularly graphene. Here, using vdW-corrected density functional theory calculations, we investigate the structure, interaction energy, and electronic properties of van der Waals stacking sequences of few-layer h-AlN with graphene. We find that the presence of a template such as graphene induces enough interlayer charge separation in h-AlN to favor a graphite-like stacking formation. We also find that the interface dipole, calculated per unit cell of the stacks, tends to increase with the number of stacked layers of h-AlN and graphene.
Maturing of SOFC cell and stack production technology and preparation for demonstration of SOFC stacks. Part 2

Energy Technology Data Exchange (ETDEWEB)

2006-07-01

The TOFC/Riso pilot plant production facility for the manufacture of anode-supported cells has been further up-scaled with an automated continuous spraying process and extra sintering capacity, resulting in a production capacity exceeding 15,000 standard cells (12 × 12 cm²) in 2006, with a success rate of about 85% in cell production. All processing steps in cell production, such as tape-casting, spraying, screen-printing and atmospheric air sintering, have been selected on the condition that up-scaling and cost-effective, flexible industrial mass production are feasible. The standard cell size is currently being increased to 18 × 18 cm², and 150 cells of this size were produced in 2006 for our further stack development. To improve quality and lower production cost, a new screen-printing line is being established. TOFC's stack design is an ultra-compact multilayer assembly of cells (including contact layers), metallic interconnects, spacer frames and glass seals. The compactness ensures minimal material consumption and low cost. Standard stacks with a cross-flow configuration contain 75 cells (12 × 12 cm²) delivering about 1.2 kW at optimal operating conditions with pre-reformed NG as fuel. Stable performance has been demonstrated for 500-1000 hours. Significantly improved materials, especially for the metallic interconnect and the coatings, have been introduced during the last year. Small stacks (5-10 cells) exhibit no detectable stack degradation using our latest cells and stack materials during test periods of 500-1000 hours. Larger stacks (50-75 cells) suffer from maldistribution of gas and air inside the stacks, gas leakage, gas cross-over, pressure drop, and a certain loss of internal electrical contact during operation cycles. Measures have been taken to find solutions during the following development work. The stack production facilities have been improved and up-scaled. In 2006, 5 standard stacks have been assembled and burned in based on
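The stack rating quoted above implies a per-cell output and a rough power density. This back-of-the-envelope estimate assumes the 1.2 kW is shared equally among the 75 cells and refers it to the full 12 × 12 cm² cell footprint; the active electrode area is not given in the abstract, so the density figure is only indicative:

```latex
P_{\mathrm{cell}} = \frac{1.2\,\mathrm{kW}}{75} = 16\,\mathrm{W},
\qquad
p = \frac{16\,\mathrm{W}}{12 \times 12\,\mathrm{cm}^2}
  = \frac{16\,\mathrm{W}}{144\,\mathrm{cm}^2}
  \approx 0.11\,\mathrm{W\,cm^{-2}}
```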
High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

Directory of Open Access Journals (Sweden)

Khaled Benkrid

2012-01-01

This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high-performance, efficient computing. In particular, the paper presents the results of a comparative study of three acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as the baseline reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and also perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance-per-watt and performance-per-dollar criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.
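For reference, the Smith-Waterman recurrence that all four platforms accelerate is short. This score-only sketch uses a linear gap penalty and hypothetical default scores (match +2, mismatch -1, gap 1); production implementations typically add affine gaps and substitution matrices. Cells on the same anti-diagonal of the matrix do not depend on each other, which is the parallelism that hardware accelerators such as FPGA systolic arrays and GPUs exploit.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=1):
    """Best local alignment score with a linear gap penalty (score only).

    Keeps only two rows of the DP matrix, so memory is O(len(b)).
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] - gap        # a[i-1] aligned to a gap in b
            left = current[j - 1] - gap   # b[j-1] aligned to a gap in a
            # 0 lets a new local alignment start at this cell.
            current[j] = max(0, diagonal, up, left)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=2))
```

The example pair and scoring (+3/-3, gap 2) are a common textbook illustration; the best local alignment there is GTT-AC against GTTGAC.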
Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

Science.gov (United States)

King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

2014-01-01

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherently digital nature of these sequences. DSP methods demonstrated early success in detecting coding regions in genes. More recently, these methods have been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation that can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence, which is then used to compute relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
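The abstract does not give the ICD formulas, so the sketch below shows only the generic DFT-based signature that methods of this family start from: the Voss mapping of a DNA sequence to four binary indicator sequences, their summed power spectrum, and a Euclidean distance between spectra. It is not the ICD transform. Because spectral magnitudes discard phase, a cyclic shift of a sequence leaves its signature unchanged, which illustrates how such signatures compare sequences without an explicit alignment.

```python
import cmath
import math

BASES = "ACGT"


def voss_indicators(seq):
    """Voss mapping: one binary indicator sequence per nucleotide."""
    return {b: [1.0 if c == b else 0.0 for c in seq] for b in BASES}


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short sequences)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power_spectrum(seq):
    """Total power spectrum: sum over bases of |DFT of indicator|^2."""
    seq = seq.upper()
    spectrum = [0.0] * len(seq)
    for indicator in voss_indicators(seq).values():
        for k, coeff in enumerate(dft(indicator)):
            spectrum[k] += abs(coeff) ** 2
    return spectrum


def spectral_distance(s1, s2):
    """Euclidean distance between power spectra of two equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("this simple sketch only compares equal-length sequences")
    p1, p2 = power_spectrum(s1), power_spectrum(s2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


print(spectral_distance("ACGTTGCA", "CGTTGCAA"))  # second is a cyclic shift of the first
print(spectral_distance("ACGTTGCA", "AAAATTTT"))
```

Sequences of different lengths would need resampling or a fixed number of coefficients before comparison; that step is omitted here.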
Opto-mechanical devices for the Antares automatic beam alignment system

International Nuclear Information System (INIS)

Swann, T.; Combs, C.; Witt, J.

1981-01-01

Antares is a 24-beam CO2 laser system for controlled fusion research, under construction at Los Alamos National Laboratory. Rapid automatic alignment of this system is required prior to each experimental shot. Unique opto-mechanical alignment devices, which have been developed specifically for this automatic alignment system, are discussed. A variable-focus alignment telescope views point light sources. A beam expander/spatial filter processes both a visible krypton-ion alignment laser and a 10.6 μm CO2 alignment laser. The periscope/carousel device enables the alignment telescope to view each of the twelve optical trains in each power amplifier sequentially. The polyhedron alignment device projects a point light source for both centering and pointing alignment at the polyhedron mirror. The rotating-wedge alignment device provides a sequencing point light source and also compensates for dispersion between visible and 10.6 μm radiation. The back-reflector flip-in remotely positions point light sources at the back-reflector mirrors. A light-source box illuminates optical fibers with high-intensity white light, which is distributed to the various point light sources in the system.
Ultra-high-aspect-orthogonal and tunable three dimensional polymeric nanochannel stack array for BioMEMS applications

Science.gov (United States)

Heo, Joonseong; Kwon, Hyukjin J.; Jeon, Hyungkook; Kim, Bumjoo; Kim, Sung Jae; Lim, Geunbae

2014-07-01

Nanofabrication technologies have been a strong driver of new scientific fundamentals that traditional theory has never described, and have seeded ground-breaking nano-engineering applications. In this study, we fabricated an ultra-high-aspect-ratio (~10⁶, with an O(100) nm nanochannel opening and O(100) mm length) orthogonal nanochannel array using only polymeric materials. Vertically aligned nanochannel arrays in parallel can be stacked to form a dense nanostructure. Owing to the flexibility and stretchability of the material, the size and shape of the nanochannels can be tuned by elongation, and the stack array can even be rolled to form a radially uniform nanochannel array. The roll can be cut at arbitrary lengths for incorporation into a micro/nanofluidic device. As examples, we used the device to demonstrate ion concentration polarization, with Ohmic-limiting/overlimiting current-voltage characteristics, and the preconcentration of charged species. The density of the nanochannel array was lower than that of conventional nanoporous membranes, such as anodic aluminum oxide (AAO) membranes. However, accurate control over the nanochannel array dimensions enabled multiplexed one-microstructure-on-one-nanostructure interfacing for valuable biological/biomedical microelectromechanical system (BioMEMS) platforms, such as nano-electroporation.
Intrinsic alignments in redMaPPer clusters – I. Central galaxy alignments and angular segregation of satellites

International Nuclear Information System (INIS)

Huang, Hung-Jin; Mandelbaum, Rachel; Freeman, Peter E.; Chen, Yen-Chi

2016-01-01

The shapes of cluster central galaxies are not randomly oriented, but rather exhibit coherent alignments with the shapes of their parent clusters as well as with the surrounding large-scale structures. In this work, we aim to identify the galaxy and cluster quantities that most strongly predict the central galaxy alignment phenomenon among a large parameter space with a sample of 8237 clusters and 94 817 members within 0.1 < z < 0.35, based on the red-sequence Matched-filter Probabilistic Percolation cluster catalogue constructed from the Sloan Digital Sky Survey. We first quantify the alignment between the projected central galaxy shapes and the distribution of member satellites, to understand what central galaxy and cluster properties most strongly correlate with these alignments. Next, we investigate the angular segregation of satellites with respect to their central galaxy major axis directions, to identify the satellite properties that most strongly predict their angular segregation. We find that central galaxies are more aligned with their member galaxy distributions in clusters that are more elongated and have higher richness, and for central galaxies with larger physical size, higher luminosity and centring probability, and redder colour. Satellites with redder colour, higher luminosity, located closer to the central galaxy, and with smaller ellipticity show a stronger angular segregation towards their central galaxy major axes. Lastly, we provide physical explanations for some of the identified correlations, and discuss the connection to theories of central galaxy alignments, the impact of primordial alignments with tidal fields, and the importance of anisotropic accretion.
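A common way to quantify angular segregation, assumed here rather than taken from the paper's exact estimator, is to measure for each satellite the angle θ between the central galaxy's projected major axis and the central-to-satellite direction, folded into [0°, 90°]. A sample mean below 45° indicates segregation toward the major axis, while an isotropic distribution gives 45°. The sketch uses flat projected coordinates and measures the position angle from the x-axis; both are simplifying assumptions (astronomical position angles are usually measured from north through east).

```python
import math


def satellite_angle(central_xy, position_angle_deg, satellite_xy):
    """Angle (degrees, folded into [0, 90]) between the major axis and a satellite."""
    dx = satellite_xy[0] - central_xy[0]
    dy = satellite_xy[1] - central_xy[1]
    phi = math.degrees(math.atan2(dy, dx))
    # An axis has no direction, so angles are 180-degree periodic.
    theta = (phi - position_angle_deg) % 180.0
    return min(theta, 180.0 - theta)


def mean_satellite_angle(central_xy, position_angle_deg, satellites):
    """Mean folded angle over a list of satellite positions."""
    angles = [satellite_angle(central_xy, position_angle_deg, s) for s in satellites]
    return sum(angles) / len(angles)


# Hypothetical test configurations: satellites on a uniform ring, and
# satellites placed along a major axis at 30 degrees.
ring = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in range(360)]
on_axis = [(r * math.cos(math.radians(30)), r * math.sin(math.radians(30)))
           for r in (-2.0, -1.0, 1.0, 2.0)]
print(mean_satellite_angle((0.0, 0.0), 0.0, ring))
print(mean_satellite_angle((0.0, 0.0), 30.0, on_axis))
```

In a real analysis the mean angle would be computed per subsample (for example, by satellite colour or luminosity) and compared against 45°.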
Vision Servo Motion Control and Error Analysis of a Coplanar XXY Stage for Image Alignment Motion

Directory of Open Access Journals (Sweden)

Hau-Wei Lee

2013-01-01

In recent years, with the growing demand for smartphones with touch panels, demand for alignment/compensation systems built on vision-servo-controlled alignment stages has also increased. Because the traditional stacked-type XYθ stage is heavy and accumulates assembly errors, it has gradually been replaced by the coplanar stage, which has three actuators on the same plane providing three degrees of freedom. The simplest image alignment mode uses two cameras as the feedback-control equipment, with the work piece placed on the working stage. The work piece is usually engraved or marked. After the cameras capture images and the position of the mark in the camera image is obtained by image processing, the mark can be moved to the designated position in the camera image by moving the stage according to an alignment algorithm. This study used a coplanar XXY stage with 1 μm positioning resolution. Because the camera resolution is about 3.75 μm per pixel, a subpixel technique is used, and the linear and angular alignment repeatability of the alignment system reaches 1 μm and 5 arcsec, respectively. The vision-servo alignment motion is completed within 1 second using the coplanar XXY stage.
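The abstract does not say which subpixel technique was used. One common option, shown here purely as an assumed illustration, is an intensity-weighted centroid of the mark image, converted to a stage correction with the stated 3.75 μm/pixel scale. The function names, the tiny test image, and the assumption that camera and stage axes are parallel with no rotation between them are all hypothetical.

```python
UM_PER_PIXEL = 3.75  # camera scale quoted in the abstract


def subpixel_centroid(image):
    """Intensity-weighted centroid (x, y) in pixels of a 2D list of intensities."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("image contains no signal")
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    return cx, cy


def stage_correction_um(image, target_px):
    """Stage move (dx, dy) in micrometres that brings the mark to target_px.

    Assumes camera and stage axes are parallel and share the same sign convention.
    """
    cx, cy = subpixel_centroid(image)
    return ((target_px[0] - cx) * UM_PER_PIXEL, (target_px[1] - cy) * UM_PER_PIXEL)


mark = [[0, 0, 0],
        [0, 1, 3],
        [0, 0, 0]]
print(subpixel_centroid(mark))
print(stage_correction_um(mark, (2, 1)))
```

The centroid lands at x = 1.75 px, between two pixel centres, which is how a 3.75 μm/pixel camera can guide corrections finer than one pixel.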
Template synthesis and magnetic properties of highly aligned barium hexaferrite (BaFe12O19) nanofibers

International Nuclear Information System (INIS)

Huang, Boneng; Li, Congju; Wang, Jiaona

2013-01-01

Using electrospun poly(ethylene terephthalate)/citric acid (PET/CA) microfibers as the template, highly aligned barium hexaferrite (BaFe12O19) nanofibers with diameters of ca. 800 nm and lengths up to 2 cm were synthesized by a sol–gel precursor coating technique and subsequent high-temperature calcination. Structural and morphological investigations revealed that individual BaFe12O19 nanofibers were composed of numerous nanocrystallites stacked alternately along the nanofiber axis; the average grain size was ca. 225 nm, and the single crystallites in each BaFe12O19 nanofiber had random orientations. A formation mechanism for the aligned BaFe12O19 nanofibers was proposed based on the experiments. Magnetic measurements revealed that the aligned BaFe12O19 nanofibers exhibited orientation-dependent magnetic behavior with respect to the applied magnetic field. The magnetic anisotropy, with the easy magnetization axis along the length of the nanofibers, was due to shape anisotropy. Such aligned magnetic nanofibers may find use in applications requiring an orientation-dependent physical response. - Highlights: ► A simple method was used to synthesize aligned BaFe12O19 nanofibers. ► The aligned BaFe12O19 nanofibers display an obvious orientation-dependent magnetic behavior. ► The method can be readily applied to other aligned one-dimensional inorganic nanomaterials.
Graphene as transmissive electrodes and aligning layers for liquid-crystal-based electro-optic devices.

Science.gov (United States)

Basu, Rajratan; Shalov, Samuel A

2017-07-01

In a conventional liquid crystal (LC) cell, polyimide layers are used to align the LC homogeneously in the cell, and transmissive indium tin oxide (ITO) electrodes are used to apply an electric field that reorients the LC along the field. We show experimentally that monolayer graphene films on the two glass substrates can function simultaneously as the LC aligning layers and the transparent electrodes of an LC cell, without the conventional polyimide and ITO layers. This replacement reduces the combined thickness of the alignment layers and electrodes from about 100 nm to less than 1 nm. The interaction between the LC and graphene through π-π electron stacking imposes a planar alignment on the LC in the graphene-based cell, which is verified using a crossed-polarized microscope. The graphene-based LC cell exhibits an excellent nematic director reorientation from planar to homeotropic configuration under an applied electric field, which is probed by dielectric and electro-optic measurements. Finally, it is shown that electro-optic switching is significantly faster in the graphene-based LC cell than in a conventional ITO-polyimide LC cell.
A Mid-Holocene Relative Sea-Level Stack, New Jersey, USA

Science.gov (United States)

Horton, B.; Walker, J. S.; Kemp, A.; Shaw, T. J.; Kopp, R. E.

2017-12-01

Most high-resolution (decimeter- and decadal-scale) relative sea-level (RSL) records that use salt-marsh microfossils as a proxy extend only through the Common Era, limiting our understanding of the mechanisms driving RSL change and of how sea level is influenced by a changing climate. Records beyond the Common Era are limited by the depth of continuous sequences of salt-marsh peat suitable for high-resolution reconstructions, as well as by contamination from local processes such as sediment compaction. In contrast, sequences of basal peat have produced compaction-free RSL records through the Holocene, but at low resolution (meter- and centennial-scale). We devise a new Multi-Proxy Presence/Absence Method (MP2AM) to develop a mid-Holocene RSL stack. We stack a series of 1 m basal peat cores that overlap along a uniform elevational gradient above an incompressible basal sand. We analyzed three sea-level indicators from 14 cores: foraminifera, testate amoebae, and stable carbon isotope geochemistry. To reconstruct RSL, this multi-proxy approach uses the time-saving presence/absence of foraminifera and testate amoebae to determine the elevation of the highest occurrence of foraminifera and the lowest occurrence of testate amoebae in each basal core. We use stable carbon isotope geochemistry to determine the C3/C4 vegetation boundary in each core. We develop age-depth models for each core using a series of radiocarbon dates. The RSL records from each 1 m basal core are combined to create a stack, in effect one long core of salt-marsh material. This method removes the issue of compaction and creates a continuous RSL record for addressing temporal changes and periods of climate and sea-level variability. We reconstruct a southern New Jersey mid-Holocene RSL record from Edwin B. Forsythe National Wildlife Refuge, where Kemp et al. (2013) completed a 2500 yr RSL record using a foraminifera-based transfer function approach. Preliminary radiocarbon dates suggest that the basal sequence is at least 4246-4408 cal yr BP.
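The presence/absence step and the stacking step can be sketched as follows. The sample format (elevation, foraminifera present, testate amoebae present), the elevation datum, and the representation of dated index points as (age, RSL) pairs are hypothetical simplifications, as are the numbers in the example; the actual method also uses stable carbon isotope boundaries and per-core age-depth models, which are omitted here.

```python
def indicator_elevations(samples):
    """Return (highest foraminifera elevation, lowest testate elevation) for one core.

    samples: list of (elevation_m, foram_present, testate_present) tuples.
    Either value is None if that group was never observed in the core.
    """
    forams = [z for z, foram, _ in samples if foram]
    testates = [z for z, _, testate in samples if testate]
    return (max(forams) if forams else None, min(testates) if testates else None)


def stack_records(*core_records):
    """Merge (age_cal_bp, rsl_m) index points from several cores into one
    age-ordered record, i.e. the 'one long core' that the stack represents."""
    merged = [point for record in core_records for point in record]
    return sorted(merged)


# Hypothetical core: testate amoebae in the upper samples, foraminifera below.
core = [(0.9, False, True), (0.7, False, True), (0.5, True, False), (0.3, True, False)]
print(indicator_elevations(core))
print(stack_records([(4300, -5.1)], [(2600, -3.0), (3400, -4.2)]))
```

The two boundary elevations bracket the tidal-frame position recorded in each core; converting them to RSL requires the modern elevation ranges of each indicator group, which this sketch does not model.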
A Clustal Alignment Improver Using Evolutionary Algorithms

DEFF Research Database (Denmark)

Thomsen, Rene; Fogel, Gary B.; Krink, Thimo

2002-01-01

Multiple sequence alignment (MSA) is a crucial task in bioinformatics. In this paper we extend previous work with evolutionary algorithms (EA) by using MSA solutions obtained from the well-known Clustal V algorithm as a candidate solution seed of the initial EA population. Our results clearly show...
V-stack piezoelectric actuator

Science.gov (United States)

Ardelean, Emil V.; Clark, Robert L.

2001-07-01

Aeroelastic control of wings by means of a distributed, trailing-edge control surface is of interest with regard to maneuvers, gust alleviation, and flutter suppression. The use of high-energy-density piezoelectric materials as motors provides an appealing solution to this problem. A comparative analysis of state-of-the-art actuators is currently being conducted. A new piezoelectric actuator design is presented. This actuator meets the requirements for trailing-edge flap actuation in both stroke and force. It is compact, simple, and sturdy, and it leverages stroke geometrically with minimal force penalties while displaying linearity over a wide range of stroke. The V-Stack Piezoelectric Actuator consists of a base, a lever, two piezoelectric stacks, and a pre-tensioning element. The work is performed alternately by the two stacks, placed on both sides of the lever. Pre-tensioning can be readily applied using a torque wrench, obviating the need for elastic elements, which benefits the stiffness of the actuator. The characteristics of the actuator are easily modified by changing the base or the stacks. A prototype was constructed and tested experimentally to validate the theoretical model.

Clustering of reads with alignment-free measures and quality values.

Science.gov (United States)

Comin, Matteo; Leoni, Andrea; Schimd, Michele

2015-01-01

The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that is now challenging the storage and data processing capacities of modern computer systems. In this context, an important aspect is the reduction of data complexity by collapsing redundant reads into a single cluster to improve the run time, memory requirements, and quality of post-processing steps like assembly and error correction. Several alignment-free measures based on k-mer counts have been used to cluster reads. Quality scores produced by NGS platforms are fundamental for various analyses of NGS data, such as read mapping and error detection. Moreover, future-generation sequencing platforms will produce long reads but with a large number of erroneous bases (up to 15%). In this scenario it will be fundamental to exploit quality value information within the alignment-free framework. To the best of our knowledge, this is the first study that incorporates quality value information and k-mer counts, in the context of alignment-free measures, for the comparison of read data. Based on these principles, in this paper we present a family of alignment-free measures called D(q)-type. A set of experiments on simulated and real read data confirms that the new measures are superior to other classical alignment-free statistics, especially when erroneous reads are considered. Results on de novo assembly and metagenomic read classification also show that the introduction of quality values improves over standard alignment-free measures. These statistics are implemented in a software tool called QCluster (http://www.dei.unipd.it/~ciompin/main/qcluster.html).
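To make the idea of combining k-mer counts with quality values concrete, here is a minimal sketch. It assumes a D2-style similarity (the sum, over shared k-mers, of the product of their weights) and assumes each k-mer occurrence is weighted by the probability that all of its bases are correct, computed from Phred scores as 1 - 10^(-Q/10). This weighting is an illustrative assumption, not necessarily the exact D(q)-type definition used in QCluster, and the function names are hypothetical.

```python
from collections import defaultdict


def kmer_weights(seq, quals, k):
    """Sum, for each k-mer, the probability that the occurrence is error-free.

    quals are Phred scores; a base with score Q is assumed correct with
    probability 1 - 10**(-Q/10).
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    p_ok = [1.0 - 10 ** (-q / 10) for q in quals]
    weights = defaultdict(float)
    for i in range(len(seq) - k + 1):
        prob = 1.0
        for p in p_ok[i:i + k]:
            prob *= p
        weights[seq[i:i + k]] += prob
    return weights


def d2(wa, wb):
    """D2-style similarity: sum over shared k-mers of the product of weights."""
    return sum(w * wb[kmer] for kmer, w in wa.items() if kmer in wb)
```

With uniformly high qualities the weights approach plain k-mer counts; low-quality bases shrink the contribution of every k-mer that covers them, so error-prone reads look less similar than their raw counts suggest.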
Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

DEFF Research Database (Denmark)

Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.

2005-01-01

sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human...
Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features.

Science.gov (United States)

Iwata, Hiroaki; Gotoh, Osamu

2012-11-01

Spliced alignment plays a central role in the precise identification of eukaryotic gene structures. Even though many spliced alignment programs have been developed, recent rapid progress in DNA sequencing technologies demands further improvements in software tools. Benchmarking algorithms under various conditions is an indispensable task for the development of better software; however, there is a dire lack of appropriate datasets usable for benchmarking spliced alignment programs. In this study, we have constructed two types of datasets: simulated sequence datasets and actual cross-species datasets. The datasets are designed to correspond to various real situations, i.e. divergent eukaryotic species, different types of reference sequences, and the wide divergence between query and target sequences. In addition, we have developed an extended version of our program Spaln, which incorporates two additional features to the scoring scheme of the original version, and examined this extended version, Spaln2, together with the original Spaln and other representative aligners based on our benchmark datasets. Although the effects of the modifications are not individually striking, Spaln2 is consistently most accurate and reasonably fast in most practical cases, especially for plants and fungi and for increasingly divergent pairs of target and query sequences.
Sequence comparison and phylogenetic analysis of core gene of ...

African Journals Online (AJOL)

STORAGESEVER

2010-07-19

Jul 19, 2010 ... and antisense primers, a single band of 573 base pairs .... Amino acid sequence alignment of Cluster I and Cluster II of phylogenetic tree. First ten sequences ... sequence weighting, position-specific gap penalties and weight.
Tasting soil fungal diversity with earth tongues: phylogenetic test of SATe alignments for environmental ITS data.

Directory of Open Access Journals (Sweden)

Zheng Wang

An abundance of novel fungal lineages has been indicated by DNA sequencing of the nuclear ribosomal ITS region from environmental samples such as soil and wood. Although phylogenetic analysis of these novel lineages is a key component of unveiling the structure and diversity of complex communities, such analyses are rare for environmental ITS data because of the difficulty of aligning this locus across significantly divergent taxa. One potential approach to this issue is simultaneous alignment and tree estimation. We targeted divergent ITS sequences of the earth tongue fungi (Geoglossomycetes), a basal class in the Ascomycota, to assess the performance of SATé, recent software that combines progressive alignment and tree building. We found that SATé performed well in generating high-quality alignments and in accurately estimating the phylogeny of earth tongue fungi. Drawing from a data set of 300 sequences of earth tongues and progressively more distant fungal lineages, 30 insufficiently identified ITS sequences from the public sequence databases were assigned to the Geoglossomycetes. An association between earth tongues and plants has long been hypothesized, but hard evidence has yet to be collected. The ITS phylogeny showed that four ectomycorrhizal isolates shared a clade with Geoglossum but not with Trichoglossum earth tongues, pointing to the significant potential of ecological data mining of environmental samples. Environmental sampling holds the key to many focal questions in mycology, and simultaneous alignment and tree estimation, as performed by SATé, can be a highly efficient companion in that pursuit.
SPHINX--an algorithm for taxonomic binning of metagenomic sequences.

Science.gov (United States)

Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Singh, Nitin Kumar; Mande, Sharmila S

2011-01-01

Compared with composition-based binning algorithms, the binning accuracy and specificity of alignment-based binning algorithms are significantly higher. However, being alignment-based, the latter class of algorithms requires an enormous amount of time and computing resources to bin huge metagenomic datasets. The motivation was to develop a binning approach that can analyze metagenomic datasets as rapidly as composition-based approaches while retaining the accuracy and specificity of alignment-based algorithms. This article describes a hybrid binning approach (SPHINX) that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms. Validation results with simulated sequence datasets indicate that SPHINX is able to analyze metagenomic sequences as rapidly as composition-based algorithms. Furthermore, the binning efficiency (in terms of accuracy and specificity of assignments) of SPHINX is observed to be comparable with results obtained using alignment-based algorithms. A web server for the SPHINX algorithm is available at http://metagenomics.atc.tcs.com/SPHINX/.
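The "composition" half of a hybrid binner can be pictured as comparing oligonucleotide frequency profiles instead of aligning. The sketch below assumes tetranucleotide frequencies and cosine similarity to shortlist candidate reference bins, which a slower alignment step would then examine. It is an illustration of the general composition-then-align idea, not SPHINX's actual procedure; the function names, the choice of k = 4, and the reference set are hypothetical.

```python
import math
from collections import Counter


def kmer_profile(seq, k=4):
    """Normalized k-mer frequency vector (as a dict) for a DNA sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def shortlist(read, references, k=4, top=2):
    """Rank reference bins by compositional similarity to the read.

    In a hybrid scheme, only these top candidates would be passed on to an
    alignment step, which is where the time saving comes from.
    """
    rp = kmer_profile(read, k)
    scored = sorted(((cosine(rp, kmer_profile(ref, k)), name)
                     for name, ref in references.items()), reverse=True)
    return [name for _, name in scored[:top]]
```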
FEAST: sensitive local alignment with multiple rates of evolution.

Science.gov (United States)

Hudek, Alexander K; Brown, Daniel G

2011-01-01

We present a pairwise local aligner, FEAST, which uses two new techniques: a sensitive extension algorithm for identifying homologous subsequences, and a descriptive probabilistic alignment model. We also present a new procedure for training alignment parameters and apply it to the human and mouse genomes, producing a better parameter set for these sequences. Our extension algorithm identifies homologous subsequences by considering all evolutionary histories. It has higher maximum sensitivity than Viterbi extensions, and better balances specificity. We model alignments with several submodels, each with unique statistical properties, describing strongly similar and weakly similar regions of homologous DNA. Training parameters using two submodels produces superior alignments, even when we align with only the parameters from the weaker submodel. Our extension algorithm combined with our new parameter set achieves sensitivity 0.59 on synthetic tests. In contrast, LASTZ with default settings achieves sensitivity 0.35 with the same false positive rate. Using the weak submodel as parameters for LASTZ increases its sensitivity to 0.59 with high error. FEAST is available at http://monod.uwaterloo.ca/feast/.
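The contrast between a Viterbi extension and one that considers all evolutionary histories is, at its core, max versus sum over alignments. The sketch below is a simplified illustration, not FEAST's model: it treats hypothetical match/mismatch/gap scores as log-weights and runs the same global dynamic program twice, once with `max` (best single alignment) and once with log-sum-exp (a forward-style score summed over every alignment path).

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def align_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0, combine=max):
    """Global alignment DP over two sequences.

    combine=max gives the Viterbi (best single alignment) score;
    combine=logsumexp sums over all alignments (a forward-style score).
    """
    n, m = len(a), len(b)
    dp = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                options.append(dp[i - 1][j - 1] + s)
            if i > 0:
                options.append(dp[i - 1][j] + gap)
            if j > 0:
                options.append(dp[i][j - 1] + gap)
            dp[i][j] = combine(options)
    return dp[n][m]
```

For example, aligning `"A"` with `"A"` gives a Viterbi score of 1.0, while the summed score also counts the two delete-then-insert paths and is therefore slightly larger. A real probabilistic aligner would use pair-HMM transition and emission probabilities rather than ad hoc scores.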
Reflector imaging by diffraction stacking with stacking velocity analysis; Jugo sokudo kaiseki wo tomonau sanran jugoho ni yoru hanshamen imaging

Energy Technology Data Exchange (ETDEWEB)

Matsushima, J; Rokugawa, S; Kato, Y [The University of Tokyo, Tokyo (Japan). Faculty of Engineering; Yokota, T [Japan National Oil Corp., Tokyo (Japan); Miyazaki, T [Geological Survey of Japan, Tsukuba (Japan)

1997-10-22

For seismic reflection surveys with a geometrical arrangement between pits, the scattering stacking method with stacking velocity analysis is compared with the common depth point (CDP) horizontal stacking method. The CDP method is considered to have the following advantages. Because it assumes an average velocity field, it can determine velocities that produce a stacking effect. Because it assumes stratification, huge quantities of observed data can be divided into smaller groups, so more data can be processed in a shorter time. Its disadvantages, which also stem from the assumption of an average velocity field, are that processing accuracy decreases as the velocity contrast increases, that accuracy is low where the stratification assumption does not hold, and that velocities obtained from stacking velocity analysis are affected by dipping structures. These shortcomings may be remedied by the scattering stacking method with stacking velocity analysis. For horizontal reflection planes, it may yield stacked records with a higher S/N ratio than the CDP method. Findings for dipping reflection planes will be presented at the meeting. 6 refs., 12 figs.
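Conventional CDP stacking velocity analysis can be illustrated in a few lines. Traces sharing a midpoint are corrected for normal moveout using the hyperbola t(x) = sqrt(t0^2 + (x/v)^2) and summed; the scan keeps the trial velocity that stacks the most energy. This is a minimal sketch on a synthetic single-spike gather, assuming a flat reflector, a constant velocity, and hypothetical offsets and sampling; it does not reproduce the scattering (diffraction) stacking method discussed above.

```python
import math


def nmo_time(t0, offset, v):
    """Two-way travel time of a flat reflector at a given source-receiver offset."""
    return math.sqrt(t0 ** 2 + (offset / v) ** 2)


def synthetic_gather(offsets, t0, v_true, dt, nsamples):
    """One spike per trace at the hyperbolic reflection time."""
    gather = []
    for x in offsets:
        trace = [0.0] * nsamples
        k = round(nmo_time(t0, x, v_true) / dt)
        if k < nsamples:
            trace[k] = 1.0
        gather.append(trace)
    return gather


def stack_amplitude(gather, offsets, t0, v, dt):
    """Sum of the samples picked along the trial moveout hyperbola."""
    total = 0.0
    for trace, x in zip(gather, offsets):
        k = round(nmo_time(t0, x, v) / dt)
        if k < len(trace):
            total += trace[k]
    return total


def velocity_scan(gather, offsets, t0, velocities, dt):
    """Return the trial velocity whose NMO correction stacks the most energy."""
    return max(velocities, key=lambda v: stack_amplitude(gather, offsets, t0, v, dt))
```

Usage: build a gather with `offsets = list(range(0, 2001, 250))`, `t0 = 1.0` s, `v_true = 2000.0` m/s and `dt = 0.004` s, then scan `range(1500, 2600, 100)`; the scan should pick 2000, where all nine traces align. Real processing would use semblance over a time window rather than single-sample picks.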
Dynamics based alignment of proteins: an alternative approach to quantify dynamic similarity

Directory of Open Access Journals (Sweden)

Lyngsø Rune

2010-04-01

Background: The dynamic motions of many proteins are central to their function. It therefore follows that the dynamic requirements of a protein are evolutionarily constrained. To assess and quantify this, one needs to compare the dynamic motions of different proteins. Comparing the dynamics of distinct proteins may also provide insight into how protein motions are modified by variations in sequence and, consequently, by structure. The optimal way of comparing complex molecular motions is, however, far from trivial. The majority of comparative molecular dynamics studies performed to date have relied on prior sequence or structural alignment to define which residues were equivalent in 3-dimensional space. Results: Here we discuss an alternative methodology for comparative molecular dynamics that does not require any prior alignment information. We show it is possible to align proteins based solely on their dynamics and that we can use these dynamics-based alignments to quantify the dynamic similarity of proteins. Our method was tested on 10 representative members of the PDZ domain family. Conclusions: As a result of creating pair-wise dynamics-based alignments of PDZ domains, we have found evolutionarily conserved patterns in their backbone dynamics. The dynamic similarity of PDZ domains is highly correlated with their structural similarity as calculated with Dali. However, significant differences in their dynamics can be detected, indicating that sequence has a more refined role to play in protein dynamics than just dictating the overall fold. We suggest that the method should be generally applicable.
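One simple way to picture aligning proteins "solely on their dynamics" is to represent each protein by a per-residue numeric profile (for instance, positional fluctuations) and run a global alignment whose substitution score rewards similar values. The sketch below is a hypothetical illustration of that idea, not the authors' method: the scoring `bonus - |p_i - q_j|`, the linear gap penalty, and the profiles are all assumptions.

```python
def align_profiles(p, q, gap=-1.0, bonus=1.0):
    """Needleman-Wunsch global alignment of two numeric profiles.

    Aligning residues i and j scores bonus - |p[i] - q[j]|, so residues with
    similar flexibility are rewarded; each gap position costs `gap`.
    Returns (score, list of aligned index pairs).
    """
    n, m = len(p), len(q)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + bonus - abs(p[i - 1] - q[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back to recover which residues were paired.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + bonus - abs(p[i - 1] - q[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]
```

A dynamic-similarity measure could then be derived from the alignment score, for example by normalizing it by profile length; how the published method does this is not specified here.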
Analysis of 3D stacked fully functional CMOS Active Pixel Sensor detectors

International Nuclear Information System (INIS)

Passeri, D; Servoli, L; Meroli, S

2009-01-01

The IC technology trend is to move from 3D flexible configurations (package on package, stacked dies) to true 3D ICs. This is mainly due to (i) improved electrical performance and (ii) the cost of 3D integration, which may be lower than that of continuing to shrink 2D circuits. Prospective advantages for particle tracking and vertex detector applications in High Energy Physics can be envisaged: in this work, we focus on the capabilities of state-of-the-art vertical-scale integration technologies, which allow the fabrication of very compact, fully functional, multi-layer CMOS Active Pixel Sensor (APS) detectors. The main idea is to exploit the features of 3D technologies to fabricate a "stack" of very thin and precisely aligned CMOS APS layers, leading to a single, integrated, multi-layer pixel sensor. Adopting multi-layer single detectors can dramatically reduce the mass of conventional, separate detectors (thus reducing multiple-scattering issues), while allowing very precise measurements of particle trajectory and momentum. As a proof of concept, an extensive device and circuit simulation activity has been carried out, aiming to evaluate the suitability of such CMOS active pixel layers for particle tracking purposes.
Interfacial, Electrical, and Band Alignment Characteristics of HfO2/Ge Stacks with In Situ-Formed SiO2 Interlayer by Plasma-Enhanced Atomic Layer Deposition

Science.gov (United States)

Cao, Yan-Qiang; Wu, Bing; Wu, Di; Li, Ai-Dong

2017-05-01

In situ-formed SiO2 was introduced as an interlayer into HfO2 gate dielectrics on Ge substrates by plasma-enhanced atomic layer deposition (PEALD). The interfacial, electrical, and band alignment characteristics of the HfO2/SiO2 high-k gate dielectric stacks on Ge were investigated in detail. It was demonstrated that a Si-O-Ge interlayer forms on the Ge surface during the in situ PEALD SiO2 deposition process. This interlayer shows excellent thermal stability during annealing, without obvious Hf-silicate formation. In addition, it can also suppress GeO2 degradation. The electrical measurements show that a capacitance equivalent thickness of 1.53 nm and a leakage current density of 2.1 × 10⁻³ A/cm² at a gate bias of Vfb + 1 V were obtained for the annealed sample. The conduction (valence) band offsets at the HfO2/SiO2/Ge interface with and without post-deposition annealing (PDA) are found to be 2.24 (2.69) and 2.48 (2.45) eV, respectively. These results indicate that in situ PEALD SiO2 may be a promising interfacial control layer for the realization of high-quality Ge-based transistor devices. Moreover, it is demonstrated that PEALD is a much more powerful technique than MOCVD for depositing ultrathin interfacial control layers.
Towards stacked zone plates

International Nuclear Information System (INIS)

Werner, S; Rehbein, S; Guttman, P; Heim, S; Schneider, G

2009-01-01

Fresnel zone plates are the key optical elements for soft and hard x-ray microscopy. For short exposure times and minimal radiation load on the specimen, the diffraction efficiency of the zone plate objectives has to be maximized. As the efficiency strongly depends on the height of the diffracting zone structures, the achievable aspect ratio of the nanostructures determines these limits. To reach aspect ratios ≥ 20:1 for highly efficient optics, we propose to superimpose zone plates on top of each other. With this multiplication approach, the final aspect ratio is limited only by the number of stacked zone plate layers. For the stacking process, several nanostructuring process steps have to be developed and/or improved. Our results show, for the first time, two layers of zone plates stacked on top of each other.
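The arithmetic behind the stacking idea is short. Zone boundaries of a Fresnel zone plate follow r_n = sqrt(n·λ·f + (n·λ/2)²), and the outermost zone width Δr = r_N - r_(N-1) sets the finest structure to be fabricated. The aspect ratio is zone height divided by Δr, so N identical layers of height h give an effective ratio of N·h/Δr, assuming perfect lateral alignment. The sketch below encodes these relations; the numerical values in the usage note are hypothetical.

```python
import math


def zone_radius(n, wavelength, focal_length):
    """Radius of the n-th Fresnel zone boundary: sqrt(n*lam*f + (n*lam/2)**2)."""
    return math.sqrt(n * wavelength * focal_length + (n * wavelength / 2) ** 2)


def outermost_zone_width(n_zones, wavelength, focal_length):
    """Width of the last zone, r_N - r_(N-1)."""
    return (zone_radius(n_zones, wavelength, focal_length)
            - zone_radius(n_zones - 1, wavelength, focal_length))


def stacked_aspect_ratio(layer_height, zone_width, n_layers):
    """Effective aspect ratio of n perfectly aligned, identical zone-plate layers."""
    return n_layers * layer_height / zone_width
```

For instance, a hypothetical 200 nm tall layer with 20 nm outermost zones has an aspect ratio of 10:1, and stacking two such layers reaches the 20:1 target mentioned above without requiring a taller single structure.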
Web-Beagle: a web server for the alignment of RNA secondary structures.

Science.gov (United States)

Mattei, Eugenio; Pietrosanto, Marco; Ferrè, Fabrizio; Helmer-Citterich, Manuela

2015-07-01

Web-Beagle (http://beagle.bio.uniroma2.it) is a web server for the pairwise global or local alignment of RNA secondary structures. The server exploits a new encoding for RNA secondary structure and a substitution matrix of RNA structural elements to perform RNA structural alignments. The web server allows the user to compute up to 10 000 alignments in a single run, taking as input sets of RNA sequences and structures or primary sequences alone. In the latter case, the server computes the secondary structure prediction for the RNAs on-the-fly using RNAfold (free energy minimization). The user can also compare a set of input RNAs to one of five pre-compiled RNA datasets including lncRNAs and 3' UTRs. All types of comparison produce as output the pairwise alignments along with structural similarity and statistical significance measures for each resulting alignment. A graphical color-coded representation of the alignments allows the user to easily identify structural similarities between RNAs. Web-Beagle can be used for finding structurally related regions in two or more RNAs, for the identification of homologous regions or for functional annotation. Benchmark tests show that Web-Beagle has lower computational complexity and running time, and better performance, than other available methods. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Tunable electro-optic filter stack

Science.gov (United States)

Fontecchio, Adam K.; Shriyan, Sameet K.; Bellingham, Alyssa

2017-09-05

A holographic polymer dispersed liquid crystal (HPDLC) tunable filter exhibits switching times of no more than 20 microseconds. The HPDLC tunable filter can be utilized in a variety of applications. An HPDLC tunable filter stack can be utilized in a hyperspectral imaging system capable of spectrally multiplexing hyperspectral imaging data acquired while the hyperspectral imaging system is airborne. HPDLC tunable filter stacks can be utilized in high speed switchable optical shielding systems, for example as a coating for a visor or an aircraft canopy. These HPDLC tunable filter stacks can be fabricated using a spin coating apparatus and associated fabrication methods.
A MEMORY EFFICIENT HARDWARE BASED PATTERN MATCHING AND PROTEIN ALIGNMENT SCHEMES FOR HIGHLY COMPLEX DATABASES

OpenAIRE

Bennet, M.Anto; Sankaranarayanan, S.; Deepika, M.; Nanthini, N.; Bhuvaneshwari, S.; Priyanka, M.

2017-01-01

Protein sequence alignment, used to find correlations between different species, genetic mutations, etc., is the most computationally intensive task when performing protein comparison. To speed up the alignment, systolic arrays (SAs) have been used. To avoid the internal-loop problem, which reduces performance, a pipeline interleaving strategy is presented. This strategy is applied to an SA for the Smith-Waterman (SW) algorithm, an algorithm for locally aligning two proteins...
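Systolic arrays speed up Smith-Waterman because of its dependency pattern: cell (i, j) needs only (i-1, j-1), (i-1, j) and (i, j-1), so every cell on an anti-diagonal i + j = d is independent of the others and can be computed by a separate processing element in the same step. The sketch below emulates that wavefront order sequentially with a linear gap penalty and hypothetical scores; it shows the schedule a systolic array exploits, not the pipeline-interleaving design from the paper.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score, filled anti-diagonal by anti-diagonal.

    Cells with the same i + j are mutually independent; a systolic array would
    evaluate each such diagonal in one parallel step. Here they are simply
    visited in that order.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):                       # d = i + j
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The result is identical to the usual row-by-row fill; only the evaluation order changes, which is what allows n + m - 1 parallel steps instead of n·m sequential ones.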
Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.

Science.gov (United States)

Xu, Weijia; Ozer, Stuart; Gutell, Robin R

2009-01-01

With an increasingly large number of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that uses coevolutionary rates among pairs of nucleotide positions, based on the phylogenetic and evolutionary relationships of the organisms whose sequences are aligned. With a novel data schema to manage the relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, on a data set 40 times larger and with 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
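Covariation analysis asks whether two alignment columns change together, as base-paired positions do. The approach above uses phylogeny-aware coevolutionary rates; as a simpler, swapped-in baseline, the sketch below computes mutual information between two columns without any phylogenetic weighting. The gap handling (rows with a gap in either column are skipped) and the function name are assumptions for illustration.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j of an alignment.

    alignment: list of equal-length strings. Rows with a gap ('-') in either
    column are skipped. No phylogenetic weighting is applied.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment if seq[i] != '-' and seq[j] != '-']
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((left[x] / n) * (right[y] / n)))
    return mi
```

In the toy alignment `["GAAC", "CAAG", "AAAU", "UAAA"]`, columns 0 and 3 co-vary as Watson-Crick pairs and share 2 bits, while the invariant columns 1 and 2 share none. Without phylogenetic correction, closely related sequences inflate such scores, which is one motivation for the approach described in the abstract.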
Correlation and Stacking of Relative Paleointensity and Oxygen Isotope Data

Science.gov (United States)

Lurcock, P. C.; Channell, J. E.; Lee, D.

2012-12-01

The transformation of a depth-series into a time-series is routinely implemented in the geological sciences. This transformation often involves correlation of a depth-series to an astronomically calibrated time-series. Eyeball tie-points with linear interpolation are still regularly used, although these have the disadvantages of being non-repeatable and not based on firm correlation criteria. Two automated correlation methods are compared: the simulated annealing algorithm (Huybers and Wunsch, 2004) and the Match protocol (Lisiecki and Lisiecki, 2002). Simulated annealing seeks to minimize energy (cross-correlation) as "temperature" is slowly decreased. The Match protocol divides records into intervals, applies penalty functions that constrain accumulation rates, and minimizes the sum of the squares of the differences between two series while maintaining the data sequence in each series. Paired relative paleointensity (RPI) and oxygen isotope records, such as those from IODP Site U1308 and/or reference stacks such as LR04 and PISO, are warped using known warping functions, and then the un-warped and warped time-series are correlated to evaluate the efficiency of the correlation methods. Correlations are performed in tandem to simultaneously optimize RPI and oxygen isotope data. Noise spectra are introduced at differing levels to determine correlation efficiency as noise levels change. A third potential method, known as dynamic time warping, involves minimizing the sum of distances between correlated point pairs across the whole series. A "cost matrix" between the two series is analyzed to find a least-cost path through the matrix. This least-cost path is used to nonlinearly map the time/depth of one record onto the depth/time of another. Dynamic time warping can be expanded to more than two dimensions and used to stack multiple time-series. This procedure can improve on arithmetic stacks, which often lose coherent high-frequency content during the stacking process.
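The dynamic time warping step described above (a cost matrix between two series and a least-cost path through it) can be sketched as follows. This is a textbook DTW with absolute-difference cost and the three standard steps; the penalty functions and accumulation-rate constraints of the Match protocol are not included.

```python
def dtw(x, y):
    """Dynamic time warping: least cumulative |x_i - y_j| cost over a monotone path.

    Returns (cost, path), where path is a list of (i, j) index pairs mapping
    samples of x onto samples of y.
    """
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Trace the least-cost path back from the final cell.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((D[i - 1][j - 1], i - 1, j - 1),
                   (D[i - 1][j], i - 1, j),
                   (D[i][j - 1], i, j - 1))
        i, j = step[1], step[2]
    return D[n][m], path[::-1]
```

For `x = [0, 1, 2, 3]` and `y = [0, 1, 1, 2, 3]` the warping cost is zero and the path maps sample 1 of `x` onto both repeated samples of `y`. Applying such a path to put several records on a common axis before averaging is one way to build a stack that keeps coherent high-frequency features.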
Forced Air-Breathing PEMFC Stacks

Directory of Open Access Journals (Sweden)

K. S. Dhathathreyan

2012-01-01

Air-breathing fuel cells have great potential as power sources for various electronic devices. They differ from conventional fuel cells in that the cells take up oxygen from ambient air by active or passive methods. The air flow occurs through the channels due to concentration and temperature gradients between the cell and the ambient conditions. However, developing a stack is very difficult because individual cell performance may not be uniform. To make such a system more practical, open-cathode forced air-breathing stacks were developed by designing appropriate channel dimensions for the air flow to achieve uniform performance across the stack. At CFCT-ARCI (Centre for Fuel Cell Technology-ARC International) we have developed forced air-breathing fuel cell stacks with capacities ranging from 50 watts to 1500 watts. Stack performance was analysed in terms of air flow, humidity, stability, and so forth. The major advantage of the system is the reduced number of bipolar plates and thereby reduced volume and weight. However, thermal management is a challenge because sufficient air flow is not available to remove heat from the system during continuous operation. These results will be discussed in this paper.
Method for monitoring stack gases for uranium activity

International Nuclear Information System (INIS)

Beverly, C.R.; Ernstberger, H.G.

1988-01-01

A method is described for sampling stack gases emanating from the purge cascade of a gaseous diffusion cascade system used to enrich uranium, in order to determine the presence and extent of uranium in the stack gases in the form of gaseous uranium hexafluoride. The method comprises the steps of removing a side stream of gases from the stack gases and contacting the side stream with a stream of air sufficiently saturated with moisture to react with and convert any gaseous uranium hexafluoride contacted thereby in the side stream to particulate uranyl fluoride. The side stream containing the particulate uranyl fluoride is then contacted with moving filter means that continuously intercept the particulate uranyl fluoride and convey it away from the side stream, and the moving filter means are continually scanned with radiation monitoring means that sense the presence and extent of particulate uranyl fluoride on the filter. This is indicative of the extent of particulate uranyl fluoride in the side stream, which in turn is indicative of the presence and extent of uranium hexafluoride in the stack gases.
Three wavelength optical alignment of the Nova laser

International Nuclear Information System (INIS)

Swift, C.D.; Bliss, E.S.; Jones, W.A.; Seppala, L.G.

1983-01-01

The Nova laser, presently under construction at Lawrence Livermore National Laboratory, will be capable of delivering more than 100 kJ of focused energy to an Inertial Confinement Fusion (ICF) target. Operation at the fundamental wavelength of the laser (1.05 μm) and at the second and third harmonics will be possible. This paper will discuss the optical alignment systems and techniques being implemented to align the laser output to the target at these wavelengths prior to each target irradiation. When experiments require conversion of the laser light to wavelengths of 0.53 μm and 0.35 μm prior to target irradiation, this will be accomplished in harmonic conversion crystals located at the beam entrances to the target chamber. The harmonic alignment system will be capable of introducing collinear alignment beams of all three wavelengths into the laser chains at the final spatial filter. The alignment beam at 1.05 μm will be about 3 cm in diameter and intense enough to align the conversion crystals. Beams at 0.53 μm and 0.35 μm will be expanded by the spatial filter to full aperture (74 cm) and used to illuminate the target and other alignment aids at the target chamber focus. This harmonic illumination system will include viewing capability as well. A final alignment sensor will be located at the target chamber. It will view images of the chamber focal plane at all three wavelengths. In this way, each beam can be aligned at the desired wavelength to produce the focal pattern required for each target irradiation. The design of the major components in the harmonic alignment system will be described, and a typical alignment sequence for alignment to a target will be presented.

Dislocation content of geometrically necessary boundaries aligned with slip planes in rolled aluminium

DEFF Research Database (Denmark)

Hong, Chuanshi; Huang, Xiaoxu; Winther, Grethe

2013-01-01

Previous studies have revealed that dislocation structures in metals with medium-to-high stacking fault energy depend on the grain orientation and therefore on the slip systems. In the present work, the dislocations in eight slip-plane-aligned geometrically necessary boundaries (GNBs) in three...... expected active dominate. The dislocations predicted inactive are primarily attributed to dislocation reactions in the boundary. Two main types of dislocation networks in the boundaries were identified: (1) a hexagonal network of the three dislocations in the slip plane with which the boundary was aligned......; two of these come from the active slip systems, the third is attributed to dislocation reactions; and (2) a network of three dislocations from both of the active slip planes; two of these react to form Lomer locks. The results indicate a systematic boundary formation process for the GNBs. Redundant...
Flexural characteristics of a stack leg

International Nuclear Information System (INIS)

Cook, J.

1979-06-01

A 30 MV tandem Van de Graaff accelerator is at present under construction at Daresbury Laboratory. The insulating stack of the machine is of modular construction, each module being 860 mm in length. Each live section stack module contains 8 insulating legs mounted between bulkhead rings. The design, fabrication (from glass discs bonded to stainless steel discs using an epoxy film adhesive) and testing of the stack legs is described. (U.K.)
Crowdsourcing RNA structural alignments with an online computer game.

Science.gov (United States)

Waldispühl, Jérôme; Kam, Arthur; Gardner, Paul P

2015-01-01

The annotation and classification of ncRNAs is essential to decipher molecular mechanisms of gene regulation in normal and disease states. A database such as Rfam maintains alignments, consensus secondary structures, and corresponding annotations for RNA families. Its primary purpose is the automated, accurate annotation of non-coding RNAs in genomic sequences. However, the alignment of RNAs is computationally challenging, and the data stored in this database are often subject to improvements. Here, we design and evaluate Ribo, a human-computing game that aims to improve the accuracy of RNA alignments already stored in Rfam. We demonstrate the potential of our techniques and discuss the feasibility of large scale collaborative annotation and classification of RNA families.
Time-predictable Stack Caching

DEFF Research Database (Denmark)

Abbaspourseyedi, Sahar

completely. Thus, in systems with hard deadlines the worst-case execution time (WCET) of the real-time software running on them needs to be bounded. Modern architectures use features such as pipelining and caches to improve average performance. These features, however, make the WCET analysis more...... addresses, provides an opportunity to predict and tighten the WCET of accesses to data in caches. In this thesis, we introduce the time-predictable stack cache design and implementation within a time-predictable processor. We introduce several optimizations to our design for tightening the WCET while...... keeping the time-predictability of the design intact. Moreover, we provide a solution for reducing the cost of context switching in a system using the stack cache. In the design of these caches, we use custom hardware and compiler support for delivering time-predictable stack data accesses. Furthermore...
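Why a stack cache is easier to bound than a general data cache can be shown with a toy model. The sketch below assumes a cache that holds only the top `capacity` words of the stack and is managed by explicit reserve (on function entry), free (on exit) and ensure (after a call returns) operations, loosely modelled on stack-cache designs of this kind. The class, method names, and word-granular policy are hypothetical simplifications, not the thesis's design. Because spill and fill amounts depend only on frame sizes and call depth, never on addresses, they can be computed statically for WCET analysis.

```python
class StackCache:
    """Toy model of a stack cache with explicit reserve/free/ensure operations.

    Only the top `capacity` words of the stack can be cached; the rest lives in
    main memory. Each operation returns how many words it moves between cache
    and memory, the quantity a WCET analysis would need to bound.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.depth = 0    # total words currently reserved on the stack
        self.cached = 0   # words at the top of the stack held in the cache

    def reserve(self, n):
        """Allocate n words on function entry; spill the oldest words if full."""
        self.depth += n
        self.cached += n
        spill = max(0, self.cached - self.capacity)
        self.cached -= spill
        return spill

    def free(self, n):
        """Release n words on function exit; nothing is written back."""
        self.depth -= n
        self.cached = max(0, self.cached - n)
        return 0

    def ensure(self, n):
        """Guarantee the top n words are cached (after a call returns); fill if not."""
        n = min(n, self.depth, self.capacity)
        fill = max(0, n - self.cached)
        self.cached += fill
        return fill
```

For example, with an 8-word cache, a caller reserving 6 words and then a callee reserving 5 words spills 3; when the callee frees its frame, the caller's `ensure(6)` fills those 3 words back.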
Principles for Instructional Stack Development in HyperCard.

Science.gov (United States)

McEneaney, John E.

The purpose of this paper is to provide information about obtaining and using HyperCard stacks that introduce users to principles of stack development. The HyperCard stacks described are available for downloading free of charge from a server at Indiana University South Bend. Specific directions are given for stack use, with advice for beginners. A…
Sequence analysis of cereal sucrose synthase genes and isolation ...

African Journals Online (AJOL)

SERVER

2007-10-18

... sequencing of a sucrose synthase gene fragment from sorghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals such as rice, maize, and barley were accessed from the NCBI GenBank database.
A basic analysis toolkit for biological sequences

Directory of Open Access Journals (Sweden)

Siragusa Enrico

2007-09-01

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new. However, although they are generally regarded as fundamental to sequence analysis, they have not previously been implemented in a single, consistent software package, as we do here. Our main contribution is therefore to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the string matching and alignment problems mentioned above. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with BioPerl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
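BATS itself is distributed as C/C++ and Perl functions, and the Python sketch below is not its API. It illustrates one of the listed tasks, global alignment with an affine gap cost, using Gotoh's three-state recurrences. The function name and the default scores are illustrative assumptions: match +1, mismatch -1, gap open -3, gap extension -1. Under this scheme a gap of length k costs gap_open + (k - 1) * gap_extend.

```python
NEG_INF = float("-inf")


def affine_global_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                        gap_open: int = -3, gap_extend: int = -1) -> int:
    """Optimal global alignment score with affine gaps (Gotoh's recurrences).

    M[i][j]: best score of a[:i] vs b[:j] ending with a[i-1] aligned to b[j-1]
    X[i][j]: best score ending with a[i-1] aligned against a gap
    Y[i][j]: best score ending with b[j-1] aligned against a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # Leading gaps: one opening plus (length - 1) extensions.
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap or extend the one already running.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, affine_global_score("ACGT", "AT") returns -2. That score comes from two matches (+2) and a single gap of length 2 (-3 - 1 = -4).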
DBaaS with OpenStack Trove

CERN Document Server

Giardini, Andrea

2013-01-01

The purpose of the project was to evaluate the Trove component of OpenStack, determine whether it can be used with the CERN infrastructure, and report the benefits and disadvantages of this software. Currently, databases for CERN projects are provided by DBaaS software developed inside the IT-DB group. This solution works well with the current infrastructure but is not easy to maintain. With the migration of the CERN infrastructure to OpenStack, the Database group started to evaluate the Trove component. Instead of maintaining its own DBaaS service, the group could migrate everything to OpenStack and replace the current DBaaS software with Trove. In this way, both virtual machines and databases would be managed by OpenStack itself.
Skeleton-based human action recognition using multiple sequence alignment

Science.gov (United States)

Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong

2015-05-01

Human action recognition and analysis has been an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. The method first decomposes an action into a sequence of meaningful atomic actions (actionlets). It then labels the actionlets with letters of the English alphabet according to the Davies-Bouldin index value. An action can therefore be represented as a sequence of actionlet symbols, which preserves the temporal order in which the actionlets occur. Finally, we employ sequence comparison to classify multiple actions using a string matching algorithm (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments on three challenging 3D action datasets show promising results.
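The classification step can be pictured with a minimal Python sketch of Needleman-Wunsch global alignment scoring over actionlet strings. Several parts are hypothetical illustrations:

- the scoring scheme (match +1, mismatch -1, gap -1);
- the nearest-template classifier;
- the template labels.

The paper's actual scoring and its Davies-Bouldin labelling step are not reproduced here.

```python
def nw_score(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Only two rows of the dynamic-programming matrix are kept, so memory is O(len(t)).
    """
    prev = [j * gap for j in range(len(t) + 1)]  # row 0: t[:j] aligned against gaps
    for i in range(1, len(s) + 1):
        curr = [i * gap] + [0] * len(t)  # column 0: s[:i] aligned against gaps
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def classify(query: str, templates: dict[str, str]) -> str:
    """Return the label whose template actionlet string aligns best with the query."""
    return max(templates, key=lambda label: nw_score(query, templates[label]))


# Hypothetical templates: each action is a string of actionlet letters.
templates = {"wave": "ABAB", "kick": "CDDC"}
print(classify("ABBAB", templates))  # wave
```

Here "ABBAB" scores 3 against "ABAB" (four matches, one gap) and -5 against "CDDC", so the nearest template is "wave".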
Leveraging FPGAs for Accelerating Short Read Alignment.

Science.gov (United States)

Arram, James; Kaplan, Thomas; Luk, Wayne; Jiang, Peiyong

2017-01-01

One of the key challenges facing genomics today is how to efficiently analyze the massive amounts of data produced by next-generation sequencing platforms. General-purpose computing systems struggle to address this challenge, so specialized processors such as the Field-Programmable Gate Array (FPGA) are receiving growing interest. However, how to leverage this technology for accelerating genomic data analysis remains largely unexplored. In this paper, we present a runtime reconfigurable architecture for accelerating short read alignment using FPGAs. This architecture exploits the reconfigurability of FPGAs to allow the development of fast yet flexible alignment designs. We apply this architecture to develop an alignment design that supports exact and approximate alignment with up to two mismatches. Our design is based on the FM-index, with optimizations to improve alignment performance. In particular, it includes the n-step FM-index, index oversampling, a seed-and-compare stage, and bi-directional backtracking. Our design is implemented and evaluated on a 1U Maxeler MPC-X2000 dataflow node with eight Altera Stratix-V FPGAs. Measurements show that our design is 28 times faster than Bowtie2 running with 16 threads on dual Intel Xeon E5-2640 CPUs, and nine times faster than Soap3-dp running on an NVIDIA Tesla C2070 GPU.
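The exact-match core of an FM-index is backward search, which narrows a range of suffix-array rows one pattern character at a time. The Python sketch below shows only that plain one-step search, built from a naively sorted suffix array. The abstract also describes an n-step index, oversampling, a seed-and-compare stage, and bi-directional backtracking for mismatches; none of these are modelled. The class and method names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: O(n^2 log n), fine for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


class FMIndex:
    def __init__(self, text: str, sentinel: str = "$") -> None:
        # The sentinel must not occur in text and must sort before every other symbol.
        self.text = text + sentinel
        self.sa = suffix_array(self.text)
        # BWT: the character preceding each sorted suffix (index -1 wraps to the sentinel).
        self.bwt = "".join(self.text[i - 1] for i in self.sa)
        # C[c]: number of characters in the text strictly smaller than c.
        self.C: dict[str, int] = {}
        total = 0
        for ch in sorted(set(self.bwt)):
            self.C[ch] = total
            total += self.bwt.count(ch)
        # occ[c][k]: occurrences of c in bwt[:k].
        self.occ: dict[str, list[int]] = {ch: [0] for ch in self.C}
        for b in self.bwt:
            for ch, counts in self.occ.items():
                counts.append(counts[-1] + (b == ch))

    def _range(self, pattern: str) -> tuple[int, int]:
        """Half-open range [lo, hi) of suffix-array rows whose suffixes start with pattern."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):  # backward search: last pattern character first
            if ch not in self.C:
                return 0, 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern: str) -> int:
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern: str) -> list[int]:
        lo, hi = self._range(pattern)
        return sorted(self.sa[lo:hi])
```

For instance, FMIndex("ACGTACGA").locate("ACG") gives [0, 4]. A hardware design would replace the per-character occ tables with sampled checkpoints to save memory.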
An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

Directory of Open Access Journals (Sweden)

Taneda Akito

2008-12-01

Background: Aligning RNA sequences with low sequence identity has been a challenging problem, because taking structural conservation into account essentially requires algorithms of high computational complexity. Although many sophisticated algorithms have been proposed for this purpose, further improvement in efficiency is needed to accelerate large-scale applications, including non-coding RNA (ncRNA) discovery. Results: We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining it with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery, comparing the S. cerevisiae genome with six related genomes in a pairwise manner. We focused our search on relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences. In these regions we predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates. All were found in pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). Comparison with previous predictions showed that more than 92% of the candidates are novel. The estimated false-positive rate among the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e., their genomic positions overlap those of experimentally determined transcripts in the literature. Moreover, by manual inspection of the results, we obtained four multiple alignments with low sequence identity that reveal consensus structures shared by three species/sequences. Conclusion: The present method provides an efficient tool complementary to sequence-alignment-based ncRNA finders.
Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments

Directory of Open Access Journals (Sweden)

Jyh-Da Wei

2017-08-01

High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing for a decade. These desktop GPU cards must be installed in personal computers/servers with desktop CPUs, so the cost and power consumption of building a GPU cluster platform are very high. In recent years, NVIDIA has released an embedded board, the Jetson Tegra K1 (TK1). It contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture (CUDA) cores of the Kepler GPU architecture. The Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been used in several specific applications. In our previous work, we constructed a bioinformatics platform with a single TK1 (STK platform). Comparing the STK platform with a desktop CPU and GPU showed that Web and mobile services can be implemented on it with a good cost-performance ratio. In this work, an embedded-based GPU cluster platform is constructed with multiple TK1s (MTK platform). The complex system installation and setup are carried out first. Then, two job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed speedups of 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is shown to be useful for multiple sequence alignments.
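As a rough reading of these figures, parallel efficiency divides the speedup S by the number of boards p. This efficiency framing is ours, not the authors'.

```latex
E = \frac{S}{p}, \qquad
E_{\mathrm{ClustalW}} = \frac{5.5}{6} \approx 0.92, \qquad
E_{\mathrm{ClustalWtk}} = \frac{4.8}{6} = 0.80
```

That is, six TK1s deliver roughly 92% and 80% of ideal linear scaling. This assumes the reported speedups are measured against the single-TK1 baseline, as the abstract states.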
Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments.

Science.gov (United States)

Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

2017-01-01

High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing for a decade. These desktop GPU cards must be installed in personal computers/servers with desktop CPUs, so the cost and power consumption of building a GPU cluster platform are very high. In recent years, NVIDIA has released an embedded board, the Jetson Tegra K1 (TK1). It contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture (CUDA) cores of the Kepler GPU architecture. The Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been used in several specific applications. In our previous work, we constructed a bioinformatics platform with a single TK1 (STK platform). Comparing the STK platform with a desktop CPU and GPU showed that Web and mobile services can be implemented on it with a good cost-performance ratio. In this work, an embedded-based GPU cluster platform is constructed with multiple TK1s (MTK platform). The complex system installation and setup are carried out first. Then, two job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed speedups of 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is shown to be useful for multiple sequence alignments.
Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment

Science.gov (United States)

Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

2014-01-01

Context: Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object-oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective: This paper describes a new method for pattern mining that isolates design patterns and the relationships between them. It also presents a related tool, DLA-DNA, covering all implemented patterns and all projects used for evaluation. DLA-DNA is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, and it achieves acceptable precision and recall compared with the other evaluated tools. Method: The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them. This enables analysts and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model detects design patterns, and the strength of their relationships, better than the available tools Pinot, PTIDEJ and DPJF. Results: The results demonstrate that, whether the source code is built in a standard or non-standard way with respect to the design patterns, the proposed method performs close to DPJF and better than Pinot and PTIDEJ. The proposed model was tested on several source code bases and compared with other related models and available tools. On average, its precision and recall are 20% and 9.6% higher than those of Pinot, 27% and 31% higher than those of PTIDEJ, and 3.3% and 2% higher than those of DPJF, respectively. Conclusion: The proposed method is organized in two steps: first, elemental design patterns are identified; second, they are composed to recognize actual design patterns. PMID:25243670
Design pattern mining using distributed learning automata and DNA sequence alignment.

Directory of Open Access Journals (Sweden)

Mansour Esmaeilpour

CONTEXT: Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object-oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. OBJECTIVE: This paper describes a new method for pattern mining that isolates design patterns and the relationships between them. It also presents a related tool, DLA-DNA, covering all implemented patterns and all projects used for evaluation. DLA-DNA is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, and it achieves acceptable precision and recall compared with the other evaluated tools. METHOD: The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them. This enables analysts and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model detects design patterns, and the strength of their relationships, better than the available tools Pinot, PTIDEJ and DPJF. RESULTS: The results demonstrate that, whether the source code is built in a standard or non-standard way with respect to the design patterns, the proposed method performs close to DPJF and better than Pinot and PTIDEJ. The proposed model was tested on several source code bases and compared with other related models and available tools. On average, its precision and recall are 20% and 9.6% higher than those of Pinot, 27% and 31% higher than those of PTIDEJ, and 3.3% and 2% higher than those of DPJF, respectively. CONCLUSION: The proposed method is organized in two steps: first, elemental design patterns are identified; second, they are composed to recognize actual design patterns.
SOAP2: an improved ultrafast tool for short read alignment

DEFF Research Database (Denmark)

Li, Ruiqiang; Yu, Chang; Li, Yingrui

2009-01-01

SUMMARY: SOAP2 is a significantly improved version of the short oligonucleotide alignment program that both reduces computer memory usage and increases alignment speed at an unprecedented rate. We used a Burrows Wheeler Transformation (BWT) compression index to substitute the seed strategy...... for indexing the reference sequence in the main memory. We tested it on the whole human genome and found that this new algorithm reduced memory usage from 14.7 to 5.4 GB and improved alignment speed by 20-30 times. SOAP2 is compatible with both single- and paired-end reads. Additionally, this tool now supports...... multiple text and compressed file formats. A consensus builder has also been developed for consensus assembly and SNP detection from alignment of short reads on a reference genome. AVAILABILITY: http://soap.genomics.org.cn....
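A BWT index can replace a seed table while staying compact. The minimal Python sketch below shows why: it builds the transform from sorted rotations and inverts it with the LF mapping, so no information is lost. It is a textbook illustration, not SOAP2's construction algorithm, which must scale to a whole human genome. It assumes '$' is a unique end marker that sorts before A, C, G and T.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic memory, illustration only)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last: str) -> str:
    """Invert the BWT with the LF mapping.

    The k-th occurrence of a symbol in the last column corresponds to the k-th
    occurrence of that symbol in the (sorted) first column.
    """
    first_pos: dict[str, int] = {}
    for i, c in enumerate(sorted(last)):
        first_pos.setdefault(c, i)  # row where symbol c first appears in the first column
    seen: dict[str, int] = {}
    lf = []
    for c in last:
        lf.append(first_pos[c] + seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    # Row 0 is the rotation starting with the sentinel, so its last symbol is the
    # final character of the text; following LF walks the text backwards.
    row, out = 0, []
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))
```

For example, bwt("BANANA") is "ANNB$AA", and inverse_bwt("ANNB$AA") recovers "BANANA". The runs of identical symbols in the output ("NN", "AA") are what make the transformed text compress well.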
Stacking the Equiangular Spiral

OpenAIRE

Agrawal, A.; Azabi, Y. O.; Rahman, B. M.

2013-01-01

We present an algorithm that adapts the mature Stack and Draw (SaD) methodology for fabricating the exotic Equiangular Spiral Photonic Crystal Fiber (ES-PCF). The principle of Steiner chains and circle packing is exploited to obtain a non-hexagonal design using a stacking procedure based on hexagonal close packing. The optical properties of the proposed structure are promising for supercontinuum generation. This approach could make accessible not only the equiangular spiral but also other qua...
Sequence analysis by iterated maps, a review.

Science.gov (United States)

Almeida, Jonas S

2014-05-01

Among alignment-free methods, Iterated Maps (IMs) sit at a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics rather than computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches for discrete series was almost immediate, with a defining critique published in the same journal two years later. The rest of that decade went by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized to non-genomic alphabets, as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of Big Data and MapReduce as a new computational paradigm, there has been a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears set for a recasting of IMs in a central role in processing next-generation sequencing results.
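A minimal Python sketch of the 1990 Chaos Game Representation makes the "scale free" property concrete. Each base moves the current point halfway towards that base's corner of the unit square. As a result, the last k bases alone determine which of the 4^k sub-squares of side 2^-k the point lies in. The corner assignment below (A, C, G, T at (0,0), (0,1), (1,1), (1,0)) is one common convention and an assumption here. Exact fractions are used so that the cell arithmetic is exact.

```python
import math
from fractions import Fraction

# Assumed corner convention; other papers permute the four corners.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq: str) -> list[tuple[Fraction, Fraction]]:
    """Chaos Game Representation: one point per base, starting from the square's centre."""
    x, y = Fraction(1, 2), Fraction(1, 2)
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway towards the base's corner
        points.append((x, y))
    return points


def cell(point: tuple[Fraction, Fraction], k: int) -> tuple[int, int]:
    """Index of the 2^-k by 2^-k sub-square that contains a point."""
    x, y = point
    return math.floor(x * 2**k), math.floor(y * 2**k)


# Sequences that share their last k bases land in the same cell at resolution k.
print(cell(cgr_points("GGCA")[-1], 2), cell(cgr_points("TTCA")[-1], 2))  # (0, 1) (0, 1)
```

Counting points per cell at resolution k gives k-mer frequencies directly, which is one reason the representation suits alignment-free analysis.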
Stochastic stacking without filters

International Nuclear Information System (INIS)

Johnson, R.P.; Marriner, J.

1982-12-01

The rate of accumulation of antiprotons is a critical factor in the design of proton-antiproton colliders. A design of a system to accumulate higher antiproton fluxes is presented here as an alternative to the schemes used at the CERN AA and in the Fermilab Tevatron I design. Those stacking schemes use a system of notch filters to protect the dense core of antiprotons from the high power of the stack tail stochastic cooling. In contrast, this design uses an eddy current shutter to protect the core in the region of the stack tail cooling kicker. Without filters, one can have larger cooling bandwidths, better mixing for stochastic cooling, and easier operational criteria for the power amplifiers. In the case considered here, a flux of 1.4 × 10⁸ antiprotons per second is achieved with a 4 to 8 GHz bandwidth.

Atomic and electronic structure of trilayer graphene/SiC(0001): Evidence of Strong Dependence on Stacking Sequence and charge transfer.

Science.gov (United States)

Pierucci, Debora; Brumme, Thomas; Girard, Jean-Christophe; Calandra, Matteo; Silly, Mathieu G; Sirotti, Fausto; Barbier, Antoine; Mauri, Francesco; Ouerghi, Abdelkarim

2016-09-15

The transport properties of few-layer graphene are a direct result of a peculiar band structure near the Dirac point. Here, for epitaxial graphene grown on SiC, we determine the effect of charge transfer from the SiC substrate on the local density of states (LDOS) of trilayer graphene. We use scanning tunneling microscopy/spectroscopy and angle-resolved photoemission spectroscopy (ARPES). Different spectra are observed and are attributed to the existence of two stable polytypes of trilayer: Bernal (ABA) and rhombohedral (ABC) stacking. Their electronic properties strongly depend on the charge transfer from the substrate. We show that, compared with the LDOS of ABA stacking, the LDOS of ABC stacking has an additional peak located above the Dirac point. The observed LDOS features reflect the underlying symmetry of the two polytypes. They were reproduced by explicit calculations within density functional theory (DFT) that include the charge transfer from the substrate. These findings demonstrate the pronounced effect of stacking order and charge transfer on the electronic structure of trilayer or few-layer graphene. Our approach represents a significant step toward understanding the electronic properties of graphene layers under an electric field.
Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework

Directory of Open Access Journals (Sweden)

Toh Hiroyuki

2008-04-01

Background: Structural alignment of RNAs has become important since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have considerably improved the accuracy of pairwise structural alignment. In contrast, for more than two sequences, the practical merit of structural alignment over traditional sequence-based methods remains unclear, although the importance of multiple structural alignment is widely recognized. Results: Considering both accuracy and time complexity, we did not simply extend the Sankoff algorithm to multiple alignment. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i. It builds a multiple alignment with an iterative method that incorporates structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage. Conclusion: The BRAliBASE benchmark showed that X-INS-i outperforms other currently available methods on the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or higher than that of current leading methods such as RNA Sampler. The X-INS-i framework can be used to build a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. The source code is available at the webpage found in the Availability and requirements section.
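The sum-of-pairs score used in the benchmark is commonly defined as the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. A minimal Python sketch of that definition follows. It assumes '-' is the only gap character and that both alignments contain the same sequences in the same order. It is not the BRAliBase scoring tool itself, whose details may differ.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """(seq_i, residue_i, seq_j, residue_j) for every residue pair sharing a column."""
    next_residue = [0] * len(alignment)  # ungapped position counter per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((s, next_residue[s]))
                next_residue[s] += 1
        # combinations() keeps sequence order, so seq_i < seq_j in every tuple.
        for (si, ri), (sj, rj) in combinations(residues, 2):
            pairs.add((si, ri, sj, rj))
    return pairs


def sum_of_pairs_score(test: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)


reference = ["AC-GU", "A-CGU"]
print(sum_of_pairs_score(["ACG-U", "A-CGU"], reference))  # 2 of 3 reference pairs
```

Because only reference pairs are counted, a test alignment is not penalised here for extra pairs that the reference leaves unaligned. Other measures, such as the column score, are stricter.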
Assessing Elementary Algebra with STACK

Science.gov (United States)

Sangwin, Christopher J.

2007-01-01

This paper concerns computer aided assessment (CAA) of mathematics in which a computer algebra system (CAS) is used to help assess students' responses to elementary algebra questions. Using a methodology of documentary analysis, we examine what is taught in elementary algebra. The STACK CAA system, http://www.stack.bham.ac.uk/, which uses the CAS…
Learning OpenStack networking (Neutron)

CERN Document Server

Denton, James

2014-01-01

If you are an OpenStack-based cloud operator with experience in OpenStack Compute and nova-network but are new to Neutron networking, then this book is for you. Some networking experience is recommended, and a physical network infrastructure is required to provide connectivity to instances and other network resources configured in the book.
Consolidity: Stack-based systems change pathway theory elaborated

Directory of Open Access Journals (Sweden)

Hassen Taher Dorrah

2014-06-01

This paper presents an elaborated analysis of the stack-based layering processes during the system change pathway. The system change pathway is defined as the path resulting from the combinations of all successive changes induced on the system when it is subjected to varying environments, activities, events, or any excessive internal or external influences and happenings “on and above” its normal stands, situations or set-points during its course of life. The analysis is essentially based on the important overall system paradigm of “Time driven-event driven-parameters change”. Based on this paradigm, any affected activity, event or varying environment is considered to be intelligently self-recorded inside the system through an incremental consolidity-scaled change in system parameters of the stack-based layering types. Various joint stack-based mathematical and graphical approaches, supported by representable case studies, are suggested for the identification, extraction, and processing of various stack-based system change layerings of different classifications and categorizations. Moreover, some selected real-life illustrative applications are provided. They demonstrate the (infinite) stack-based identification and recognition of the change pathway process in geology, archeology, life sciences, ecology, environmental science, engineering, materials, medicine, biology, sociology, humanities, and other important fields. These case studies and selected applications revealed general similarities in the stack-based layering structures and formations across all of these research fields. Such general similarities clearly demonstrate the global concept of the “fractals-general stacking behavior” of real-life systems during their change pathways. Therefore, it is recommended that concentrated efforts should be expedited toward building generic modular stack-based systems or blocks for the mathematical
On calculating the probability of a set of orthologous sequences

Directory of Open Access Journals (Sweden)

Junfeng Liu

2009-02-01

Junfeng Liu (1,2), Liang Chen (3), Hongyu Zhao (4), Dirk F Moore (1,2), Yong Lin (1,2), Weichung Joe Shih (1,2). (1) Biometrics Division, The Cancer Institute of New Jersey, New Brunswick, NJ, USA; (2) Department of Biostatistics, School of Public Health, University of Medicine and Dentistry of New Jersey, Piscataway, NJ, USA; (3) Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA; (4) Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA

Abstract: Probabilistic DNA sequence models have been intensively applied to genome research. Within the evolutionary biology framework, this article investigates the feasibility of rigorously estimating the probability of a set of orthologous DNA sequences that evolved from a common progenitor. We propose Monte Carlo integration algorithms that sample the unknown ancestral and/or root sequences a posteriori, conditional on a reference sequence. They then apply pairwise Needleman–Wunsch alignment between the sampled and non-reference species sequences to estimate the probability. We test our algorithms on both simulated and real sequences and compare the probabilities calculated by Monte Carlo integration with those induced by a single multiple alignment.

Keywords: evolution, Jukes–Cantor model, Monte Carlo integration, Needleman–Wunsch alignment, orthologous
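The Jukes–Cantor model named in the keywords gives closed-form substitution probabilities. Write d for the branch length in expected substitutions per site; this parameterisation is chosen here for illustration. A base is then unchanged with probability 1/4 + 3/4·e^(-4d/3), and becomes each specific other base with probability 1/4 - 1/4·e^(-4d/3).

The Python sketch below uses these probabilities to score one pair of already aligned sequences. It assumes a uniform root distribution and skips gapped columns. It is a building block only, not the authors' Monte Carlo integration over ancestral sequences.

```python
import math


def jc69_probs(d: float) -> tuple[float, float]:
    """Return (P(same base), P(one specific different base)) after branch length d."""
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def jc69_pair_log_likelihood(x: str, y: str, d: float) -> float:
    """log P(x, y) for two aligned sequences: uniform root base times JC69 transition.

    Columns containing a gap '-' are skipped in this simplified sketch.
    If x and y differ anywhere, d must be > 0 (a mismatch has probability 0 at d = 0).
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    p_same, p_diff = jc69_probs(d)
    log_lik = 0.0
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            continue
        log_lik += math.log(0.25) + math.log(p_same if a == b else p_diff)
    return log_lik
```

The four outcomes from any base always sum to one (p_same + 3 * p_diff = 1). As d grows, both probabilities approach 1/4, the uniform stationary distribution.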
Glassy carbon based supercapacitor stacks

Energy Technology Data Exchange (ETDEWEB)

Baertsch, M; Braun, A; Koetz, R; Haas, O [Paul Scherrer Inst. (PSI), Villigen (Switzerland)]

1997-06-01

Considerable effort is being made to develop electrochemical double layer capacitors (EDLC) that store relatively large quantities of electrical energy and, at the same time, possess a high power density. Our previous work has shown that glassy carbon is suitable as a capacitor electrode material with respect to low resistance and high capacity requirements. We present the development of bipolar electrochemical glassy carbon capacitor stacks of up to 3 V. Bipolar stacks are an efficient way to meet the high voltage and high power density requirements of traction applications. Impedance and cyclic voltammogram measurements are reported here and show the frequency response of a 1, 2, and 3 V stack. (author) 3 figs., 1 ref.
Full-length sequencing and identification of novel polymorphisms in ...

Indian Academy of Sciences (India)

The aim of this work was to sequence the entire coding region of the ACACA gene in the Valle del Belice sheep breed to identify polymorphic sites. A total of 51 coding exons of the ACACA gene were sequenced in 32 individuals of the Valle del Belice sheep breed. Sequencing analysis and alignment of the obtained sequences showed the ...
Status of MCFC stack technology at IHI

Energy Technology Data Exchange (ETDEWEB)

Hosaka, M.; Morita, T.; Matsuyama, T.; Otsubo, M. [Ishikawajima-Harima Heavy Industries Co., Ltd., Tokyo (Japan)]

1996-12-31

The molten carbonate fuel cell (MCFC) is a promising option for highly efficient power generation that can be scaled up. IHI has been studying parallel-flow MCFC stacks with internal manifolds and a large electrode area of 1 m². IHI will build two 250 kW stacks for an MW plant and has begun to make cell components for the plant. To improve stack stability, a soft corrugated plate for use in the separator has been developed, and methods of collecting current from the stacks have been studied. Because the DC output voltage of the plant is very high, the design of the electrical insulation is very important. A 20 kW short-stack test was conducted in FY1995 to verify some of the improvements and components of the MW plant. These activities are presented below.
Characterization and sequence analysis of cysteine and glycine-rich ...

African Journals Online (AJOL)

Primers specific for CSRP3 were designed using known cDNA sequences of Bos taurus published in databases under different accession numbers. Polymerase chain reaction (PCR) was performed, and the products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).
Dynamical stability of slip-stacking particles

Energy Technology Data Exchange (ETDEWEB)

Eldred, Jeffrey; Zwaska, Robert

2014-09-01

We study the stability of particles in slip-stacking configuration, used to nearly double proton beam intensity at Fermilab. We introduce universal area factors to calculate the available phase space area for any set of beam parameters without individual simulation. We find perturbative solutions for stable particle trajectories. We establish Booster beam quality requirements to achieve 97% slip-stacking efficiency. We show that slip-stacking dynamics directly correspond to the driven pendulum and to the system of two standing-wave traps moving with respect to each other.
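A hedged sketch of the driven-pendulum correspondence follows. It assumes two RF systems of equal voltage whose frequencies differ by $\Delta\omega$, each of which alone would give a small-amplitude synchrotron frequency $\omega_s$. Measured in the frame of one bucket, the RF phase $\phi$ of a particle then obeys a pendulum equation with a second, moving sinusoidal term acting as the drive. The symbols, normalization and sign convention are illustrative and may differ from the paper's.

$$
\ddot{\phi} + \omega_s^{2}\left[\sin\phi + \sin\left(\phi - \Delta\omega\, t\right)\right] = 0
$$

Without the second term this is the ordinary pendulum of a single stationary bucket. The second term is the periodic forcing from the other bucket slipping past at relative frequency $\Delta\omega$. This is why the picture of two standing-wave traps moving relative to each other and the driven pendulum describe the same motion.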
Nonparametric combinatorial sequence models.

Science.gov (United States)

Wauthier, Fabian L; Jordan, Michael I; Jojic, Nebojsa

2011-11-01

This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This article presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three biological sequence families which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution over sequence representations induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.
Finding optimal interaction interface alignments between biological complexes

KAUST Repository

Cui, Xuefeng

2015-06-13

Motivation: Biological molecules perform their functions through interactions with other molecules. Structure alignment of interaction interfaces between biological complexes is an indispensable step in detecting their structural similarities, which are keys to understanding their evolutionary histories and functions. Although various structure alignment methods have been developed to successfully assess the similarities of protein structures or certain types of interaction interfaces, existing alignment tools cannot directly align arbitrary types of interfaces formed by protein, DNA or RNA molecules. Specifically, they require a 'blackbox preprocessing' to standardize interface types and chain identifiers. Yet their performance is limited and sometimes unsatisfactory. Results: Here we introduce a novel method, PROSTA-inter, that automatically determines and aligns interaction interfaces between two arbitrary types of complex structures. Our method uses sequentially remote fragments to search for the optimal superimposition. The optimal residue matching problem is then formulated as a maximum weighted bipartite matching problem to detect the optimal sequence order-independent alignment. Benchmark evaluation on all non-redundant protein-DNA complexes in PDB shows significant performance improvement of our method over TM-align and iAlign (with the 'blackbox preprocessing'). Two case studies where our method discovers, for the first time, structural similarities between two pairs of functionally related protein-DNA complexes are presented. We further demonstrate the power of our method on detecting structural similarities between a protein-protein complex and a protein-RNA complex, which is biologically known as a protein-RNA mimicry case. © The Author 2015. Published by Oxford University Press.
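The residue-matching step can be illustrated with a minimal sketch of maximum weighted bipartite matching. This is not PROSTA-inter's code. The similarity matrix, the function name `max_weight_matching`, and the rule that non-positive pairs stay unmatched are assumptions for illustration. The exhaustive search only suits tiny inputs; a practical implementation would use the Hungarian algorithm or a comparable solver.

```python
from itertools import permutations
from typing import List, Tuple


def max_weight_matching(weights: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total weight, matched (i, j) pairs) for the best one-to-one matching.

    weights[i][j] is a hypothetical similarity between residue i of interface A
    and residue j of interface B. Pairs with non-positive weight are left
    unmatched. Because any residue may pair with any other, the result does not
    depend on sequence order. Exhaustive search: tiny inputs only.
    """
    if not weights or not weights[0]:
        return 0, []
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        # Solve the transposed problem so that rows <= cols, then swap indices back.
        total, pairs = max_weight_matching([list(col) for col in zip(*weights)])
        return total, sorted((i, j) for j, i in pairs)
    best_total, best_pairs = 0, []
    # Try every injective assignment of rows (A residues) to columns (B residues).
    for assignment in permutations(range(cols), rows):
        pairs = [(i, j) for i, j in enumerate(assignment) if weights[i][j] > 0]
        total = sum(weights[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


# Residues matched in reverse order still give the maximal score.
print(max_weight_matching([[0, 0, 7], [0, 7, 0], [7, 0, 0]]))  # (21, [(0, 2), (1, 1), (2, 0)])
```

Dropping non-positive pairs makes the search over full assignments equivalent to allowing residues to remain unmatched.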
YAHA: fast and flexible long-read alignment with optimal breakpoint detection.

Science.gov (United States)

Faust, Gregory G; Hall, Ira M

2012-10-01

With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA. imh4y@virginia.edu.
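The idea of choosing an optimal set of alignments through a directed acyclic graph can be sketched as a longest-path computation. This is a minimal illustration under stated assumptions, not YAHA's implementation. The hit tuple layout, the `best_hit_chain` name, the constant `breakpoint_penalty`, and the requirement that chosen hits not overlap on the query are all hypothetical simplifications.

```python
from typing import List, Tuple

# Hypothetical hit format: (query_start, query_end, ref_start), half-open on the query.
Hit = Tuple[int, int, int]


def best_hit_chain(hits: List[Hit], breakpoint_penalty: float) -> Tuple[float, List[Hit]]:
    """Choose query-disjoint hits maximizing covered query bases minus one
    breakpoint penalty per junction between consecutive hits.

    Hits sorted by query start are the nodes of a DAG; an edge j -> i exists
    when hit j ends on the query at or before hit i starts. The best path is
    found by dynamic programming in that topological order.
    """
    if not hits:
        return 0, []
    hits = sorted(hits)
    score = [0.0] * len(hits)  # best chain score ending at each hit
    prev = [-1] * len(hits)    # predecessor of each hit in that chain
    for i, (q_start, q_end, _) in enumerate(hits):
        score[i] = q_end - q_start  # chain consisting of hit i alone
        for j in range(i):
            if hits[j][1] <= q_start:
                candidate = score[j] + (q_end - q_start) - breakpoint_penalty
                if candidate > score[i]:
                    score[i], prev[i] = candidate, j
    end = max(range(len(hits)), key=lambda k: score[k])
    best = score[end]
    chain = []
    while end != -1:  # walk predecessors back to the start of the chain
        chain.append(hits[end])
        end = prev[end]
    return best, chain[::-1]


# A split read: two hits joined by one breakpoint beat the overlapping third hit.
print(best_hit_chain([(0, 60, 1000), (60, 100, 5000), (10, 70, 3000)], 10))
# (90, [(0, 60, 1000), (60, 100, 5000)])
```

The penalty controls the trade-off. With a large enough penalty, the single longest hit wins over a two-piece split mapping.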
The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences.

Science.gov (United States)

Fourment, Mathieu; Gibbs, Mark J

2008-02-05

Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.
Spherical Torus Center Stack Design

International Nuclear Information System (INIS)

C. Neumeyer; P. Heitzenroeder; C. Kessel; M. Ono; M. Peng; J. Schmidt; R. Woolley; I. Zatz

2002-01-01

The low aspect ratio spherical torus (ST) configuration requires that the center stack design be optimized within a limited available space, using materials within their established allowables. This paper presents center stack design methods developed by the National Spherical Torus Experiment (NSTX) Project Team during the initial design of NSTX, and more recently for studies of a possible next-step ST (NSST) device.
The untyped stack calculus and Böhm's theorem

Directory of Open Access Journals (Sweden)

Alberto Carraro

2013-03-01

The stack calculus is a functional language that is in a Curry-Howard correspondence with classical logic. It enjoys confluence but, like Parigot's lambda-mu, does not admit Böhm's theorem, which is typical of the lambda-calculus. We present a simple extension of the stack calculus that is for the stack calculus what Saurin's Lambda-mu is for lambda-mu.
Full Piezoelectric Multilayer-Stacked Hybrid Actuation/Transduction Systems

Science.gov (United States)

Su, Ji; Jiang, Xiaoning; Zu, Tian-Bing

2011-01-01

The Stacked HYBATS (Hybrid Actuation/Transduction system) demonstrates significantly enhanced electromechanical performance by exploiting the cooperative electromechanical responses of multilayer, stacked negative-strain and positive-strain components. Both experimental and theoretical studies indicate that the displacement of Stacked HYBATS is over three times that of a same-sized conventional flextensional actuator/transducer. The coupled resonance mode between the positive-strain and negative-strain components of Stacked HYBATS is much stronger than the resonance of single-element actuation, but only when the effective lengths of the two kinds of elements match. Compared with the previously invented hybrid actuation system (HYBAS), the multilayer Stacked HYBATS can be designed to provide high mechanical load capability, low-voltage driving, and a highly effective piezoelectric constant. When an electric field is applied to the device, the negative-strain component contracts and the positive-strain component expands along their length directions. The interaction between the two elements produces enhanced motion along the Z direction of Stacked HYBATS. For the negative-strain component to dominate the dynamic length of Stacked HYBATS, its cross-sectional area must be much larger than the total cross-sectional area of the two positive-strain components. In inorganic materials such as ceramics and single crystals, the transverse strain is negative and the longitudinal strain is positive. Different multilayer stack configurations can make a piezoelectric ceramic/single-crystal multilayer stack exhibit negative or positive strain in a given direction without increasing the applied voltage. Unlike HYBAS, all the elements can be made from a single kind of material. Stacked HYBATS can provide an extremely effective piezoelectric
CORAL: aligning conserved core regions across domain families.

Science.gov (United States)

Fong, Jessica H; Marchler-Bauer, Aron

2009-08-01

Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile-profile method CORAL that aligns individual core regions as gap-free units. CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved 'readability' that facilitate manual refinement. CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml. Supplementary data are available at Bioinformatics online.
Simple Stacking Methods for Silicon Micro Fuel Cells

Directory of Open Access Journals (Sweden)

Gianmario Scotti

2014-08-01

We present two simple methods, with parallel and serial gas flows, for the stacking of microfabricated silicon fuel cells with integrated current collectors, flow fields and gas diffusion layers. The gas diffusion layer is implemented using black silicon. In the two stacking methods proposed in this work, the fluidic apertures and gas flow topology are rotationally symmetric and enable us to stack fuel cells without an increase in the number of electrical or fluidic ports or interconnects. Thanks to this simplicity and the structural compactness of each cell, the obtained stacks are very thin (~1.6 mm for a two-cell stack). We have fabricated two-cell stacks with two different gas flow topologies and obtained an open-circuit voltage (OCV) of 1.6 V and a power density of 63 mW·cm−2, proving the viability of the design.

Multiple alignment analysis on phylogenetic tree of the spread of SARS epidemic using distance method

Science.gov (United States)

Amiroch, S.; Pradana, M. S.; Irawan, M. I.; Mukhlash, I.

2017-09-01

Multiple Alignment (MA) is a particularly important tool for studying viral genomes and determining the evolutionary process of a specific virus. Applying MA to the spread of the Severe Acute Respiratory Syndrome (SARS) epidemic is of particular interest because the epidemic, a few years ago, spread so quickly that it drew medical attention in many countries. Although much software exists for processing multiple sequences, the use of pairwise alignment to build an MA deserves consideration. Previous research aligned the sequences for MA using the Super Pairwise Alignment algorithm; this study instead uses the Needleman-Wunsch dynamic programming algorithm, simulated in Matlab. The MA analysis yields three results. It yields stable and unstable regions, the latter indicating positions where mutations occur. It yields a network topology that produces the phylogenetic tree of the SARS epidemic by the distance method. It also identifies the network areas where mutations occur.
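For readers unfamiliar with the algorithm, a minimal Needleman-Wunsch global alignment with linear gap costs is sketched in Python rather than Matlab. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not the parameters used in the study.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),  # match or mismatch
                          F[i - 1][j] + gap,            # gap in b
                          F[i][j - 1] + gap)            # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Pairwise scores like these can be converted into distances for a distance-based tree. The exact conversion used in the study is not stated here.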
Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix.

Directory of Open Access Journals (Sweden)

Jakob H Havgaard

2007-10-01

It has become clear that noncoding RNAs (ncRNAs) play important roles in cells, and emerging studies indicate that there might be a large number of unknown ncRNAs in mammalian genomes. There exist computational methods that can be used to search for ncRNAs by comparing sequences from different genomes. One main problem with these methods is their computational complexity, and heuristics are therefore employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, these heuristics are not ideal, as pre-aligning is dependent on sequence similarity that may not be present and pre-folding ignores the comparative information. Here, pruning of the dynamical programming matrix is presented as an alternative novel heuristic constraint. All subalignments that do not exceed a length-dependent minimum score are discarded as the matrix is filled out, thus giving the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained. Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned and backtracked in a normal fashion. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements in the algorithm, the FOLDALIGN software package provides the molecular biologist with an efficient and user-friendly tool for searching for new ncRNAs. The software package is available for download at http://foldalign.ku.dk.
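The pruning idea can be illustrated on plain sequence alignment. This is a simplified, hypothetical analogue, not FOLDALIGN, which aligns RNA structure with an energy model. The sketch fills a local-alignment matrix while tracking the length of the best subalignment ending in each cell. Any cell whose score falls below an assumed length-dependent threshold `min_score(length)` is discarded, so nothing can extend from it. The scoring values and threshold are illustrative only.

```python
from typing import Callable, List, Optional, Tuple


def pruned_local_alignment(
    a: str,
    b: str,
    match: int = 2,
    mismatch: int = -1,
    gap: int = -2,
    min_score: Callable[[int], float] = lambda length: 0.5 * (length - 1),
) -> Tuple[int, int]:
    """Return (best score, number of cells kept) for a pruned local alignment.

    Each live cell stores (score, length) of the best subalignment ending there.
    A cell whose score is below min_score(length) is pruned: it stays None and
    can never be extended, which is where time and memory are saved.
    """
    n, m = len(a), len(b)
    cell: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best, kept = 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [(s, 1)]  # start a new subalignment at (i, j)
            diag = cell[i - 1][j - 1]
            if diag is not None:
                candidates.append((diag[0] + s, diag[1] + 1))
            for neighbour in (cell[i - 1][j], cell[i][j - 1]):
                if neighbour is not None:  # extend a live subalignment with a gap
                    candidates.append((neighbour[0] + gap, neighbour[1] + 1))
            # Highest score wins; on ties prefer the shorter subalignment.
            score, length = max(candidates, key=lambda c: (c[0], -c[1]))
            if score < min_score(length):
                continue  # pruned: this cell is never stored or extended
            cell[i][j] = (score, length)
            kept += 1
            best = max(best, score)
    return best, kept


print(pruned_local_alignment("ACGTTGCA", "ACGATGCA"))
print(pruned_local_alignment("ACGTTGCA", "ACGATGCA", min_score=lambda length: float("-inf")))
```

Passing a threshold of minus infinity disables pruning. Comparing the two printed cell counts shows how many cells the constraint removed.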
Open stack thermal battery tests

Energy Technology Data Exchange (ETDEWEB)

Long, Kevin N. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Roberts, Christine C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Grillet, Anne M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Headley, Alexander J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Fenton, Kyle [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Wong, Dennis [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Ingersoll, David [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2017-04-17

We present selected results from a series of Open Stack thermal battery tests performed in FY14 and FY15 and discuss our findings. These tests were meant to provide validation data for the comprehensive thermal battery simulation tools currently under development in Sierra/Aria under known conditions compared with as-manufactured batteries. We are able to satisfy this original objective in the present study for some test conditions. Measurements from each test include: nominal stack pressure (axial stress) vs. time in the cold state and during battery ignition, battery voltage vs. time against a prescribed current draw with periodic pulses, and images transverse to the battery axis from which cell displacements are computed. Six battery configurations were evaluated: 3, 5, and 10 cell stacks sandwiched between 4 layers of the materials used for axial thermal insulation, either Fiberfrax Board or MinK. In addition to the results from 3, 5, and 10 cell stacks with either in-line Fiberfrax Board or MinK insulation, a series of cell-free “control” tests were performed that show the inherent settling and stress relaxation based on the interaction between the insulation and heat pellets alone.
Adding large EM stack support

KAUST Repository

Holst, Glendon

2016-12-01

Serial section electron microscopy (SSEM) image stacks generated using high throughput microscopy techniques are an integral tool for investigating brain connectivity and cell morphology. FIB or 3View scanning electron microscopes easily generate gigabytes of data. To produce analyzable 3D datasets from the imaged volumes, efficient and reliable image segmentation is crucial. Classical manual approaches to segmentation are time consuming and labour intensive. Semiautomatic seeded watershed segmentation algorithms, such as those implemented by ilastik image processing software, are a very powerful alternative, substantially speeding up segmentation times. We have used ilastik effectively for small EM stacks – on a laptop, no less; however, ilastik was unable to carve the large EM stacks we needed to segment because its memory requirements grew too large – even for the biggest workstations we had available. For this reason, we refactored the carving module of ilastik to scale it up to large EM stacks on large workstations, and tested its efficiency. We modified the carving module, building on existing blockwise processing functionality to process data in manageable chunks that can fit within RAM (main memory). We review this refactoring work, highlighting the software architecture, design choices, modifications, and issues encountered.
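The general idea of blockwise processing can be sketched independently of ilastik. The helper names `iter_blocks` and `blockwise_sum` and the per-voxel callback are hypothetical and do not reflect ilastik's API. With NumPy one would instead index a memory-mapped array with each tuple of slices.

```python
from itertools import product
from typing import Callable, Iterator, Sequence, Tuple


def iter_blocks(shape: Sequence[int], block: Sequence[int]) -> Iterator[Tuple[slice, ...]]:
    """Yield tuples of slices that tile a volume of `shape` with blocks no larger than `block`."""
    starts_per_axis = [range(0, dim, step) for dim, step in zip(shape, block)]
    for starts in product(*starts_per_axis):
        # Edge blocks are clipped so they never run past the volume boundary.
        yield tuple(
            slice(start, min(start + step, dim))
            for start, step, dim in zip(starts, block, shape)
        )


def blockwise_sum(
    shape: Sequence[int], block: Sequence[int], voxel: Callable[[int, int, int], float]
) -> float:
    """Sum a 3-D volume block by block, holding only one block's values at a time."""
    total = 0.0
    for zs, ys, xs in iter_blocks(shape, block):
        chunk = [voxel(z, y, x)
                 for z in range(zs.start, zs.stop)
                 for y in range(ys.start, ys.stop)
                 for x in range(xs.start, xs.stop)]
        total += sum(chunk)  # chunk is replaced on the next iteration
    return total


print(len(list(iter_blocks((10, 7, 5), (4, 4, 5)))))  # 6
```

The block size bounds peak memory. Any per-block operation, not only a sum, can be plugged into the same loop.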
Laser pulse stacking method

Science.gov (United States)

Moses, E.I.

1992-12-01

A laser pulse stacking method is disclosed. A problem with the prior art has been the generation of a series of laser beam pulses where the outer and inner regions of the beams are generated so as to form radially non-synchronous pulses. Such pulses thus have a non-uniform cross-sectional area with respect to the outer and inner edges of the pulses. The present invention provides a solution by combining the temporally non-uniform pulses in a stacking effect to thus provide a more uniform temporal synchronism over the beam diameter. 2 figs.
Support for linguistic macrofamilies from weighted sequence alignment

Science.gov (United States)

Jäger, Gerhard

2015-01-01

Computational phylogenetics is in the process of revolutionizing historical linguistics. Recent applications have shed new light on controversial issues, such as the location and time depth of language families and the dynamics of their spread. So far, these approaches have been limited to single-language families because they rely on a large body of expert cognacy judgments or grammatical classifications, which is currently unavailable for most language families. The present study pursues a different approach. Starting from raw phonetic transcription of core vocabulary items from very diverse languages, it applies weighted string alignment to track both phonetic and lexical change. Applied to a collection of ∼1,000 Eurasian languages and dialects, this method, combined with phylogenetic inference, leads to a classification in excellent agreement with established findings of historical linguistics. Furthermore, it provides strong statistical support for several putative macrofamilies contested in current historical linguistics. In particular, there is a solid signal for the Nostratic/Eurasiatic macrofamily. PMID:26403857
The impact of base stacking on the conformations and electrostatics of single-stranded DNA.

Science.gov (United States)

Plumridge, Alex; Meisburger, Steve P; Andresen, Kurt; Pollack, Lois

2017-04-20

Single-stranded DNA (ssDNA) is notable for its interactions with ssDNA binding proteins (SSBs) during fundamentally important biological processes including DNA repair and replication. Previous work has begun to characterize the conformational and electrostatic properties of ssDNA in association with SSBs. However, the conformational distributions of free ssDNA have been difficult to determine. To capture the vast array of ssDNA conformations in solution, we pair small angle X-ray scattering with novel ensemble fitting methods, obtaining key parameters such as the size, shape and stacking character of strands with different sequences. Complementary ion counting measurements using inductively coupled plasma atomic emission spectroscopy are employed to determine the composition of the ion atmosphere at physiological ionic strength. Applying this combined approach to poly dA and poly dT, we find that the global properties of these sequences are very similar, despite having vastly different propensities for single-stranded helical stacking. These results suggest that a relatively simple mechanism for the binding of ssDNA to non-specific SSBs may be at play, which explains the disparity in binding affinities observed for these systems. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer.

Directory of Open Access Journals (Sweden)

Raquel Bromberg

2016-06-01

Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. An alternative to traditional, alignment-based approaches are whole-genome, alignment-free methods. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology over different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny, for instance with a cluster of sulfur-reducing thermophilic bacteria coming together irrespective of their phyla. The source-code for SlopeTree is available at: http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz.
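The core idea of measuring how exact matches decay with match length can be sketched under a deliberately simple, assumed model. Suppose each site matches independently with probability p. Then the fraction of length-k substrings shared exactly decays like p**k, and the slope of ln(fraction) against k estimates ln(p). This toy estimator is not SlopeTree's method. It omits the corrections for horizontal gene transfer, composition, low-complexity sequence and multiple hits. The function names and the k range are assumptions.

```python
import math
from typing import Iterable


def shared_kmer_fraction(a: str, b: str, k: int) -> float:
    """Fraction of the length-k substrings of a that occur exactly somewhere in b."""
    windows = [a[i:i + k] for i in range(len(a) - k + 1)]
    if not windows:
        return 0.0
    b_kmers = {b[i:i + k] for i in range(len(b) - k + 1)}
    return sum(w in b_kmers for w in windows) / len(windows)


def match_decay_distance(a: str, b: str, ks: Iterable[int] = range(4, 9)) -> float:
    """Toy distance: minus the least-squares slope of ln(shared fraction) versus k.

    Under the independent-site model the shared fraction decays like p**k,
    so the fitted slope approximates ln(p) and its negation is >= 0.
    """
    xs, ys = [], []
    for k in ks:
        f = shared_kmer_fraction(a, b, k)
        if f == 0:
            break  # ln(0) is undefined; stop at the first k with no shared matches
        xs.append(k)
        ys.append(math.log(f))
    if len(xs) < 2:
        raise ValueError("need shared matches at two or more lengths")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return -slope
```

Identical sequences give a distance of zero, and each substitution steepens the decay. That is the signal such a method converts into a tree distance.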
Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer

Science.gov (United States)

Grishin, Nick V.; Otwinowski, Zbyszek

2016-01-01

Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. An alternative to traditional, alignment-based approaches are whole-genome, alignment-free methods. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology over different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny, for instance with a cluster of sulfur-reducing thermophilic bacteria coming together irrespective of their phyla. The source-code for SlopeTree is available at: http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz. PMID:27336403
Automated insertion of sequences into a ribosomal RNA alignment: An application of computational linguistics in molecular biology

Energy Technology Data Exchange (ETDEWEB)

Taylor, Ronald C. [Case Western Reserve Univ., Cleveland, OH (United States)

1991-11-01

This thesis involved the construction of (1) a grammar that incorporates knowledge on base invariancy and secondary structure in a molecule and (2) a parser engine that uses the grammar to position bases into the structural subunits of the molecule. These concepts were combined with a novel pinning technique to form a tool that semi-automates insertion of a new species into the alignment for the 16S rRNA molecule (a component of the ribosome) maintained by Dr. Carl Woese's group at the University of Illinois at Urbana. The tool was tested on species extracted from the alignment and on a group of entirely new species. The results were very encouraging, and the tool should be a substantial aid to the curators of the 16S alignment. The construction of the grammar was itself automated, allowing application of the tool to alignments for other molecules. The logic programming language Prolog was used to construct all programs involved. The computational linguistics approach used here was found to be a useful way to attack the problem of insertion into an alignment.
Planar intrinsic Josephson junctions with in-plane aligned YBCO films

CERN Document Server

Zhang, L; Kobayashi, T; Goto, T; Mukaida, M

2002-01-01

Planar type devices were fabricated by patterning in-plane aligned YBa2Cu3O7-δ (YBCO) films. The current-voltage characteristics along the c-axis at various temperatures and oxygen contents were measured. Current-voltage curves showing supercurrent and hysteresis were obtained for the samples annealed at an oxygen pressure of 1.3 × 10⁴ Pa, while the supercurrent and hysteresis became smaller and even disappeared as the oxygen pressure decreased. The relationships between the critical currents and temperatures are similar to those of d-wave superconducting tunnel junctions. These results indicate the formation of stacks of intrinsic Josephson junctions, which are useful for developing high-frequency electron devices.
Planar intrinsic Josephson junctions with in-plane aligned YBCO films

International Nuclear Information System (INIS)

Zhang, L; Moriya, M; Kobayashi, T; Goto, T; Mukaida, M

2002-01-01

Planar type devices were fabricated by patterning in-plane aligned YBa2Cu3O7-δ (YBCO) films. The current-voltage characteristics along the c-axis at various temperatures and oxygen contents were measured. Current-voltage curves showing supercurrent and hysteresis were obtained for the samples annealed at an oxygen pressure of 1.3 × 10⁴ Pa, while the supercurrent and hysteresis became smaller and even disappeared as the oxygen pressure decreased. The relationships between the critical currents and temperatures are similar to those of d-wave superconducting tunnel junctions. These results indicate the formation of stacks of intrinsic Josephson junctions, which are useful for developing high-frequency electron devices.
Levitation characteristics of HTS tape stacks

Energy Technology Data Exchange (ETDEWEB)

Pokrovskiy, S. V.; Ermolaev, Y. S.; Rudnev, I. A. [National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow (Russian Federation)

2015-03-15

Owing to considerable progress in second-generation high-temperature superconductor (HTS) technology and significant improvements in their mechanical and transport properties over the last few years, HTS tapes can now be used in magnetic levitation systems. Compared with bulk YBCO material, tapes on a metal substrate offer advantages primarily in strength and in the convenience of manufacturing levitation-system elements. This report presents measurements of the magnetic levitation force between stacks of HTS tapes, with varying numbers of tapes, and an NdFeB permanent magnet in the field-cooled (FC) and zero-field-cooled (ZFC) regimes. In both regimes the levitation force depends nonlinearly on the height of the stack: linear growth at small thickness gives way to flattening and saturation at a large number of tapes in the stack. The levitation force of the stacks was found to be comparable to that of bulk samples. Finite element calculations showed that, when screening of the applied field is neglected, the levitation force of a bulk superconductor is exactly the same as that of a layered superconductor stack whose tape critical current is increased by the filling factor; when screening is taken into account, the two forces differ slightly.
Polyfluorophore Labels on DNA: Dramatic Sequence Dependence of Quenching

Science.gov (United States)

Teo, Yin Nah; Wilson, James N.

2010-01-01

We describe studies carried out in the DNA context to test how a common fluorescence quencher, dabcyl, interacts with oligodeoxynucleoside fluorophores (ODFs), a system of stacked, electronically interacting fluorophores built on a DNA scaffold. We tested twenty different tetrameric ODF sequences containing varied combinations and orderings of pyrene (Y), benzopyrene (B), perylene (E), dimethylaminostilbene (D), and spacer (S) monomers conjugated to the 3′ end of a DNA oligomer. Hybridization of this probe sequence to a dabcyl-labeled complementary strand resulted in strong quenching of fluorescence in 85% of the twenty ODF sequences. The high quenching efficiency was also established by their large Stern–Volmer constants (KSV), between 2.1 × 10⁴ and 4.3 × 10⁵ M⁻¹, measured with free dabcyl quencher. Interestingly, quenching of ODFs displayed strong sequence dependence. This was particularly evident in anagrams of ODF sequences; for example, the sequence BYDS had a KSV approximately two orders of magnitude greater than that of BSDY, which has the same dye composition. Other anagrams, for example EDSY and ESYD, also responded differently to quenching by dabcyl. Analysis of the spectra showed that apparent excimer and exciplex emission bands were quenched at least an order of magnitude more efficiently than monomer emission bands. This suggests an important role for delocalized excited states of the π-stack of fluorophores in the amplified quenching of fluorescence. PMID:19780115
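For readers unfamiliar with KSV: in the standard Stern–Volmer relation, F0/F = 1 + KSV[Q], where F0 and F are the fluorescence intensities without and with quencher at concentration [Q]. The sketch below estimates KSV by least squares with the intercept fixed at 1, which gives KSV = Σ q(r − 1) / Σ q². The function name and the data are hypothetical illustrations, not the authors' analysis or measurements.

```python
def fit_stern_volmer(quencher_conc, f0_over_f):
    """Least-squares KSV for F0/F = 1 + KSV*[Q], intercept fixed at 1.

    quencher_conc: quencher concentrations in M (hypothetical inputs)
    f0_over_f:     measured intensity ratios F0/F at those concentrations
    Returns KSV in M^-1.
    """
    if len(quencher_conc) != len(f0_over_f):
        raise ValueError("inputs must have the same length")
    # Minimizing sum((r - 1 - K*q)^2) over K gives K = sum(q*(r-1)) / sum(q^2).
    num = sum(q * (r - 1.0) for q, r in zip(quencher_conc, f0_over_f))
    den = sum(q * q for q in quencher_conc)
    if den == 0:
        raise ValueError("need at least one nonzero quencher concentration")
    return num / den


# Synthetic, noise-free ratios generated with KSV = 2.1e4 M^-1 (illustrative only).
conc = [0.0, 1e-5, 2e-5, 5e-5]
ratios = [1.0 + 2.1e4 * q for q in conc]
ksv = fit_stern_volmer(conc, ratios)
```

With real data the ratios scatter about the line, and curvature in an F0/F plot usually signals that a single-constant model is insufficient.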
Start-Stop Test Procedures on the PEMFC Stack Level

DEFF Research Database (Denmark)

Mitzel, Jens; Nygaard, Frederik; Veltzé, Sune

The test is intended to investigate the influence on stack durability of a long stop followed by a restart of the stack. A long stop is defined as a stop in which the anode compartment is completely filled with air because of stack leakage. In systems, the leakage level of the stack is low and time to fil...
How genome complexity can explain the difficulty of aligning reads to genomes.

Science.gov (United States)

Phan, Vinhthuy; Gao, Shanshan; Tran, Quang; Vo, Nam S

2015-01-01

Although it is frequently observed that aligning short reads to genomes becomes harder if the genomes contain complex repeat patterns, there has been little effort to quantify the relationship between genome complexity and the difficulty of short-read alignment. Existing measures of sequence complexity seem unsuitable for understanding and quantifying this relationship. We investigated several measures of complexity and found that length-sensitive measures had the highest correlation with alignment accuracy. In particular, the rate of distinct substrings of length k, where k is similar to the read length, correlated very highly with alignment performance in terms of precision and recall. We showed how to compute this measure efficiently in linear time, making it practical to quickly estimate the difficulty of alignment for new genomes without first aligning reads to them. We also showed how the length-sensitive measures can provide additional information for choosing aligners that align consistently and accurately on new genomes. We formally established a connection between genome complexity and the accuracy of short-read aligners. The relationship between genome complexity and alignment accuracy provides additional useful information for selecting suitable aligners for new genomes. Further, this work suggests that the complexity of genomes should sometimes be thought of in terms of specific computational problems, such as the alignment of short reads to genomes.
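A minimal sketch of the "rate of distinct substrings of length k", assuming the rate is the number of distinct k-mers divided by the number of k-length windows (n − k + 1). The function name is hypothetical. This version slices strings into a set, so it costs O(nk) rather than the linear time reported by the authors, whose algorithm is not reproduced here. One standard route to linear time is a suffix array with an LCP array: count the suffixes of length ≥ k whose LCP with the preceding suffix is < k.

```python
def distinct_kmer_rate(genome: str, k: int) -> float:
    """Fraction of length-k windows in `genome` that are distinct substrings.

    A value near 1 means almost every k-mer is unique; lower values mean
    more repeated k-mers (assumed normalization: distinct / (n - k + 1)).
    """
    n_windows = len(genome) - k + 1
    if k <= 0 or n_windows <= 0:
        raise ValueError("k must be between 1 and len(genome)")
    # Collect every k-length window; the set keeps one copy of each repeat.
    distinct = {genome[i:i + k] for i in range(n_windows)}
    return len(distinct) / n_windows


# "ACGTACGT" with k = 4 has 5 windows, and ACGT occurs twice -> 4 distinct.
rate = distinct_kmer_rate("ACGTACGT", 4)
```

Choosing k near the read length mirrors the abstract: a read of that length can be placed uniquely only if its k-mer is not repeated elsewhere.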
RV-Typer: A Web Server for Typing of Rhinoviruses Using Alignment-Free Approach.

Directory of Open Access Journals (Sweden)

Pandurang S Kolekar

Rhinoviruses (RV) are increasingly being reported to cause mild to severe respiratory tract infections in humans. RV are antigenically the most diverse species of the genus Enterovirus and family Picornaviridae. There are three species of RV (RV-A, -B and -C), with 80, 32 and 55 serotypes/types, respectively. Antigenic variation is the main limiting factor for development of a cross-protective vaccine against RV. Serotyping of rhinoviruses is carried out using cross-neutralization assays in cell culture; however, these assays become laborious and time-consuming for large numbers of strains. Alternatively, RV serotyping is carried out by alignment-based phylogeny of both the protein and nucleotide sequences of VP1. However, alignment-based serotyping is a multi-step process that must be repeated every time a new isolate is sequenced. In view of the growing need for RV serotyping, an alignment-free method based on the "return time distribution" (RTD) of amino acid residues in the VP1 protein has been developed and implemented as a web server titled RV-Typer. RV-Typer accepts nucleotide or protein sequences as input and computes return times of dipeptides (k = 2) to assign serotypes. RV-Typer performs with 100% sensitivity and specificity and is significantly faster than alignment-based methods. The web server is available at http://bioinfo.net.in/RV-Typer/home.html.
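To make the "return time" idea concrete, here is a minimal sketch. It assumes that a return time is the distance between successive start positions of the same k-mer and that each dipeptide's distribution is summarized by its mean and population standard deviation. Both assumptions, and the function names, are hypothetical. RV-Typer's exact feature vector and its serotype-assignment step are not described in the abstract and are not shown here.

```python
from statistics import mean, pstdev


def return_times(seq: str, k: int = 2) -> dict[str, list[int]]:
    """Map each k-mer to the gaps between its successive occurrences.

    Assumption: return time = difference of start positions.
    """
    last_pos: dict[str, int] = {}
    times: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_pos:
            times.setdefault(kmer, []).append(i - last_pos[kmer])
        last_pos[kmer] = i
    return times


def rtd_features(seq: str, k: int = 2) -> dict[str, tuple[float, float]]:
    """Summarize each recurring k-mer's return-time distribution as (mean, std)."""
    return {kmer: (mean(t), pstdev(t)) for kmer, t in return_times(seq, k).items()}
```

Because these features come from a single sequence, a new isolate can be compared against precomputed reference profiles without realigning the whole collection. This is the practical advantage the abstract claims for alignment-free typing.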
SRS reactor stack plume marking tests

International Nuclear Information System (INIS)

Petry, S.F.

1992-03-01

Tests performed in 105-K in 1987 and 1988 demonstrated that the stack plume can successfully be made visible (i.e., marked) by introducing smoke into the stack breech. The ultimate objective of these tests is to give evacuees a means of readily identifying the stack plume during an emergency evacuation so that they can evacuate in the opposite direction, minimizing the potential for severe radiation exposure. The EPA has also requested that DOE arrange further tests to settle a technical question about the correct calculation of stack downwash. New test canisters designed to produce more smoke per unit time were received in 1988; however, they have not been evaluated because normal ventilation conditions have not been reestablished in K Area. Meanwhile, both the authorization and the procedure for conducting the tests have expired. The tests can be performed during normal reactor operation. It is recommended that appropriate authorization and procedure approval be obtained to resume testing after K Area restart.
Trace interpolation by slant-stack migration

International Nuclear Information System (INIS)

Novotny, M.

1990-01-01

The slant-stack migration formula based on the Radon transform is studied with respect to the depth step Δz of wavefield extrapolation. It can be viewed as a generalized trace-interpolation procedure that includes wave extrapolation with an arbitrary step Δz. For Δz = 0 the formula yields the familiar plane-wave decomposition, while for Δz > 0 it provides a robust tool for the migration transformation of spatially undersampled wavefields. Using the stationary-phase method, it is shown that the slant-stack migration formula degenerates into the Rayleigh-Sommerfeld integral in the far-field approximation. Consequently, even a narrow slant-stack gather applied before the diffraction stack can significantly improve the representation of noisy data in the wavefield extrapolation process. The theory is applied to synthetic and field data to perform trace interpolation and dip-reject filtering. The data examples presented show that the Radon interpolator works well in the dip range, including waves with mutual stepouts smaller than half the dominant period.
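For orientation, the plane-wave-decomposition case (no extrapolation) is the ordinary time-domain slant stack, or discrete linear Radon (τ-p) transform: S(p, τ) = Σₓ u(τ + p·x, x). The sketch below implements only that basic transform with linear interpolation between samples. It is not the author's slant-stack migration formula with a depth step Δz, and the function names and the synthetic gather are hypothetical.

```python
import math


def _sample(trace, t, dt):
    """Linearly interpolate `trace` at time t (s); zero outside the record."""
    s = t / dt
    i = math.floor(s)
    if i < 0 or i + 1 >= len(trace):
        return 0.0
    w = s - i
    return (1.0 - w) * trace[i] + w * trace[i + 1]


def slant_stack(data, offsets, dt, slownesses, ntau):
    """Linear Radon transform: out[ip][itau] = sum over traces of u(tau + p*x, x).

    data:       list of traces (one list of samples per offset)
    offsets:    trace offsets x in m
    slownesses: ray parameters p in s/m
    """
    out = []
    for p in slownesses:
        row = []
        for itau in range(ntau):
            tau = itau * dt
            # Sum amplitudes along the line t = tau + p*x across all traces.
            row.append(sum(_sample(tr, tau + p * x, dt) for tr, x in zip(data, offsets)))
        out.append(row)
    return out


# Synthetic gather: one linear event with slowness 0.0004 s/m (1 sample per 10 m).
dt, nt = 0.004, 100
offsets = [0.0, 10.0, 20.0, 30.0, 40.0]
data = [[0.0] * nt for _ in offsets]
for ix in range(len(offsets)):
    data[ix][25 + ix] = 1.0
panel = slant_stack(data, offsets, dt, [0.0, 0.0002, 0.0004, 0.0006], nt)
```

The linear event focuses to a single point at p = 0.0004 s/m, τ = 0.1 s, where all five traces add coherently. Muting regions of the (τ, p) panel before an inverse transform is the usual basis for dip-reject filtering.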
